A Noise Suppression Method for Speech Signal by Jointly Using Bayesian Estimation and Fuzzy Theory

Akira Ikuta; Hisako Orimoto; Kouji Hasegawa

doi:10.4236/jsea.2021.1412037

Journal of Software Engineering and Applications > Vol.14 No.12, December 2021

Akira Ikuta¹, Hisako Orimoto¹, Kouji Hasegawa²
¹Department of Management Information Systems, Prefectural University of Hiroshima, Hiroshima, Japan.
²Western Region Industrial Research Center, Hiroshima Prefectural Technology Research Institute, Kure, Japan.

Abstract

Speech recognition systems have been applied to inspection and maintenance operations in industrial factories and to recording and reporting routines at construction sites and other places where handwriting is difficult. In such real-world settings, countermeasures against surrounding noise are indispensable. In this study, a new method for removing noise from actual speech signals is proposed by using Bayesian estimation with the aid of bone-conducted speech and fuzzy theory. More specifically, by introducing Bayes’ theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of noise removal algorithm is theoretically derived. In the proposed noise suppression method, the bone-conducted speech signal, whose high-frequency components are reduced, is regarded as fuzzy observation data, and a stochastic model for the bone-conducted speech is derived by applying the probability measure of fuzzy events. The proposed method was applied to speech signals measured in a real environment with low SNR, and it gave better results than an algorithm based on the observation of air-conducted speech only.

Keywords

Air- and Bone-Conducted Speeches, Noise Suppression, Bayesian Estimation, Fuzzy Data

Share and Cite:

Ikuta, A. , Orimoto, H. and Hasegawa, K. (2021) A Noise Suppression Method for Speech Signal by Jointly Using Bayesian Estimation and Fuzzy Theory. Journal of Software Engineering and Applications, 14, 631-645. doi: 10.4236/jsea.2021.1412037.

1. Introduction

Speech recognition systems have been applied to various fields, for example, to inspection and maintenance operations in industrial factories and at construction sites, where handwriting is difficult. For speech recognition in such real circumstances, methods for suppressing surrounding noise are indispensable.

Previously reported methods for noise reduction in speech recognition can be classified into two categories. One is based on a single microphone [1] [2], and the other uses a microphone array [3]. Since the latter requires a priori information on the number of noise sources, and needs more microphones than noise sources when multiple noise sources are present, this category demands large-scale systems. Therefore, the former, based on a single microphone, is more advantageous than the latter [4] [5]. For this single-microphone noise suppression task, many algorithms applying the Kalman filter have been proposed [6] [7] [8] [9]. However, the Kalman filter is originally based on the assumption of Gaussian white noise [10], whereas actual noises show complex fluctuation forms with non-Gaussian and non-white properties.

From the above viewpoint, in our previously reported study, a noise suppression algorithm for the actual speech signals without requirement of the assumption of Gaussian white noise has been proposed [11]. The method can be applied to actual complex situation where both the noise statistics and the fluctuation forms of speech signal are unknown. By applying the algorithm to real speech signals with several kinds of noises, its effectiveness has been experimentally confirmed in comparison with the Kalman filter.

Furthermore, signal processing methods for removing noise from actual speech signals have been proposed that jointly use the measured data of bone- and air-conducted speech signals [12] [13]. However, the algorithms of these previous methods assumed a simple additive model of the original speech signal and surrounding noise for the air-conducted speech observation. Moreover, the derived algorithms have been applied only to signals mixed with noise on a computer, and not to signals measured in a real environment in the presence of noise.

In this study, a new noise suppression method for speech signals is proposed by applying Bayes’ theorem to a posterior distribution based on the air-conducted speech observation contaminated by surrounding noise. In the proposed algorithm, in order to improve the estimation accuracy for the speech signal, an expansion expression of the conditional probability density function reflecting all linear and non-linear correlation information between the original speech signal and the air-conducted speech observation is adopted as the model of the speech observation. Then, a probability distribution with parameters estimated from the bone-conducted speech is adopted as the prior distribution. Furthermore, the algorithm proposed in this study is applied to signals measured in a real environment in the presence of noise.

Though the bone-conducted speech signal is a kind of solid propagation sound that is less affected by the surrounding noise, the high-frequency components of the signal are reduced through the propagation process [14]. By regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events [15], a new simplified noise suppression method is derived that reflects both the air- and bone-conducted speech signals.

The effectiveness of the proposed method is confirmed by applying it to bone- and air-conducted speech measured in a real environment under the existence of surrounding noise.

2. Theoretical Consideration

2.1. Stochastic Model for Air- and Bone-Conducted Speech Signals by Introducing Fuzzy Theory

In an actual environment with surrounding noise, let $x_{k}$ , $y_{k}$ and $z_{k}$ be the original speech signal, the observation of the air-conducted speech signal, and the observation of the bone-conducted speech signal, respectively, at a discrete time k. The observation $y_{k}$ is contaminated by a surrounding noise $v_{k}$ . In our previous studies, a simple additive model was considered for the air-conducted speech observation $y_{k}$ [12] [13]. In this study, in order to improve the estimation accuracy for the speech signal $x_{k}$ , an expansion expression of the conditional probability density function $P (y_{k} | x_{k})$ [11] reflecting all linear and non-linear correlation information between $x_{k}$ and $y_{k}$ is adopted as the model of the air-conducted speech observation.

$\begin{matrix} P (y_{k} | x_{k}) = P (x_{k}, y_{k}) / P (x_{k}) \\ = P (y_{k}) \sum_{r = 0}^{\infty} \sum_{s = 0}^{\infty} A_{r s} θ_{r}^{(1)} (x_{k}) θ_{s}^{(2)} (y_{k}) \end{matrix}$ (1)

with

$A_{r s} \equiv 〈 θ_{r}^{(1)} (x_{k}) θ_{s}^{(2)} (y_{k}) 〉$ , (2)

where $〈 〉$ denotes the averaging operation on variables.

As the probability density functions of $x_{k}$ and $y_{k}$ , which show non-Gaussian distributions, the following statistical orthonormal expansion series expressions are adopted.

$P (x_{k}) = N (x_{k}; μ_{x}, σ_{x}^{2}) \sum_{i = 0}^{\infty} B_{i} \frac{1}{\sqrt{i!}} H_{i} (\frac{x_{k} - μ_{x}}{σ_{x}})$ , (3)

$P (y_{k}) = N (y_{k}; μ_{y}, σ_{y}^{2}) \sum_{i = 0}^{\infty} C_{i} \frac{1}{\sqrt{i!}} H_{i} (\frac{y_{k} - μ_{y}}{σ_{y}})$ (4)

with

$μ_{x} \equiv 〈 x_{k} 〉$ , $σ_{x}^{2} \equiv 〈 {(x_{k} - μ_{x})}^{2} 〉$ ,

$B_{i} \equiv 〈 \frac{1}{\sqrt{i!}} H_{i} (\frac{x_{k} - μ_{x}}{σ_{x}}) 〉$ , $B_{0} = 1$ , $B_{1} = B_{2} = 0$ ,

$C_{i} \equiv 〈 \frac{1}{\sqrt{i!}} H_{i} (\frac{y_{k} - μ_{y}}{σ_{y}}) 〉$ , $C_{0} = 1$ , $C_{1} = C_{2} = 0$ ,

$N (x; μ, σ^{2}) \equiv \frac{1}{\sqrt{2 π σ^{2}}} \exp \{ - \frac{{(x - μ)}^{2}}{2 σ^{2}} \}$ , (5)

where $H_{i} ()$ is a Hermite polynomial with ith order. Functions $θ_{r}^{(1)} (x_{k})$ and $θ_{s}^{(2)} (y_{k})$ are orthonormal polynomials having weighting functions $P (x_{k})$ and $P (y_{k})$ , respectively. These orthonormal polynomials can be decomposed into linearly independent series as

$θ_{r}^{(1)} (x_{k}) = \sum_{i = 0}^{r} λ_{r i}^{(1)} \frac{1}{\sqrt{i!}} H_{i} (\frac{x_{k} - μ_{x}}{σ_{x}})$ , (6)

$θ_{s}^{(2)} (y_{k}) = \sum_{i = 0}^{s} λ_{s i}^{(2)} \frac{1}{\sqrt{i!}} H_{i} (\frac{y_{k} - μ_{y}}{σ_{y}})$ . (7)
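
As an illustration of the series expressions in Equations (3)-(5), the following minimal sketch estimates the expansion coefficients $B_{i}$ from samples and evaluates a truncated form of Equation (3). It assumes that $H_{i}$ denotes the probabilists' Hermite polynomial, which is consistent with $B_{1} = B_{2} = 0$ and with the orthonormal condition used later in this section. The function names, the truncation order and the synthetic Laplace-like data are hypothetical choices made only for this example.

```python
import math
import random

def hermite(n, u):
    """Probabilists' Hermite polynomial H_n(u), via H_{m+1} = u*H_m - m*H_{m-1}."""
    h_prev, h = 1.0, u
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, u * h - m * h_prev
    return h

def expansion_coefficients(samples, order):
    """Sample estimates of mu, sigma and B_i = <H_i((x - mu)/sigma) / sqrt(i!)>."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / n)
    coeffs = []
    for i in range(order + 1):
        total = sum(hermite(i, (x - mu) / sigma) for x in samples)
        coeffs.append(total / (n * math.sqrt(math.factorial(i))))
    return mu, sigma, coeffs

def series_pdf(x, mu, sigma, coeffs):
    """Truncated Equation (3): Gaussian kernel times the Hermite series."""
    u = (x - mu) / sigma
    gauss = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi * sigma ** 2)
    series = sum(b * hermite(i, u) / math.sqrt(math.factorial(i))
                 for i, b in enumerate(coeffs))
    return gauss * series

random.seed(0)
# Hypothetical non-Gaussian data: difference of two exponential variates.
data = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(20000)]
mu_x, sigma_x, B = expansion_coefficients(data, order=4)
```

With the mean and variance estimated from the same samples, $B_{0} = 1$ and $B_{1} = B_{2} = 0$ hold up to rounding, so the non-Gaussian shape is carried by the coefficients of order three and higher. A truncated series is not guaranteed to stay non-negative in the tails.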

The coefficients $λ_{r i}^{(1)}$ and $λ_{s i}^{(2)}$ are calculated beforehand by using Schmidt’s orthogonalization algorithm [16]. The expansion coefficients $A_{r s}$ with order $r \leq R$ , $s \leq S$ can be obtained from the correlation relationship between the original speech signal $x_{k}$ and the noisy observation of air-conducted speech $y_{k}$ . Since the original speech signal is unknown in the presence of noise, these coefficients have to be estimated on the basis of the observation $y_{k}$ . Regarding the expansion coefficients $A_{r s}$ as an unknown parameter vector $a$ :

$a \equiv (a_{11}, \dots, a_{R 1}, a_{12}, \dots, a_{R 2}, \dots, a_{1 S}, \dots, a_{R S})$ ,

$a_{r s} \equiv A_{r s}$ , $(r = 1, 2, \dots, R; s = 1, 2, \dots, S)$ , (8)

the following simple dynamical model is introduced for the simultaneous estimation of the parameters with the specific signal $x_{k}$ :

$a_{k + 1} = a_{k}$ . (9)
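
The Schmidt orthogonalization that yields the coefficients $λ_{r i}^{(1)}$ of Equation (6) can be sketched as follows. This is only one possible realization, assuming that the averaging operation is replaced by a sample average over standardized data. The function names and the synthetic skewed data are hypothetical.

```python
import math
import random

def hermite(n, u):
    """Probabilists' Hermite polynomial H_n(u)."""
    h_prev, h = 1.0, u
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, u * h - m * h_prev
    return h

def basis(i, u):
    """Normalized Hermite basis function H_i(u) / sqrt(i!)."""
    return hermite(i, u) / math.sqrt(math.factorial(i))

def schmidt_coefficients(u_samples, order):
    """Return lam with theta_r(u) = sum_i lam[r][i] * basis(i, u) orthonormal
    under the sample average."""
    n = len(u_samples)
    size = order + 1
    rows = [[basis(i, u) for i in range(size)] for u in u_samples]
    # Gram matrix <basis_i * basis_j> of the Hermite basis under the data.
    gram = [[sum(row[i] * row[j] for row in rows) / n for j in range(size)]
            for i in range(size)]

    def inner(a, b):
        return sum(a[i] * gram[i][j] * b[j]
                   for i in range(size) for j in range(size))

    lam = []
    for r in range(size):
        vec = [1.0 if i == r else 0.0 for i in range(size)]
        for prev in lam:                      # remove components along earlier thetas
            proj = inner(vec, prev)
            vec = [v - proj * p for v, p in zip(vec, prev)]
        norm = math.sqrt(inner(vec, vec))
        lam.append([v / norm for v in vec])
    return lam

def theta(r, u, lam):
    """Equation (6) evaluated at the standardized argument u."""
    return sum(lam[r][i] * basis(i, u) for i in range(r + 1))

random.seed(1)
raw = [random.expovariate(1.0) for _ in range(5000)]   # hypothetical skewed data
mean = sum(raw) / len(raw)
std = math.sqrt(sum((x - mean) ** 2 for x in raw) / len(raw))
u_data = [(x - mean) / std for x in raw]
lam = schmidt_coefficients(u_data, order=3)
```

Because the data are standardized with their own mean and variance, the first two polynomials reduce to the Hermite basis itself (lam[0] is approximately [1, 0, 0, 0] and lam[1] approximately [0, 1, 0, 0]); the correction appears from the second order onward.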

Next, in order to express the relationship between the original speech signal and bone-conducted speech, after regarding the bone-conducted speech as fuzzy data, the conditional probability distribution function $P (x_{k} | z_{k})$ can be obtained by applying the probability measure of fuzzy events [15] to (1), as follows.

$\begin{array}{l} P (x_{k} | z_{k}) = P (x_{k}, z_{k}) / P (z_{k}) \\ = \int m_{{\bar{y}}_{k}} (y_{k}) P (x_{k}, y_{k}) d y_{k} / \int m_{{\bar{y}}_{k}} (y_{k}) P (y_{k}) d y_{k} (\equiv N (x_{k}, z_{k}) / D (z_{k})) \end{array}$ (10)

where $m_{{\bar{y}}_{k}} (y_{k})$ is the membership function of the bone-conducted speech $z_{k}$ , for which the following Gaussian-type function is adopted:

$m_{{\bar{y}}_{k}} (y_{k}) = \exp \{ - α {(y_{k} - {\bar{y}}_{k})}^{2} \}$ , $({\bar{y}}_{k} \equiv a + b z_{k})$ , (11)

where a and b are constants and $α (> 0)$ is a parameter. Accordingly, by considering $P (x_{k}, y_{k})$ in Equation (1), $P (y_{k})$ in Equation (4), and the membership function in Equation (11), the numerator of Equation (10) can be expressed as follows:

$\begin{matrix} N (x_{k}, z_{k}) = P (x_{k}) \frac{e^{K_{3}}}{\sqrt{2 K_{1} σ_{y}^{2}}} \int (\frac{1}{\sqrt{π / K_{1}}}) \exp \{ - \frac{{(y_{k} - K_{2})}^{2}}{1 / K_{1}} \} \\ \cdot \sum_{i = 0}^{\infty} C_{i} \frac{1}{\sqrt{i!}} H_{i} (\frac{y_{k} - μ_{y}}{σ_{y}}) \sum_{r = 0}^{\infty} \sum_{s = 0}^{\infty} A_{r s} θ_{r}^{(1)} (x_{k}) θ_{s}^{(2)} (y_{k}) d y_{k} \end{matrix}$ (12)

with

$K_{1} \equiv (2 α σ_{y}^{2} + 1) / (2 σ_{y}^{2})$ , $K_{2} \equiv (2 α σ_{y}^{2} {\bar{y}}_{k} + μ_{y}) / (2 α σ_{y}^{2} + 1)$ ,

$K_{3} \equiv K_{1} (K_{2}^{2} - \frac{2 α σ_{y}^{2} {\bar{y}}_{k}^{2} + μ_{y}^{2}}{2 α σ_{y}^{2} + 1})$ . (13)
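
The constants in Equation (13) come from completing the square in the product of the membership function of Equation (11) and the Gaussian factor of Equation (4). A minimal numerical check of that identity is sketched below; the parameter values are hypothetical and serve only to exercise the formulas.

```python
import math

def gaussian(y, mean, var):
    """Gaussian density N(y; mean, var) as in Equation (5)."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def completed_square(alpha, y_bar, mu_y, sigma_y):
    """K1, K2 and K3 of Equation (13)."""
    s2 = sigma_y ** 2
    k1 = (2.0 * alpha * s2 + 1.0) / (2.0 * s2)
    k2 = (2.0 * alpha * s2 * y_bar + mu_y) / (2.0 * alpha * s2 + 1.0)
    k3 = k1 * (k2 ** 2
               - (2.0 * alpha * s2 * y_bar ** 2 + mu_y ** 2) / (2.0 * alpha * s2 + 1.0))
    return k1, k2, k3

# Hypothetical values for alpha, y_bar (= a + b*z_k), mu_y and sigma_y.
alpha, y_bar, mu_y, sigma_y = 0.8, 0.3, -0.2, 1.5
k1, k2, k3 = completed_square(alpha, y_bar, mu_y, sigma_y)
scale = math.exp(k3) / math.sqrt(2.0 * k1 * sigma_y ** 2)

def lhs(y):
    """Membership function times the Gaussian factor of P(y)."""
    return math.exp(-alpha * (y - y_bar) ** 2) * gaussian(y, mu_y, sigma_y ** 2)

def rhs(y):
    """Prefactor of Equation (12) times a Gaussian with mean K2, variance 1/(2*K1)."""
    return scale * gaussian(y, k2, 1.0 / (2.0 * k1))
```

The two functions agree for every $y_{k}$ , so the membership function shifts the Gaussian kernel towards ${\bar{y}}_{k}$ and narrows it, while the factor $e^{K_{3}} / \sqrt{2 K_{1} σ_{y}^{2}}$ is common to the numerator and denominator of Equation (10) and cancels.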

After considering the equality on Hermite polynomial:

$H_{i} (\frac{y_{k} - μ_{y}}{σ_{y}}) = \sum_{j = 0}^{i} d_{i j} H_{j} (\frac{y_{k} - K_{2}}{\sqrt{1 / (2 K_{1})}})$ , (14)

where $d_{i j}$ are expansion coefficients reflecting bone-conducted speech signal, and using the orthonormal condition:

$\int N (y_{k}; K_{2}, 1 / (2 K_{1})) H_{j} (\frac{y_{k} - K_{2}}{\sqrt{1 / (2 K_{1})}}) H_{j^{'}} (\frac{y_{k} - K_{2}}{\sqrt{1 / (2 K_{1})}}) d y_{k} = j! \cdot δ_{j j^{'}}$ , (15)

the integral in Equation (12) can be calculated. Thus, the following expression is derived

$N (x_{k}, z_{k}) = P (x_{k}) \frac{e^{K_{3}}}{\sqrt{2 K_{1} σ_{y}^{2}}} \sum_{i = 0}^{\infty} \frac{1}{\sqrt{i!}} C_{i} \sum_{r = 0}^{R} \sum_{s = 0}^{S} F_{s i} (z_{k}) a_{r s, k} θ_{r}^{(1)} (x_{k})$ , (16)

$F_{s i} (z_{k}) \equiv \sum_{t = 0}^{s} \sum_{j = 0}^{\min \{ i, t \}} λ_{s t}^{(2)} \frac{1}{\sqrt{t!}} d_{i j} d_{t j} j!$ . (17)

Furthermore, through the similar calculation process, the denominator of Equation (10) can be derived as follows:

$D (z_{k}) = \frac{e^{K_{3}}}{\sqrt{2 K_{1} σ_{y}^{2}}} G (z_{k})$ , $G (z_{k}) \equiv \sum_{i = 0}^{\infty} \frac{1}{\sqrt{i!}} C_{i} d_{i 0}$ . (18)

Therefore, by substituting Equations (16) and (18) into Equation (10), the conditional probability distribution function $P (x_{k} | z_{k})$ can be expressed explicitly.
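
The coefficients $d_{i j}$ of Equation (14) are not given in closed form above. One possible way to obtain them, sketched here under the assumption that $H_{i}$ is the probabilists' Hermite polynomial, is to substitute the affine change of argument into the monomial form of $H_{i}$ and re-expand the result in the Hermite basis. All function names and numerical values are hypothetical.

```python
import math

def hermite_monomials(n):
    """Monomial coefficients (lowest degree first) of probabilists' H_n."""
    h_prev, h = [1.0], [0.0, 1.0]
    if n == 0:
        return h_prev
    for m in range(1, n):
        nxt = [0.0] + h                        # u * H_m(u)
        for idx, c in enumerate(h_prev):       # minus m * H_{m-1}(u)
            nxt[idx] -= m * c
        h_prev, h = h, nxt
    return h

def poly_eval(coeffs, w):
    """Evaluate a polynomial given by monomial coefficients (Horner's scheme)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * w + c
    return result

def compose_affine(coeffs, p, q):
    """Monomial coefficients in w of f(p*w + q)."""
    result = []
    for c in reversed(coeffs):
        shifted = [0.0] + [p * r for r in result]   # result * (p*w)
        for idx, r in enumerate(result):            # plus result * q
            shifted[idx] += q * r
        shifted[0] += c
        result = shifted
    return result

def hermite_expansion(coeffs):
    """Re-express a polynomial in w as sum_j d_j * H_j(w)."""
    work = list(coeffs)
    d = [0.0] * len(work)
    for j in range(len(work) - 1, -1, -1):
        d[j] = work[j]                              # H_j is monic
        for idx, c in enumerate(hermite_monomials(j)):
            work[idx] -= d[j] * c
    return d

def d_coefficients(i, mu_y, sigma_y, k1, k2):
    """[d_i0, ..., d_ii] of Equation (14)."""
    scale = math.sqrt(1.0 / (2.0 * k1))
    # (y - mu_y)/sigma_y = p*w + q with w = (y - K2)/scale
    p, q = scale / sigma_y, (k2 - mu_y) / sigma_y
    return hermite_expansion(compose_affine(hermite_monomials(i), p, q))

# Hypothetical parameter values.
mu_y, sigma_y, k1, k2 = -0.2, 1.5, 0.9, 0.1
d3 = d_coefficients(3, mu_y, sigma_y, k1, k2)
```

Since $K_{2}$ depends on ${\bar{y}}_{k} = a + b z_{k}$ , the coefficients have to be recomputed whenever a new bone-conducted sample arrives, which is how they reflect the bone-conducted speech signal.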

2.2. Derivation of Noise Suppression Algorithm Based on Bayesian Estimation

To derive an estimation algorithm for the speech signal $x_{k}$ , Bayes’ theorem for the conditional probability distribution [17] is first considered. Since the parameter $a$ is also unknown, the conditional joint probability distribution of $x_{k}$ and $a_{k}$ is expressed as

$P (x_{k}, a_{k} | Y_{k}) = P (x_{k}, a_{k}, y_{k} | Y_{k - 1}) / P (y_{k} | Y_{k - 1})$ , (19)

where $Y_{k} (\equiv \{ y_{1}, y_{2}, \dots, y_{k} \})$ is a set of air-conducted speech data up to time k. By expanding the conditional joint probability distribution $P (x_{k}, a_{k}, y_{k} | Y_{k - 1})$ in a statistical orthogonal expansion series on the basis of the well-known Gaussian distribution and calculating the conditional expectation, the mean estimates of $x_{k}$ and $a_{r s, k}$ can be derived as follows:

$\begin{array}{l} {\hat{x}}_{k} \equiv 〈 x_{k} | Y_{k} 〉 \\ = \sum_{n = 0}^{\infty} \{ B_{00 n} E_{00}^{10} + B_{10 n} E_{10}^{10} \} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) / \sum_{n = 0}^{\infty} B_{00 n} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) \end{array}$ (20)

$\begin{array}{l} {\hat{a}}_{r s, k} \equiv 〈 a_{r s, k} | Y_{k} 〉 \\ = \sum_{n = 0}^{\infty} \{ B_{00 n} E_{00}^{01} + B_{01 n} E_{01}^{01} \} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) / \sum_{n = 0}^{\infty} B_{00 n} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) \end{array}$ (21)

with

$E_{00}^{10} = x_{k}^{*} (\equiv 〈 x_{k} | Y_{k - 1} 〉)$ , $E_{10}^{10} = \sqrt{Γ_{x_{k}}}$ , $Γ_{x_{k}} \equiv 〈 {(x_{k} - x_{k}^{*})}^{2} | Y_{k - 1} 〉$ ,

$E_{00}^{01} = a_{r s, k}^{*} (\equiv 〈 a_{r s, k} | Y_{k - 1} 〉)$ , $E_{01}^{01} = \sqrt{Γ_{a_{r s, k}}}$ , $Γ_{a_{r s, k}} \equiv 〈 {(a_{r s, k} - a_{r s, k}^{*})}^{2} | Y_{k - 1} 〉$ ,

$y_{k}^{*} \equiv 〈 y_{k} | Y_{k - 1} 〉$ , $Ω_{k} \equiv 〈 {(y_{k} - y_{k}^{*})}^{2} | Y_{k - 1} 〉$ ,

$B_{l m n} \equiv 〈 \frac{1}{\sqrt{l!}} H_{l} (\frac{x_{k} - x_{k}^{*}}{\sqrt{Γ_{x_{k}}}}) \prod_{r = 0}^{R} \prod_{s = 0}^{S} \frac{1}{\sqrt{m_{r s}!}} H_{m_{r s}} (\frac{a_{r s, k} - a_{r s, k}^{*}}{\sqrt{Γ_{a_{r s, k}}}}) \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) | Y_{k - 1} 〉$ . (22)
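
To make the structure of Equation (20) concrete, the sketch below evaluates a truncated version of it for given coefficient lists. The function and argument names are hypothetical, the coefficients $B_{00 n}$ and $B_{10 n}$ are taken as inputs rather than computed, and $H_{n}$ is assumed to be the probabilists' Hermite polynomial.

```python
import math

def hermite(n, u):
    """Probabilists' Hermite polynomial H_n(u)."""
    h_prev, h = 1.0, u
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, u * h - m * h_prev
    return h

def posterior_mean(prior_mean, prior_var, b00, b10, y, y_pred, omega):
    """Truncated Equation (20).

    prior_mean, prior_var : x*_k and Gamma_{x_k}
    b00[n], b10[n]        : B_{00n} and B_{10n} for n = 0, 1, ..., N
    y, y_pred, omega      : y_k, y*_k and Omega_k
    """
    u = (y - y_pred) / math.sqrt(omega)
    numerator = 0.0
    denominator = 0.0
    for n, (c00, c10) in enumerate(zip(b00, b10)):
        basis = hermite(n, u) / math.sqrt(math.factorial(n))
        numerator += (c00 * prior_mean + c10 * math.sqrt(prior_var)) * basis
        denominator += c00 * basis
    return numerator / denominator

# Hypothetical numbers: truncation at n = 1 with correlation coefficient 0.6.
x_hat = posterior_mean(prior_mean=0.5, prior_var=4.0,
                       b00=[1.0, 0.0], b10=[0.0, 0.6],
                       y=2.0, y_pred=1.0, omega=0.25)
```

By the definitions of $x_{k}^{*}$ , $y_{k}^{*}$ and $Ω_{k}$ , $B_{000} = 1$ and $B_{001} = B_{100} = 0$ , while $B_{101}$ is the conditional correlation coefficient between $x_{k}$ and $y_{k}$ . Truncating at $n = 1$ therefore leaves a linear correction of the prior mean by the normalized innovation, and the terms with $n \geq 2$ add the non-linear, non-Gaussian corrections.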

Furthermore, the variance estimate of $a_{r s, k}$ is derived as follows:

$\begin{array}{l} P_{a_{r s, k}} \equiv 〈 {(a_{r s, k} - {\hat{a}}_{r s, k})}^{2} | Y_{k} 〉 \\ = \sum_{n = 0}^{\infty} \{ B_{00 n} E_{00}^{02} + B_{01 n} E_{01}^{02} + B_{02 n} E_{02}^{02} \} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) / \sum_{n = 0}^{\infty} B_{00 n} \frac{1}{\sqrt{n!}} H_{n} (\frac{y_{k} - y_{k}^{*}}{\sqrt{Ω_{k}}}) \end{array}$ (23)

with

$E_{00}^{02} = Γ_{a_{r s, k}} + {(a_{r s, k}^{*} - {\hat{a}}_{r s, k})}^{2}$ , $E_{01}^{02} = 2 \sqrt{Γ_{a_{r s, k}}} (a_{r s, k}^{*} - {\hat{a}}_{r s, k})$ , $E_{02}^{02} = \sqrt{2} Γ_{a_{r s, k}}$ . (24)
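
The coefficients in Equation (24) can be checked by a short expansion, assuming the probabilists' Hermite polynomials $H_{0} (u) = 1$ , $H_{1} (u) = u$ and $H_{2} (u) = u^{2} - 1$ . Writing $u \equiv (a_{r s, k} - a_{r s, k}^{*}) / \sqrt{Γ_{a_{r s, k}}}$ (a shorthand introduced only for this check), the squared deviation becomes

$\begin{array}{l} {(a_{r s, k} - {\hat{a}}_{r s, k})}^{2} = {\{ \sqrt{Γ_{a_{r s, k}}} u + (a_{r s, k}^{*} - {\hat{a}}_{r s, k}) \}}^{2} \\ = Γ_{a_{r s, k}} u^{2} + 2 \sqrt{Γ_{a_{r s, k}}} (a_{r s, k}^{*} - {\hat{a}}_{r s, k}) u + {(a_{r s, k}^{*} - {\hat{a}}_{r s, k})}^{2} \\ = \{ Γ_{a_{r s, k}} + {(a_{r s, k}^{*} - {\hat{a}}_{r s, k})}^{2} \} H_{0} (u) + 2 \sqrt{Γ_{a_{r s, k}}} (a_{r s, k}^{*} - {\hat{a}}_{r s, k}) H_{1} (u) + \sqrt{2} Γ_{a_{r s, k}} \frac{1}{\sqrt{2!}} H_{2} (u) \end{array}$

so that the three bracketed factors are exactly $E_{00}^{02}$ , $E_{01}^{02}$ and $E_{02}^{02}$ . The same argument with a first-order polynomial gives the pairs $E_{00}^{10}$ , $E_{10}^{10}$ and $E_{00}^{01}$ , $E_{01}^{01}$ used in Equations (20) and (21).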

Using Equation (1) and the orthonormal property of $θ_{s}^{(2)} (y_{k})$ , variables $y_{k}^{*}$ and $Ω_{k}$ in Equations (20) (21) and (23) can be calculated as follows:

$\begin{matrix} y_{k}^{*} = 〈 \int y_{k} P (y_{k} | x_{k}) d y_{k} | Y_{k - 1} 〉 \\ = 〈 \sum_{r = 0}^{\infty} \sum_{s = 0}^{1} e_{1 s} A_{r s} θ_{r}^{(1)} (x_{k}) | Y_{k - 1} 〉 \\ = \sum_{r = 0}^{R} \sum_{s = 0}^{1} e_{1 s} a_{r s, k}^{*} 〈 θ_{r}^{(1)} (x_{k}) | Y_{k - 1} 〉 \end{matrix}$ (25)

$\begin{matrix} Ω_{k} = 〈 \int {(y_{k} - y_{k}^{*})}^{2} P (y_{k} | x_{k}) d y_{k} | Y_{k - 1} 〉 \\ = \sum_{r = 0}^{R} \sum_{s = 0}^{2} e_{2 s} a_{r s, k}^{*} 〈 θ_{r}^{(1)} (x_{k}) | Y_{k - 1} 〉 \end{matrix}$ (26)

with

$e_{10} = μ_{y}$ , $e_{11} = σ_{y}$ ,

$e_{20} = f_{20} - (\frac{f_{21}}{λ_{11}^{(2)}} - \frac{f_{22}}{λ_{11}^{(2)} λ_{22}^{(2)}} λ_{21}^{(2)}) λ_{10}^{(2)} - \frac{f_{22}}{λ_{22}^{(2)}} λ_{20}^{(2)}$ ,

$e_{21} = \frac{f_{21}}{λ_{11}^{(2)}} - \frac{f_{22}}{λ_{11}^{(2)} λ_{22}^{(2)}} λ_{21}^{(2)}$ , $e_{22} = \frac{f_{22}}{λ_{22}^{(2)}}$ ,

$f_{20} = {(μ_{y} - y_{k}^{*})}^{2} + σ_{y}^{2}$ , $f_{21} = 2 σ_{y} (μ_{y} - y_{k}^{*})$ , $f_{22} = \sqrt{2} σ_{y}^{2}$ . (27)
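
The coefficients $e_{2 s}$ of Equation (27) are the coordinates of ${(y_{k} - y_{k}^{*})}^{2}$ in the basis $θ_{0}^{(2)}$ , $θ_{1}^{(2)}$ , $θ_{2}^{(2)}$ . The following sketch checks this numerically, assuming $θ_{0}^{(2)} = 1$ (that is, $λ_{00}^{(2)} = 1$ ) and probabilists' Hermite polynomials. The values of $λ_{s i}^{(2)}$ and of the other parameters are hypothetical and chosen only to exercise the formulas.

```python
import math

def e2_coefficients(mu_y, sigma_y, y_pred, lam):
    """e_20, e_21, e_22 of Equation (27); lam[s][i] stands for lambda_{si}^{(2)}."""
    f20 = (mu_y - y_pred) ** 2 + sigma_y ** 2
    f21 = 2.0 * sigma_y * (mu_y - y_pred)
    f22 = math.sqrt(2.0) * sigma_y ** 2
    e21 = f21 / lam[1][1] - f22 * lam[2][1] / (lam[1][1] * lam[2][2])
    e22 = f22 / lam[2][2]
    e20 = f20 - e21 * lam[1][0] - e22 * lam[2][0]
    return e20, e21, e22

def theta(s, u, lam):
    """Equation (7) for s <= 2, with H_0 = 1, H_1 = u, H_2 = u^2 - 1."""
    basis = [1.0, u, (u * u - 1.0) / math.sqrt(2.0)]
    return sum(lam[s][i] * basis[i] for i in range(s + 1))

# Hypothetical parameters; lam[0][0] = 1 encodes theta_0 = 1.
mu_y, sigma_y, y_pred = 0.4, 1.3, 0.1
lam = [[1.0, 0.0, 0.0],
       [0.1, 0.9, 0.0],
       [0.2, -0.3, 1.1]]
e20, e21, e22 = e2_coefficients(mu_y, sigma_y, y_pred, lam)

def squared_deviation_from_basis(y):
    """Right-hand side e_20*theta_0 + e_21*theta_1 + e_22*theta_2."""
    u = (y - mu_y) / sigma_y
    return (e20 * theta(0, u, lam) + e21 * theta(1, u, lam)
            + e22 * theta(2, u, lam))
```

For any $y_{k}$ the function returns ${(y_{k} - y_{k}^{*})}^{2}$ , which is why only the terms with $s \leq 2$ survive in Equation (26) once the orthonormal property of $θ_{s}^{(2)} (y_{k})$ is applied.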

Furthermore, by considering Equations (10) (16) (18) and orthonormal property of $θ_{r}^{(1)} (x_{k})$ , variables $x_{k}^{*}$ , $Γ_{x_{k}}$ in Equation (22) and the conditional expectation in Equations (25) (26) can be calculated as follows:

$\begin{matrix} x_{k}^{*} = 〈 \int x_{k} P (x_{k} | z_{k}) d x_{k} | Y_{k - 1} 〉 \\ = \sum_{i = 0}^{\infty} \frac{1}{\sqrt{i!}} C_{i} \sum_{r = 0}^{1} \sum_{s = 0}^{S} h_{1 r} F_{s i} (z_{k}) a_{r s, k}^{*} / G (z_{k}) \end{matrix}$ (28)

$\begin{matrix} Γ_{x_{k}} = 〈 \int {(x_{k} - x_{k}^{*})}^{2} P (x_{k} | z_{k}) d x_{k} | Y_{k - 1} 〉 \\ = \sum_{i = 0}^{\infty} \frac{1}{\sqrt{i!}} C_{i} \sum_{r = 0}^{2} \sum_{s = 0}^{S} h_{2 r} F_{s i} (z_{k}) a_{r s, k}^{*} / G (z_{k}) \end{matrix}$ (29)

$\begin{matrix} 〈 θ_{r}^{(1)} (x_{k}) | Y_{k - 1} 〉 = 〈 \int θ_{r}^{(1)} (x_{k}) P (x_{k} | z_{k}) d x_{k} | Y_{k - 1} 〉 \\ = \sum_{i = 0}^{\infty} \frac{1}{\sqrt{i!}} C_{i} \sum_{s = 0}^{S} F_{s i} (z_{k}) a_{r s, k}^{*} / G (z_{k}) \end{matrix}$ (30)

with

$h_{10} = μ_{x}$ , $h_{11} = σ_{x}$ ,

$h_{20} = p_{20} - (\frac{p_{21}}{λ_{11}^{(1)}} - \frac{p_{22}}{λ_{11}^{(1)} λ_{22}^{(1)}} λ_{21}^{(1)}) λ_{10}^{(1)} - \frac{p_{22}}{λ_{22}^{(1)}} λ_{20}^{(1)}$ ,

$h_{21} = \frac{p_{21}}{λ_{11}^{(1)}} - \frac{p_{22}}{λ_{11}^{(1)} λ_{22}^{(1)}} λ_{21}^{(1)}$ , $h_{22} = \frac{p_{22}}{λ_{22}^{(1)}}$ ,

$p_{20} = {(μ_{x} - x_{k}^{*})}^{2} + σ_{x}^{2}$ , $p_{21} = 2 σ_{x} (μ_{x} - x_{k}^{*})$ , $p_{22} = \sqrt{2} σ_{x}^{2}$ . (31)

Since Equations (28) (29) and (30) can be evaluated by measuring bone-conducted speech $z_{k}$ , no time transition models of $x_{k}$ are necessary. Therefore, the computation time of the proposed algorithm can be reduced compared with that of the previous one [12]. Furthermore, by considering Equation (9), two parameters $a_{r s, k}^{*}$ and $Γ_{a_{r s, k}}$ in Equation (22) are given by the estimates of $a_{r s, k}$ at the discrete time $k - 1$ , as follows:

$a_{r s, k}^{*} = {\hat{a}}_{r s, k - 1}$ , $Γ_{a_{r s, k}} = P_{a_{r s, k - 1}}$ . (32)

Finally, considering Equations (1) (9) and (10), the expansion coefficients $B_{l m n}$ in the estimation algorithm in Equations (20) (21) and (23) are given by the measurement of bone-conducted speech $z_{k}$ , estimates of parameter $a_{r s, k}$ at the discrete time $k - 1$ , through the similar calculation process to Equations (25)-(30). Therefore, recursive estimation of the speech signal $x_{k}$ can be achieved.

3. Application to Speech Signal in Real Environment

In order to confirm the practical usefulness of the proposed noise suppression algorithm, it was applied to speech signals in a real noise environment. In the previous studies [12] [13], the noisy air-conducted speech was created on a computer by mixing noise with the original air-conducted speech signal measured in a noise-free environment; in contrast, the algorithm proposed in this study was applied to signals measured in a real environment in the presence of actual noise. For female and male speech signals digitized with a sampling frequency of 10 kHz and 16-bit quantization, we estimated the speech signal based on the observation corrupted by additive noise.

More specifically, air-conducted speeches were measured in real environment under existence of a white noise generated from a noise generator and an actual machine noise. The bone-conducted speech was simultaneously measured by use of an acceleration sensor with the air-conducted speech. By setting roughly the amplitude of the noises at two levels, the proposed algorithm was applied to extremely difficult situations with low SNR (noise-free air-conducted speech signal to noise ratio defined by $SNR = 10 \log_{10} (\sum x_{k}^{2} / \sum v_{k}^{2})$ ) being approximately −3 dB and −5 dB.
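
The SNR definition quoted above can be written directly as a short function. This is a minimal sketch with a hypothetical function name and toy input sequences, not the measured data of this study.

```python
import math

def snr_db(speech, noise):
    """SNR = 10 * log10(sum(x_k^2) / sum(v_k^2)) in decibels."""
    signal_power = sum(x * x for x in speech)
    noise_power = sum(v * v for v in noise)
    return 10.0 * math.log10(signal_power / noise_power)

# Toy sequences: noise with twice the energy of the speech gives about -3 dB.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [math.sqrt(2.0), math.sqrt(2.0), -math.sqrt(2.0), math.sqrt(2.0)]
level = snr_db(speech, noise)
```

A value of about −3 dB therefore means that the noise carries roughly twice the energy of the speech over the evaluated interval.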

Using the observed bone-conducted speech and the noisy observation of air-conducted speech, the constants a and b are first calculated by introducing the linear regression model in Equation (11) and applying the least squares method to this model. Secondly, the parameter $α$ of the membership function is obtained by calculating the standard deviation $σ$ of $y_{k}$ around ${\bar{y}}_{k}$ , as $α = 2 σ$ after assuming Gaussian distribution for the deviation.
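
The two-step parameter setting described above can be sketched as follows. The function names and the four toy data points are hypothetical, and the rule for $α$ simply follows the relation stated in the text.

```python
import math

def fit_regression(z, y):
    """Least-squares estimates of a and b in y_bar = a + b*z."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    b = szy / szz
    a = y_mean - b * z_mean
    return a, b

def residual_std(z, y, a, b):
    """Standard deviation of y_k around y_bar_k = a + b*z_k."""
    n = len(z)
    return math.sqrt(sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y)) / n)

def membership(y, z, a, b, alpha):
    """Gaussian-type membership function of Equation (11)."""
    y_bar = a + b * z
    return math.exp(-alpha * (y - y_bar) ** 2)

# Toy bone-conducted (z) and air-conducted (y) samples, for illustration only.
z_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.1, 2.9, 5.2, 6.8]
a, b = fit_regression(z_obs, y_obs)
sigma = residual_std(z_obs, y_obs, a, b)
alpha = 2.0 * sigma          # relation stated in the text
grade = membership(y_obs[0], z_obs[0], a, b, alpha)
```

The membership grade equals 1 when $y_{k}$ coincides with ${\bar{y}}_{k}$ and decays as the air-conducted observation moves away from the value predicted from the bone-conducted speech.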

The observed signals on air-conducted female speech contaminated by the white noise and machine noise are shown in Figure 1 and Figure 2. Furthermore, for the male speech signal, noisy air-conducted speech observations are shown in Figure 3 and Figure 4 respectively.

The estimated results by using the algorithm based on Equations (20)-(24) are shown in Figure 5 and Figure 6 for the female speech signal and in Figure 7 and Figure 8 for the male speech signal. For comparison, the estimated results of the female and male speech signals by using the estimation algorithm based on only the observation of air-conducted speech are shown in Figures 9-12.

By comparing Figures 5-8 with Figures 9-12, it is obvious that the proposed method can suppress the effects of white noise and real machine noise better than the method based on observation of only air-conducted speech.

The air-conducted female and male speech signals spoken by the same speakers in the different situation without any noises are shown in Figure 13 and Figure 14 as references. By comparing these speech signals measured in noise-free circumstance with the estimated results by the proposed method and the results by using the algorithm based on the observation of only air-conducted signal, the effectiveness of the proposed method is obvious. Furthermore, the computation time of the proposed method was reduced by 55.2% of the algorithm based on the only air-conducted observation, because it is unnecessary for the proposed method to calculate recursively the estimate of variance of $x_{k}$ based on the air-conducted speech $y_{k}$ .

Figure 1. Observed female speech signal contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 2. Observed female speech signal contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 3. Observed male speech signal contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 4. Observed male speech signal contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 5. Estimated female speech signal by use of the proposed method based on observation contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 6. Estimated female speech signal by use of the proposed method based on observation contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 7. Estimated male speech signal by use of the proposed method based on observation contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 8. Estimated male speech signal by use of the proposed method based on observation contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 9. Estimated female speech signal by use of the method based on only air-conducted observation contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 10. Estimated female speech signal by use of the method based on only air-conducted observation contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 11. Estimated male speech signal by use of the method based on only air-conducted observation contaminated by white noise with $SNR ≅ - 3 dB$ .

Figure 12. Estimated male speech signal by use of the method based on only air-conducted observation contaminated by machine noise with $SNR ≅ - 5 dB$ .

Figure 13. Air-conducted female speech signal in the different situation without any noises.

Figure 14. Air-conducted male speech signal in the different situation without any noises.

4. Conclusions

In this paper, by regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events, a new noise suppression method has been derived on the basis of Bayes’ theorem as the fundamental principle of estimation. Furthermore, the proposed algorithm has been applied to real speech signals contaminated by noises measured in an actual environment with low SNR. As a result, the experiments have shown that better estimation results may be obtained by the proposed algorithm than by the method based on only air-conducted observations.

The proposed approach is quite different from the traditional standard techniques. However, we are still in an early stage of development, and a number of practical problems are yet to be investigated in the future. These include: 1) application to a diverse range of speech signals in actual noise environment, 2) extension to cases with multi-noise sources, and 3) finding an optimal number of expansion terms for the expansion-based probability expressions adopted.

Acknowledgements

The authors are grateful to Ms. Yui Maeda of the Prefectural University of Hiroshima for her help during this study. This work was supported in part by the Grant-in-Aid for Scientific Research No. 19K04428 from the Ministry of Education, Culture, Sports, Science and Technology-Japan.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Yamashita, K. and Shimamura, T. (2005) Nonstationary Noise Estimation Using Low-Frequency Regions for Spectral Subtraction. IEEE Signal Processing Letters, 12, 465-468. https://doi.org/10.1109/LSP.2005.847864
[2]	Plapous, C., Marro, C. and Scalart, P. (2006) Improved Signal-to-Noise Ratio Estimation for Speech Enhancement. IEEE Transactions on Speech and Audio Processing, 14, 2098-2108. https://doi.org/10.1109/TASL.2006.872621
[3]	McCowan, I.A. and Bourlard, H. (2003) Microphone Array Post-Filter Based on Noise Field Coherence. IEEE Transactions on Speech and Audio Processing, 11, 709-716. https://doi.org/10.1109/TSA.2003.818212
[4]	Kawamura, A., Fujii, K., Itoh, Y. and Fukui, Y. (2002) A Noise Reduction Method Based on Linear Prediction Analysis. IEICE Transactions on Fundamentals, J85-A, 415-423. https://doi.org/10.1109/ICASSP.2002.1004860
[5]	Kawamura, A., Fujii, K. and Itoh, Y. (2005) A Noise Reduction Method Based on Linear Prediction with Variable Step-Size. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A, 855-861. https://doi.org/10.1093/ietfec/e88-a.4.855
[6]	Kim, W. and Ko, H. (2001) Noise Variance Estimation for Kalman Filtering of Noisy Speech. IEICE Transactions on Information and Systems, E84-D, 155-160.
[7]	Li, H., Wang, X., Dai, B. and Lu, W. (2007) A Kalman Smoothing Algorithm for Speech Enhancement Based on the Properties of Vocal Tract Varying Slowly. Proceedings of Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, Qingdao, 30 July-1 Aug. 2007, 832-836.
[8]	Tanabe, N., Furukawa, T. and Tsuji, S. (2008) Robust Noise Suppression Algorithm with the Kalman Filter Theory for White and Colored Disturbance. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E91-A, 818-829. https://doi.org/10.1093/ietfec/e91-a.3.818
[9]	Jia, H., Zhang, X. and Jin, C. (2009) A Modified Speech Enhancement Algorithm Based on the Subspace. Proceedings of 2009 Second International Symposium on Knowledge Acquisition and Modeling, Wuhan, 30 November-1 December 2009, 344-347. https://doi.org/10.1109/KAM.2009.19
[10]	Candy, J.V. (2009) Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods. John Wiley & Sons Ltd., Hoboken. https://doi.org/10.1002/9780470430583
[11]	Ikuta, A. and Orimoto, H. (2011) Adaptive Noise Suppression Algorithm for Speech Signal Based on Stochastic System Theory. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E94-A, 1618-1627. https://doi.org/10.1587/transfun.E94.A.1618
[12]	Ikuta, A., Orimoto, H. and Gallagher, G. (2018) Noise Suppression Method by Jointly Using Bone- and Air-Conducted Speech Signals. Noise Control Engineering Journal, 66, 472-488. https://doi.org/10.3397/1/376640
[13]	Orimoto, H., Ikuta, A. and Hasegawa, K. (2021) Speech Signal Detection Based on Bayesian Estimation by Observing Air-Conducted Speech under Existence of Surrounding Noise with the Aid of Bone-Conducted Speech. Intelligent Information Management, 13, 199-213. https://doi.org/10.4236/iim.2021.134011
[14]	Shin, H.S., Kang, H.G. and Fingscheidt, T. (2012) Survey of Speech Enhancement Supported by a Bone Conduction Microphone. Proceedings of 10th ITG Conference on Speech Communication, Braunschweig, 26-28 September 2012, 47-50.
[15]	Ikuta, A. and Orimoto, H. (2014) Fuzzy Signal Processing of Sound and Electromagnetic Environment by Introducing Probability Measure of Fuzzy Events. Proceedings of International Conference on Fuzzy Computation Theory and Applications, Rome, 22-24 October 2014, 5-13.
[16]	Orimoto, H. and Ikuta, A. (2012) Prediction of Response Probability Distribution by Considering Additive Property of Energy and Evaluation in Decibel Scale for Sound Environment System with Unknown Structure. Transactions of the Society of Instrument and Control Engineers, 48, 830-836. https://doi.org/10.9746/sicetr.48.830
[17]	Orimoto, H. and Ikuta, A. (2019) State Estimation for Sound Environment System with Nonlinear Observation Characteristics by Introducing Wide-Sense Particle Filter. Intelligent Information Management, 11, 87-101. https://doi.org/10.4236/iim.2019.116008
