Speech Signal Recovery Based on Source Separation and Noise Suppression

Abstract

In this paper, a speech signal recovery algorithm is presented for a personalized voice-command automatic recognition system in vehicle and restaurant environments. The algorithm separates mixed speech from multiple speakers, detects the presence or absence of speakers by tracking the higher-magnitude portion of the speech power spectrum, and adaptively suppresses noise. An automatic speech recognition (ASR) process that handles the multi-speaker task is designed and implemented. Evaluation tests were carried out using the NOIZEUS speech database, and the experimental results show that the proposed algorithm achieves substantial performance improvements.

Share and Cite:

Wang, Z., Zhang, H. and Bi, G. (2014) Speech Signal Recovery Based on Source Separation and Noise Suppression. Journal of Computer and Communications, 2, 112-120. doi: 10.4236/jcc.2014.29015.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.