Neural Network Based Missing Feature Method For Text-Independent Speaker Identification
Ying WANG, Wei LU
DOI: 10.4236/ijcns.2010.31005

Abstract

The first step of missing-feature methods in text-independent speaker identification is to identify highly corrupted elements of the spectrographic representation of speech and mark them as missing. Most mask estimation techniques rely on an explicit estimate of the characteristics of the corrupting noise and usually fail when that estimate is inaccurate. We present a mask estimation technique that uses neural networks to determine the reliability of spectrographic elements. Without any prior knowledge of the noise or prior probability of the speech, this method exploits only the characteristics of the speech signal. Experiments were performed on speech corrupted by stationary F16 noise and by non-stationary Babble noise at SNRs from 5 dB to 20 dB, using the cluster-based reconstruction missing-feature method. The results show better recognition accuracy than conventional spectral-subtraction mask estimation methods.
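The pipeline the abstract describes can be sketched in two stages: a small network scores each time-frequency bin of the spectrogram as reliable or unreliable, and the resulting binary mask drives a reconstruction step. The sketch below is only an illustration of that structure, not the authors' method: the per-bin features, the network weights, and the reconstruction rule (here a simple per-band mean of reliable bins standing in for cluster-based reconstruction) are all hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def estimate_mask(spec, W1, b1, W2, b2, threshold=0.5):
    """Score each time-frequency bin as reliable (1) or unreliable (0)
    with a small feed-forward network applied per bin.

    spec: (T, F) spectrogram; W1: (2, H), b1: (H,), W2: (H, 1), b2: (1,).
    """
    # Per-bin feature vector (an illustrative choice): the bin value and
    # its deviation from the frame mean -- a crude local-SNR proxy.
    frame_mean = spec.mean(axis=1, keepdims=True)
    feats = np.stack([spec, spec - frame_mean], axis=-1)   # (T, F, 2)
    h = np.tanh(feats @ W1 + b1)                           # hidden layer
    p = sigmoid(h @ W2 + b2)[..., 0]                       # reliability score
    return (p >= threshold).astype(float)

def reconstruct(spec, mask):
    """Stand-in for cluster-based reconstruction: replace each unreliable
    bin with the mean of the reliable bins in the same frequency band."""
    out = spec.copy()
    for f in range(spec.shape[1]):
        reliable = mask[:, f] == 1
        if reliable.any():
            out[~reliable, f] = spec[reliable, f].mean()
    return out

# Toy run with random weights, purely to exercise the shapes.
rng = np.random.default_rng(0)
spec = rng.normal(size=(10, 8))
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
mask = estimate_mask(spec, W1, b1, W2, b2)
rec = reconstruct(spec, mask)
```

In the paper's setting the network would be trained on clean/corrupted speech pairs, and the reconstruction stage would use the cluster-based method of Raj et al. rather than the band-mean fill shown here.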

Share and Cite:

Y. WANG and W. LU, "Neural Network Based Missing Feature Method For Text-Independent Speaker Identification," International Journal of Communications, Network and System Sciences, Vol. 3, No. 1, 2010, pp. 43-47. doi: 10.4236/ijcns.2010.31005.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] M. P. Cooke, A. Morris, and P. D. Green, “Recognition of occluded speech,” ESCA Tutorial and Workshop on the Auditory Basis of Speech Perception, Keele University, July 15–19, 1996.
[2] A. Vizinho, P. Green, M. P. Cooke, and L. Josifovski, “Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: An integrated study,” Proceedings of the Sixth European Conference on Speech Communication and Technology (Eurospeech), Budapest, pp. 2407–2410, 1999.
[3] M. P. Cooke, P. Green, L. Josifovski, and A. Vizinho, “Robust automatic speech recognition with missing and unreliable acoustic data,” Speech Communication, Vol. 34, pp. 267–285, 2001.
[4] A. Drygajlo and M. El-Maliki, “Speaker verification in noisy environments with combined spectral subtraction and missing feature theory,” Proceedings of IEEE ICASSP ’98, Seattle, pp. 121–124, 1998.
[5] M. L. Seltzer, B. Raj, and R. M. Stern, “A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition,” Speech Communication, Vol. 43, pp. 379–393, 2004.
[6] B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction of missing features for robust speech recognition,” Speech Communication, Vol. 43, pp. 275–296, 2004.
[7] B. Raj, “Reconstruction of incomplete spectrograms for robust speech recognition,” Ph.D. dissertation, ECE Department, Carnegie Mellon University, Pittsburgh, 2000.
[8] Z. Q. Bian and X. G. Zhang, “Pattern Recognition,” Tsinghua University Press, Beijing, pp. 235–237, 2000.
[9] R. J. Higgins, “Digital Signal Processing in VLSI,” Prentice Hall, Englewood Cliffs, NJ, 1990.
[10] Praat software download: http://www.fon.hum.uva.nl/praat/download_win.html
[11] J. P. LeBlanc and P. L. De Leon, “Speech separation by kurtosis maximization,” Proceedings of ICASSP ’98, 1998.
[12] J. P. LeBlanc and P. L. De Leon, “Noise estimation techniques for robust speech recognition,” Proceedings of ICASSP ’95, pp. 153–156, 1998.
[13] J. Campbell, “Testing with the YOHO CD-ROM voice verification corpus,” Proceedings of IEEE ICASSP ’95, Detroit, pp. 341–344, 1995.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.