Single Channel Source Separation Using Filterbank and 2D Sparse Matrix Factorization

Abstract

We present a novel approach to single channel source separation (SCSS) based on a filterbank technique and sparse non-negative matrix two-dimensional deconvolution (SNMF2D). The proposed approach does not require training information about the sources and is therefore well suited to practical SCSS. The major problem of most existing SCSS algorithms lies in their inability to resolve the mixing ambiguity in the single channel observation. Our approach tackles this difficult problem by using a filterbank to decompose the mixed signal into the sub-band domain, where the mixture becomes more separable. By incorporating the SNMF2D algorithm, the spectral-temporal structure of the sources can be estimated more accurately. Real-time tests have been conducted, and they show that the proposed method gives high-quality source separation performance.
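The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the brick-wall FFT filterbank, the function names (`fft_filterbank`, `magnitude_spectrogram`, `sparse_nmf`), the frame sizes, the sparsity weight `lam` and the toy two-tone mixture. In particular, `sparse_nmf` is plain sparse NMF with multiplicative updates. It is a simplified stand-in for SNMF2D, which additionally models shifts of the factors along time and frequency.

```python
import numpy as np

def fft_filterbank(x, n_bands):
    """Split x into n_bands sub-band signals using brick-wall FFT masks.
    The sub-bands sum back to x (an idealized, non-decimated filterbank)."""
    n = len(x)
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xk = np.zeros_like(X)
        Xk[lo:hi] = X[lo:hi]          # keep only this band's frequency bins
        bands.append(np.fft.irfft(Xk, n=n))
    return np.array(bands)

def magnitude_spectrogram(x, frame=64, hop=32):
    """Magnitude of a Hann-windowed short-time Fourier transform (freq x time)."""
    win = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.array([x[s:s + frame] * win for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def sparse_nmf(V, rank, lam=0.01, n_iter=200, seed=0, eps=1e-9):
    """Least-squares NMF with an L1 penalty on H, multiplicative updates.
    Simplified stand-in for SNMF2D: no time or frequency shifts are modeled."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Keep the columns of W at unit norm so the penalty on H cannot be
        # dodged by rescaling; H absorbs the scale, leaving W @ H unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy single-channel mixture (assumed data): two tones plus a little noise.
rng = np.random.default_rng(1)
t = np.arange(2048) / 8000.0
mixture = (np.sin(2 * np.pi * 440 * t)
           + 0.5 * np.sin(2 * np.pi * 1800 * t)
           + 0.01 * rng.standard_normal(t.size))

bands = fft_filterbank(mixture, n_bands=4)
factors = [sparse_nmf(magnitude_spectrogram(b), rank=2) for b in bands]
```

Each entry of `factors` holds a spectral basis `W` and a sparse activation matrix `H` for one sub-band. A full separation system would still have to assign these components to sources and resynthesize the time-domain signals, which this sketch leaves out.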

Share and Cite:

X. Lu, B. Gao, L. Khor, W. Woo, S. Dlay, W. Ling and C. Chin, "Single Channel Source Separation Using Filterbank and 2D Sparse Matrix Factorization," Journal of Signal and Information Processing, Vol. 4, No. 2, 2013, pp. 186-196. doi:10.4236/jsip.2013.42026

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.