Feature Extraction for Audio Classification of Gunshots Using the Hartley Transform


In audio classification applications, features extracted from the frequency-domain representation of signals typically focus on the magnitude spectrum, while the phase spectrum is ignored. The conventional Fourier Phase Spectrum is a highly discontinuous function. It is therefore ill-suited to feature extraction for classification applications, which require the function to be continuous. In this work, the sources of phase spectral discontinuities are detected, categorized and compensated, yielding a phase spectrum with significantly fewer discontinuities. The Hartley Phase Spectrum, introduced as an alternative to the conventional Fourier Phase Spectrum, captures the phase content of the signal more efficiently than its Fourier counterpart. Among its other properties, it does not suffer from the phase ‘wrapping ambiguities’ introduced by the inverse tangent function used to compute the Fourier Phase Spectrum. In the proposed feature extraction method, statistical features extracted from the Hartley Phase Spectrum are combined with statistical features extracted from the magnitude-related spectrum of the signals. The experimental results show that the classification score is higher when magnitude- and phase-related features are combined than when only magnitude features are used.
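The abstract attributes the advantage of the Hartley Phase Spectrum to the fact that it avoids the inverse tangent. The Python sketch below illustrates this under stated assumptions. It is not the authors' implementation. It computes the discrete Hartley transform from the FFT as H[k] = Re{X[k]} − Im{X[k]}, and it takes the Hartley Phase Spectrum to be H[k]/|X[k]|. That normalization is an assumption here, and it equals cos φ[k] − sin φ[k] for Fourier phase φ[k]. The sketch also uses mean, variance, skewness and kurtosis as a hypothetical set of statistical features. The helper names `hartley_transform`, `hartley_phase_spectrum`, `moments` and `combined_features` are illustrative. The paper's compensation of phase discontinuities is not reproduced.

```python
import numpy as np


def hartley_transform(x):
    """Discrete Hartley transform computed via the FFT.

    H[k] = sum_n x[n] * cas(2*pi*n*k/N), with cas(t) = cos(t) + sin(t),
    which for real x equals Re{X[k]} - Im{X[k]} where X = FFT(x).
    """
    X = np.fft.fft(np.asarray(x, dtype=float))
    return X.real - X.imag


def hartley_phase_spectrum(x, eps=1e-12):
    """Hartley phase spectrum, assumed here to be H[k] / |X[k]|.

    No inverse tangent is involved, so no 2*pi wrapping occurs. The result
    equals cos(phi[k]) - sin(phi[k]) for Fourier phase phi[k] and lies in
    [-sqrt(2), sqrt(2)]. eps guards bins where |X[k]| is (near) zero.
    """
    X = np.fft.fft(np.asarray(x, dtype=float))
    H = X.real - X.imag
    return H / np.maximum(np.abs(X), eps)


def moments(v):
    """Mean, variance, skewness and kurtosis (hypothetical feature set)."""
    v = np.asarray(v, dtype=float)
    mu = v.mean()
    var = v.var()
    if var == 0.0:
        # Constant input: define skewness and kurtosis as 0 to avoid 0/0.
        return np.array([mu, 0.0, 0.0, 0.0])
    z = (v - mu) / np.sqrt(var)
    return np.array([mu, var, np.mean(z**3), np.mean(z**4)])


def combined_features(x):
    """Concatenate magnitude and Hartley-phase statistics into one vector.

    |X[k]| stands in for the magnitude-related spectrum (assumption).
    """
    mag = np.abs(np.fft.fft(np.asarray(x, dtype=float)))
    return np.concatenate([moments(mag), moments(hartley_phase_spectrum(x))])


if __name__ == "__main__":
    # Delayed impulse: its Fourier phase is linear, so np.angle wraps it.
    N, d = 256, 3
    x = np.zeros(N)
    x[d] = 1.0
    phi = np.angle(np.fft.fft(x))
    hps = hartley_phase_spectrum(x)
    print("largest jump, wrapped Fourier phase:", np.max(np.abs(np.diff(phi))))
    print("largest jump, Hartley phase spectrum:", np.max(np.abs(np.diff(hps))))
    print("feature vector:", combined_features(x))
```

Consider a delayed impulse. Here `np.angle` returns a linear phase that jumps by nearly 2π wherever it wraps. The Hartley Phase Spectrum instead traces a smooth sinusoid, because cos φ − sin φ does not change when φ shifts by 2π.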

I. Paraskevas and M. Rangoussi, "Feature Extraction for Audio Classification of Gunshots Using the Hartley Transform," Open Journal of Acoustics, Vol. 2 No. 3, 2012, pp. 131-142. doi: 10.4236/oja.2012.23015.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.