Multivariate Modality Inference Using Gaussian Kernel

The number of modes (also known as the modality) of a kernel density estimator (KDE) has attracted considerable interest and is important in practice. In this paper, we develop an inference framework for the modality of a KDE in the multivariate setting using a Gaussian kernel. We apply the modal clustering method proposed by [1] for mode hunting. A test statistic and its asymptotic distribution are derived to assess the significance of each mode. The inference procedure is applied to both simulated and real data sets.
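
To make the mode-hunting step concrete, the following is a minimal sketch, not the authors' implementation. It assumes an isotropic Gaussian kernel with a single bandwidth `h`. Under that assumption, a modal EM ascent of the kind described in [1] reduces to repeatedly replacing the current point with the kernel-weighted mean of the data. The function names (`gaussian_weights`, `ascend_to_mode`, `find_modes`), the tolerances, and the simulated data are all hypothetical choices for illustration. The significance test for each mode is not shown here.

```python
import math
import random


def gaussian_weights(x, data, h):
    """Normalized weights p_i proportional to exp(-||x - x_i||^2 / (2 h^2))."""
    logs = [-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * h * h)
            for xi in data]
    top = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [wi / total for wi in w]


def ascend_to_mode(x, data, h, tol=1e-8, max_iter=1000):
    """Iterate the weighted-mean update from x until the step is below tol."""
    for _ in range(max_iter):
        p = gaussian_weights(x, data, h)
        new_x = tuple(sum(pi * xi[d] for pi, xi in zip(p, data))
                      for d in range(len(x)))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def find_modes(data, h, merge_tol=1e-3):
    """Ascend from every observation; points reaching the same mode share a label."""
    modes, labels = [], []
    for x in data:
        m = ascend_to_mode(x, data, h)
        for k, known in enumerate(modes):
            if math.dist(m, known) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Simulated bivariate data: two well-separated groups (an illustrative assumption).
rng = random.Random(0)
data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(60)]
        + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(60)])
modes, labels = find_modes(data, h=1.0)
print(len(modes), "modes found")
```

In this sketch the number of distinct limit points is the estimated modality of the KDE at bandwidth `h`, and the labels give the induced modal clustering. A smaller `h` generally produces more modes, which is why a significance assessment of each mode is needed.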

Cite this paper

Cheng, Y. and Ray, S. (2014) Multivariate Modality Inference Using Gaussian Kernel. Open Journal of Statistics, 4, 419-434. doi: 10.4236/ojs.2014.45041.

Conflicts of Interest

The authors declare no conflicts of interest.

[1] Li, J., Ray, S. and Lindsay, B.G. (2007) A Nonparametric Statistical Approach to Clustering via Mode Identification. Journal of Machine Learning Research, 8, 1687-1723.
[2] Tibshirani, R., Walther, G. and Hastie, T. (2001) Estimating the Number of Clusters in a Data Set via the Gap Statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411-423. http://dx.doi.org/10.1111/1467-9868.00293
[3] McLachlan, G. and Peel, D. (2004) Finite Mixture Models. Wiley, Hoboken.
[4] Lloyd, S. (1982) Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28, 129-137. http://dx.doi.org/10.1109/TIT.1982.1056489
[5] Fraley, C. and Raftery, A.E. (2002) Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association, 97, 611-631. http://dx.doi.org/10.1198/016214502760047131
[6] Silverman, B.W. (1981) Using Kernel Density Estimates to Investigate Multimodality. Journal of the Royal Statistical Society, Series B (Methodological), 43, 97-99.
[7] Efron, B. (1979) Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7, 1-26. http://dx.doi.org/10.1214/aos/1176344552
[8] Minnotte, M.C. (1997) Nonparametric Testing of the Existence of Modes. The Annals of Statistics, 25, 1646-1660. http://dx.doi.org/10.1214/aos/1031594735
[9] Burman, P. and Polonik, W. (2009) Multivariate Mode Hunting: Data Analytic Tools with Measures of Significance. Journal of Multivariate Analysis, 100, 1198-1218. http://dx.doi.org/10.1016/j.jmva.2008.10.015
[10] Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition. Academic Press, Waltham.
[11] Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley, New York.
[12] Li, Q. and Racine, J.S. (2011) Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton.
[13] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.
[14] Ray, S. and Lindsay, B.G. (2005) The Topography of Multivariate Normal Mixtures. The Annals of Statistics, 33, 2042-2065. http://dx.doi.org/10.1214/009053605000000417
[15] Dmitrienko, A., Tamhane, A.C. and Bretz, F. (2010) Multiple Testing Problems in Pharmaceutical Statistics. CRC Press, Boca Raton.
[16] Ray, S. and Pyne, S. (2012) A Computational Framework to Emulate the Human Perspective in Flow Cytometric Data Analysis. PLoS ONE, 7, Article ID: e35693. http://dx.doi.org/10.1371/journal.pone.0035693
[17] Flury, B. and Riedwyl, H. (1988) Multivariate Statistics: A Practical Approach. Chapman & Hall, Ltd., London. http://dx.doi.org/10.1007/978-94-009-1217-5
[18] Lindsay, B.G., Markatou, M., Ray, S., Yang, K. and Chen, S.C. (2008) Quadratic Distances on Probabilities: A Unified Foundation. The Annals of Statistics, 36, 983-1006. http://dx.doi.org/10.1214/009053607000000956