Random Subspace Learning Approach to High-Dimensional Outliers Detection

PP. 618-630
DOI: 10.4236/ojs.2015.56063

ABSTRACT

We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. The proposed method handles both high-dimensional low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than regularized methods in the high-dimensional low-sample size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
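
The idea of working in low-dimensional subspaces can be illustrated with a short sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a minimal illustration, under stated assumptions, of scoring observations by robust Mahalanobis distances computed on random feature subsets, so that every covariance matrix is only d x d. The function names, the default values, the coordinate-wise median start, the small ridge term, the use of a few concentration steps in the style of Fast-MCD, and the averaging of scores across subspaces are all assumptions made for the example.

```python
import numpy as np


def _sq_mahalanobis(Z, center, cov):
    """Squared Mahalanobis distance of each row of Z to `center` under `cov`."""
    diff = Z - center
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)


def random_subspace_scores(X, n_subspaces=200, subspace_dim=5, c_steps=2, seed=0):
    """Average robust squared distance over random feature subsets.

    Hypothetical helper for illustration only; every name and default here
    is an assumption, not part of the published method.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Subspace dimension kept well below n so each covariance is invertible.
    d = max(1, min(subspace_dim, p, n // 2 - 1))
    h = (n + d + 1) // 2          # size of the retained "clean" subset
    ridge = 1e-8 * np.eye(d)      # assumed numerical safeguard
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        cols = rng.choice(p, size=d, replace=False)   # random feature subset
        Z = X[:, cols]
        center = np.median(Z, axis=0)                 # crude robust start
        cov = np.atleast_2d(np.cov(Z, rowvar=False)) + ridge
        for _ in range(c_steps):
            # Concentration step: refit on the h closest observations.
            dist = _sq_mahalanobis(Z, center, cov)
            keep = np.argsort(dist)[:h]
            center = Z[keep].mean(axis=0)
            cov = np.atleast_2d(np.cov(Z[keep], rowvar=False)) + ridge
        scores += _sq_mahalanobis(Z, center, cov)
    return scores / n_subspaces
```

Observations with the largest average score are the outlier candidates. Because only d x d systems are solved, the cost per subspace does not depend on the full dimension p beyond the column selection, which is the kind of saving the abstract describes. How many subspaces to draw and how to threshold the scores are left open here.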

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Liu, B. and Fokoué, E. (2015) Random Subspace Learning Approach to High-Dimensional Outliers Detection. Open Journal of Statistics, 5, 618-630. doi: 10.4236/ojs.2015.56063.

  
Copyright © 2019 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.