ML-CLUBAS: A Multi Label Bug Classification Algorithm

DOI: 10.4236/jsea.2012.512113

Abstract

In this paper, ML-CLUBAS (Multi Label-Classification of software Bugs Using Bug Attribute Similarity), a multi label variant of the CLUBAS [1] algorithm, is presented. CLUBAS is a hybrid algorithm that combines text clustering, frequent term calculation and taxonomic term mapping, and is an example of classification by clustering. CLUBAS is a single label algorithm: each bug cluster is mapped to exactly one bug category. However, a bug cluster can belong to more than one bug category when its cluster label matches the terms of more than one category, and ML-CLUBAS is designed to handle this case. The algorithm is evaluated using F-measure, accuracy, number of clusters and purity. These parameters are compared with those of CLUBAS and of other multi label text clustering algorithms.
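The difference between single label and multi label mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the taxonomy contents, the function names, and the rule used for the single label baseline (keep the category with the largest term overlap) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Optional, Set

# Hypothetical taxonomy: category name -> taxonomic terms.
# These categories and terms are illustrative only, not taken from the paper.
TAXONOMY: Dict[str, Set[str]] = {
    "gui": {"button", "window", "layout"},
    "network": {"socket", "timeout", "connection"},
    "database": {"query", "table", "connection"},
}


def map_cluster_multi_label(label_terms: List[str],
                            taxonomy: Dict[str, Set[str]]) -> List[str]:
    """Multi label mapping: return every category that shares at least
    one term with the cluster label (sorted for a stable result)."""
    terms = {t.lower() for t in label_terms}
    return sorted(cat for cat, cat_terms in taxonomy.items()
                  if terms & cat_terms)


def map_cluster_single_label(label_terms: List[str],
                             taxonomy: Dict[str, Set[str]]) -> Optional[str]:
    """Single label baseline (assumed rule): keep only the category with
    the largest term overlap; ties go to the alphabetically first name."""
    terms = {t.lower() for t in label_terms}
    best: Optional[str] = None
    best_overlap = 0
    for cat in sorted(taxonomy):
        overlap = len(terms & taxonomy[cat])
        if overlap > best_overlap:
            best, best_overlap = cat, overlap
    return best


if __name__ == "__main__":
    label = ["connection", "timeout"]  # hypothetical cluster label terms
    print(map_cluster_multi_label(label, TAXONOMY))
    print(map_cluster_single_label(label, TAXONOMY))
```

In this sketch, a cluster whose label contains "connection" matches both the hypothetical "network" and "database" categories, so the multi label mapping keeps both, while the single label baseline retains only one of them.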

Share and Cite:

N. Nagwani and S. Verma, "ML-CLUBAS: A Multi Label Bug Classification Algorithm," Journal of Software Engineering and Applications, Vol. 5 No. 12, 2012, pp. 983-990. doi: 10.4236/jsea.2012.512113.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] N. K. Nagwani and S. Verma, “CLUBAS: An Algorithm and Java Based Tool for Software Bug Classification Using Bug Attributes Similarities,” Journal of Software Engineering and Applications, Vol. 5 No. 6, 2012, pp. 436-447. doi:10.4236/jsea.2012.56050
[2] S. Chapman, “Simmetrics, Java Based API for Text Similarity Measurement,” 2011. http://www.dcs.shef.ac.uk/~sam/simmetrics.html.
[3] C. D. Manning, P. Raghavan and H. Schütze, “Introduction to Information Retrieval,” 2008. http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
[4] H. Li, K. Zhang and T. Jiang, “Minimum Entropy Clustering and Applications to Gene Expression Analysis,” Proceedings of IEEE Computational System Bioinformatics Conference, Stanford, August 2004, pp. 142-151.
[5] I. H. Witten, E. Frank, L. E. Trigg, M. A. Hall, G. Holmes and S. J. Cunningham, “Weka (Waikato Environment for Knowledge Analysis),” 2011. www.cs.waikato.ac.nz/ml/weka
[6] “Android Bug Repository,” 2011. http://code.google.com/p/android/issues.
[7] JBoss-Seam, “Bug Repository,” 2011. https://issues.jboss.org/browse/JBSEAM.
[8] “Mozilla Bug Repository,” 2011. https://bugzilla.mozilla.org.
[9] MySQL, “Bug Repository,” 2011. http://bugs.mysql.com.
[10] S. Osinski, J. Stefanowski and D. Weiss, “Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition,” Proceedings of the International Intelligent Information Processing and Web Mining Conference, Zakopane, 17-20 May 2004, pp. 359-368.
[11] S. Osinski, “An Algorithm for Clustering of Web Search Results,” Master’s thesis, Poznań University of Technology, Poznań, 2003.
[12] O. Zamir and O. Etzioni, “Grouper: A Dynamic Clustering Interface for Web Search Results,” Computer Networks, Vol. 31, No. 11-16, 1999, pp. 1361-1374. doi:10.1016/S1389-1286(99)00054-7
[13] O. Zamir and O. Etzioni, “Web Document Clustering: A Feasibility Demonstration,” Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Melbourne, 24-28 August 1998, pp. 46-54.
[14] W. Li, “Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution,” IEEE Transactions on Information Theory, Vol. 38, No. 6, 1992, pp. 1842-1845. doi:10.1109/18.165464
[15] W. J. Reed, “The Pareto, Zipf and Other Power Laws,” Economics Letters, Vol. 74, No. 1, 2001, pp. 15-19. doi:10.1016/S0165-1765(01)00524-9

  
Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.