Mobile SMS Spam Filtering for Nepali Text Using Naïve Bayesian and Support Vector Machine

Abstract

Spam is a universal problem with which everyone is familiar, and a number of approaches are used to filter it. The most common technique is content-based filtering, which uses the actual text of a message to determine whether it is Spam or not. Message content is highly dynamic, and it is challenging to capture all of its information in a mathematical classification model; for instance, the characteristics that a content-based filter uses to identify Spam messages change constantly over time. The Naïve Bayes method represents the changing nature of messages using probability theory, and the support vector machine (SVM) represents it using different features. Both methods are efficient in different domains, but the classification of Nepali SMS or text has not yet been considered, so it is interesting to examine how both methods perform on the problem of Nepali text classification. In this paper, Naïve Bayes and SVM-based classification techniques are implemented to classify Nepali SMS as Spam and non-Spam. An empirical analysis of various text cases has been carried out to evaluate the accuracy of the classification methods used in this study. The accuracy is found to be 87.15% for SVM and 92.74% for Naïve Bayes.
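As an illustration of the content-based approach described above, the following is a minimal sketch of a multinomial Naïve Bayes classifier with an accuracy measure. It is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing parameter `alpha`, the class and function names, and the four toy messages are all assumptions made for this example, and the SVM counterpart is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization; the paper's actual
    # preprocessing for Nepali text is not described in the abstract.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength (assumed value)
        self.class_counts = Counter()            # messages seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of messages whose predicted label matches the true label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented toy messages for illustration only; not data from the study.
train_texts = [
    "पुरस्कार जित्नुहोस् अहिले कल गर्नुहोस्",
    "नि:शुल्क अफर अहिले रिचार्ज गर्नुहोस्",
    "भोलि भेटौं है",
    "खाना खायौ कि छैन",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("नि:शुल्क पुरस्कार अहिले"))
print(accuracy(model, train_texts, train_labels))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from driving a class probability to zero. Accuracy measured on the training messages, as in this toy run, is only a sanity check; a meaningful estimate needs a held-out test set.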

Share and Cite:

T. Shahi and A. Yadav, "Mobile SMS Spam Filtering for Nepali Text Using Naïve Bayesian and Support Vector Machine," International Journal of Intelligence Science, Vol. 4 No. 1, 2014, pp. 24-28. doi: 10.4236/ijis.2014.41004.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.