Survey on Spam Filtering Techniques

DOI: 10.4236/cn.2011.33019

Abstract

In recent years, spam has become a major problem for the Internet and electronic communication, and many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. It classifies, evaluates, and compares traditional and learning-based methods. Several personal anti-spam products are tested and compared. A problem statement for a new approach to spam filtering is also considered.

Share and Cite:

S. Nazirova, "Survey on Spam Filtering Techniques," Communications and Network, Vol. 3 No. 3, 2011, pp. 153-160. doi: 10.4236/cn.2011.33019.

Conflicts of Interest

The authors declare no conflicts of interest.


  

Copyright © 2020 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.