TITLE:
An Online Malicious Spam Email Detection System Using Resource Allocating Network with Locality Sensitive Hashing
AUTHORS:
Siti-Hajar-Aminah Ali, Seiichi Ozawa, Junji Nakazato, Tao Ban, Jumpei Shimamura
KEYWORDS:
Malicious Spam Email Detection System, Incremental Learning, Resource Allocating Network, Locality Sensitive Hashing
JOURNAL NAME:
Journal of Intelligent Learning Systems and Applications, Vol.7 No.2, April 22, 2015
ABSTRACT: In this paper, we propose a new online
system that can quickly detect malicious spam emails and adapt to the changes
in the email contents and the Uniform Resource Locator (URL) links leading to
malicious websites by updating the system daily. We introduce an autonomous
function for a server to generate training examples, in which double-bounce
emails are automatically collected and their class labels are assigned by
SPIKE, a crawler-type software that analyzes website maliciousness. In
general, since spammers use botnets to spread numerous malicious emails within
a short time, such distributed spam emails often have the same or similar
contents. Therefore, it is not necessary for all spam emails to be learned. To
adapt to new malicious campaigns quickly, only new types of spam emails should
be selected for learning and this can be realized by introducing an active
learning scheme into a classifier model. For this purpose, we adopt Resource
Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier
model with a data selection function. In RAN-LSH, the same or similar spam
emails that have already been learned are quickly looked up in a hash table
built with Locality Sensitive Hashing (LSH), and matched similar emails located
in “well-learned” regions are discarded without being used as training data. To analyze
email contents, we adopt the Bag of Words (BoW) approach and generate feature
vectors whose attributes are transformed based on the normalized term
frequency-inverse document frequency (TF-IDF). We use a data set of
double-bounce spam emails collected at the National Institute of Information and
Communications Technology (NICT) in Japan from March 1st, 2013 until May 10th,
2013 to evaluate the performance of the proposed system. The results confirm
that the proposed system is capable of detecting malicious spam emails with a
high detection rate.
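
The feature extraction and data selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, the random-hyperplane hashing scheme, and the names `tfidf_vectors`, `HyperplaneLSH`, and `select` are all assumptions made for illustration, and "already seen bucket" stands in for the paper's "well-learned" criterion, which the abstract does not define.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (a dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive Bag of Words
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, so terms in every document keep a weight.
        vec = {
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH used as a novelty filter."""

    def __init__(self, vocabulary, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per hash bit.
        self.planes = [
            {t: rng.gauss(0.0, 1.0) for t in vocabulary} for _ in range(n_bits)
        ]
        self.seen = set()  # hash keys of buckets that were already learned

    def key(self, vec):
        # Each bit records which side of a hyperplane the vector lies on.
        return tuple(
            sum(w * plane.get(t, 0.0) for t, w in vec.items()) >= 0.0
            for plane in self.planes
        )

    def select(self, vec):
        """Return True if vec lands in an unseen bucket, and record it."""
        k = self.key(vec)
        if k in self.seen:
            return False  # similar email already learned: discard
        self.seen.add(k)
        return True  # new type of email: use as training data


docs = [
    "cheap pills click http://a.example",
    "cheap pills click http://a.example",
    "meeting agenda attached for monday",
]
vecs = tfidf_vectors(docs)
lsh = HyperplaneLSH(sorted({t for v in vecs for t in v}))
selected = [lsh.select(v) for v in vecs]
```

In this sketch the second email is an exact duplicate of the first, so it hashes to the same bucket and is skipped. Whether the third email is selected depends on the random hyperplanes, since dissimilar vectors can still collide when `n_bits` is small.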