TITLE:
The Optimization and Improvement of MapReduce in Web Data Mining
AUTHORS:
Jun Qu, Chang-Qing Yin, Shangwei Song
KEYWORDS:
Cloud Computing, Web Data, MapReduce, Map-Reduce-Merge
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol.8 No.8, August 24, 2015
ABSTRACT: Extracting and mining social network
information from massive Web data is of both theoretical and practical
significance. However, a defining feature of this task is large-scale
data processing, which remains a great challenge to be addressed.
MapReduce is a distributed programming model: by implementing just two
functions, map and reduce, distributed tasks can be made to work well.
Nevertheless, this model does not directly support the processing of
heterogeneous datasets, which are common on the Web. This article proposes
a framework, called Map-Reduce-Merge, that extends the original MapReduce
framework. It adds a merge phase that can efficiently solve the problems of
heterogeneous data processing. In addition, several optimizations and
improvements are made based on the features of Web data.
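
The Map-Reduce-Merge idea summarized in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the helper names (map_reduce, merge), the two sample datasets, and the choice of an inner join on shared keys are all assumptions made for illustration. Each heterogeneous dataset gets its own map and reduce functions, and a separate merge step then combines the two reduced outputs by key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/reduce pass and return {key: reduced_value}."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per record.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer folds all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def merge(left, right, merger):
    """Hypothetical merge phase: inner-join two reduced outputs on shared keys."""
    shared = sorted(left.keys() & right.keys())
    return {key: merger(key, left[key], right[key]) for key in shared}


# Two heterogeneous Web datasets (made-up sample data).
follows = [("alice", "bob"), ("carol", "bob"), ("alice", "carol")]  # (follower, followee)
posts = [
    {"author": "bob", "words": 120},
    {"author": "bob", "words": 80},
    {"author": "carol", "words": 50},
]

# Each dataset has its own schema, so each gets its own mapper.
followers = map_reduce(follows, lambda r: [(r[1], 1)], lambda k, vs: sum(vs))
words = map_reduce(posts, lambda r: [(r["author"], r["words"])], lambda k, vs: sum(vs))

# The merge phase combines the two reduced outputs per user.
profile = merge(followers, words, lambda k, f, w: {"followers": f, "words": w})
print(profile)
```

In plain MapReduce, the join in the last step would typically require an extra job or custom handling inside the reducer; separating it into its own phase is the design choice the abstract attributes to Map-Reduce-Merge.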