TITLE:
Building a Productive Domain-Specific Cloud for Big Data Processing and Analytics Service
AUTHORS:
Yuzhong Yan, Mahsa Hanifi, Liqi Yi, Lei Huang
JOURNAL NAME:
Journal of Computer and Communications, Vol.3 No.5, May 25, 2015
ABSTRACT:
Cloud Computing, as a disruptive technology, provides a dynamic, elastic and promising computing environment for tackling the challenges of big data processing and analytics. Hadoop and MapReduce are widely used open source frameworks in Cloud Computing for storing and processing big data in a scalable fashion. Spark is the latest parallel computing engine working together with Hadoop; it exceeds MapReduce performance via its in-memory computing and high-level programming features. In this paper, we present our design and implementation of a productive, domain-specific big data analytics cloud platform on top of Hadoop and Spark. To increase users' productivity, we created a variety of data processing templates to simplify the programming effort. We have conducted productivity and performance experiments with a few basic but representative data processing algorithms from the petroleum industry. Geophysicists can use the platform to productively design and implement scalable seismic data processing algorithms without handling the details of data management or the complexity of parallelism. The cloud platform generates a complete data processing application based on the user's kernel program and simple configurations, allocates resources, and executes it in parallel on top of Spark and Hadoop.