This item is licensed under the Korea Open Government License
dc.contributor.author
이경하
dc.contributor.author
서영균
dc.contributor.author
강우람
dc.date.accessioned
2019-08-28T07:42:14Z
dc.date.available
2019-08-28T07:42:14Z
dc.date.issued
2018-12-02
dc.identifier.issn
1058-9244
dc.identifier.uri
https://repository.kisti.re.kr/handle/10580/14739
dc.description.abstract
Apache Hadoop has been a popular parallel processing tool in this era of big data.
While practitioners have rewritten many conventional analysis algorithms to adapt them to Hadoop,
the I/O inefficiency of Hadoop-based programs has been repeatedly reported in the literature.
In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis
by introducing our efficient modification of Hadoop.
We first incorporate a columnar data layout into the conventional Hadoop framework without any modification
of the Hadoop internals. We also add an indexing capability to Hadoop to save many I/Os
while processing not only selection predicates but also star-join queries, which are frequently used in
many analysis tasks.
dc.language
eng
dc.relation.ispartofseries
Scientific Programming
dc.title
Improving I/O efficiency in Hadoop-based Massive Data Analysis programs