This item is licensed under the Korea Open Government License (KOGL).

Title
Improving I/O efficiency in Hadoop-based Massive Data Analysis programs
Author(s)
이경하; 서영균; 강우람
Publication Year
2018-12-02
Abstract
Apache Hadoop has been a popular parallel processing tool in the era of big data.
While practitioners have rewritten many conventional analysis algorithms to adapt them to Hadoop,
the I/O inefficiency of Hadoop-based programs has been repeatedly reported in the literature.
In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis
by introducing our efficient modification of Hadoop.
We first incorporate a columnar data layout into the conventional Hadoop framework without any modification
of the Hadoop internals. We also add an indexing capability to Hadoop that saves I/O
when processing not only selection predicates but also star-join queries, which are frequently used in
many analysis tasks.
Keywords
Parallel processing; MapReduce; Data layout; Bitmap index
Journal Title
Scientific Programming
ISSN
1058-9244
Appears in Collections:
7. KISTI Research Outputs > Journal Articles
URI
https://repository.kisti.re.kr/handle/10580/14739