Apache Hadoop has become a popular parallel processing tool in the era of big data.
While practitioners have rewritten many conventional analysis algorithms to adapt them to Hadoop,
the I/O inefficiency of Hadoop-based programs has been repeatedly reported in the literature.
In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis
by introducing our efficient enhancements to Hadoop.
We first incorporate a columnar data layout into the conventional Hadoop framework without any modification
of the Hadoop internals. We also add an indexing capability to Hadoop that saves a substantial amount of I/O
when processing not only selection predicates but also star-join queries, which are frequently used in
many analysis tasks.
Keywords
Parallel processing; MapReduce; Data layout; Bitmap index