Korea Institute of Science and Technology Information
Publication Year
2012-09
Description
funder : Ministry of Science, ICT and Future Planning funder : KA agency : Korea Institute of Science and Technology Information
Abstract
1 Overview
Many analysis tasks perform computations on a subset of data records selected by the values of their attributes. For example, when analyzing an astronomy dataset, one might want to plot a set of light curves for objects in a particular patch of the sky. The data needed for this operation can be retrieved from the data records that satisfy range conditions on Right Ascension and Declination. This subsetting step reduces the amount of data that must be transported to the Cloud Computing facilities and is therefore critical to the overall effectiveness of the distributed analysis system.
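As an illustration of the kind of attribute-based subsetting described above, the following minimal Python sketch filters in-memory records by range conditions on Right Ascension and Declination. The `Observation` record layout, the `select_patch` function, and the sample values are all hypothetical and are not taken from the project; the sketch also assumes the query patch does not wrap around the 0/360 degree boundary in Right Ascension.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical record layout, for illustration only."""
    object_id: str
    ra: float    # Right Ascension, in degrees
    dec: float   # Declination, in degrees
    time: float  # observation time (e.g. a Julian date)
    mag: float   # measured magnitude


def select_patch(records, ra_range, dec_range):
    """Keep records whose RA and Dec both fall inside the given closed ranges.

    Assumes the RA range does not cross the 0/360 degree boundary.
    """
    ra_lo, ra_hi = ra_range
    dec_lo, dec_hi = dec_range
    return [
        r for r in records
        if ra_lo <= r.ra <= ra_hi and dec_lo <= r.dec <= dec_hi
    ]


# Made-up sample data: three records fall inside the patch, one does not.
records = [
    Observation("A", 10.2, -4.5, 56000.1, 15.0),
    Observation("B", 10.8, -4.9, 56000.1, 17.3),
    Observation("C", 150.0, 20.0, 56000.2, 16.1),
    Observation("A", 10.2, -4.5, 56001.1, 15.2),
]

subset = select_patch(records, ra_range=(10.0, 11.0), dec_range=(-5.0, -4.0))
```

Only the records in `subset` would need to be shipped to the analysis site, which is the data reduction the paragraph above refers to.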
Selected data records often span many different data files, so the analysis programs must know which files contain the selected records. Extracting the values from these files can be time-consuming, especially when the number of files is large. In some cases, the selected data records also have to be reorganized, for example clustered by time of observation. Even though such subsetting and reorganization functions are frequently required, they are not well supported by current scientific data management systems.
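The two needs described above, finding the files that can contain selected records and reorganizing the selected records by observation time, can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism developed in this project: the per-file min/max summaries, the file names, and the functions `candidate_files` and `light_curves` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-file summaries:
# file name -> (min RA, max RA, min Dec, max Dec), all in degrees.
file_summaries = {
    "night1.dat": (10.0, 12.0, -6.0, -4.0),
    "night2.dat": (200.0, 210.0, 30.0, 35.0),
    "night3.dat": (9.5, 10.5, -5.5, -3.0),
}


def candidate_files(summaries, ra_range, dec_range):
    """Return the names of files whose RA/Dec bounding box overlaps the query."""
    ra_lo, ra_hi = ra_range
    dec_lo, dec_hi = dec_range
    return sorted(
        name
        for name, (f_ra_lo, f_ra_hi, f_dec_lo, f_dec_hi) in summaries.items()
        if f_ra_lo <= ra_hi and f_ra_hi >= ra_lo
        and f_dec_lo <= dec_hi and f_dec_hi >= dec_lo
    )


def light_curves(records):
    """Group (object_id, time, magnitude) tuples by object, ordered by time."""
    curves = defaultdict(list)
    for object_id, time, mag in records:
        curves[object_id].append((time, mag))
    return {obj: sorted(points) for obj, points in curves.items()}


# Only files that overlap the query patch need to be opened at all.
files = candidate_files(file_summaries, (10.0, 11.0), (-5.0, -4.0))

# Made-up selected records, reorganized into one time series per object.
curves = light_curves([("A", 2.0, 15.1), ("B", 1.0, 17.3), ("A", 1.0, 15.0)])
```

A bounding-box summary like this only rules files out; a file that passes the check may still contain no matching records, so a finer-grained index would be needed to avoid reading it.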
In this project, our goal is to develop a generalized, attribute-based unified data access service that provides transparent and highly efficient data-access mechanisms and optimizes network resource utilization by reducing the data at the source. This requires high-level coordination of data discovery, data selection, index generation, data access, and data delivery. Our approach is to provide a general-purpose service framework so that clients of portals, such as the Astronomical Data Analysis Portal, can manage the data flows easily and efficiently. Figure 1 shows the high-level design of the attribute-based unified data access service in the context of the Astronomical Data Analysis Portal.
This document summarizes the technical results of the LBNL-KISTI collaboration project for the period from April 15, 2012 to September 30, 2012.