KISTI Institutional Repository: CASS: A distributed network clustering algorithm based on structure similarity for large-scale network

This item is licensed under the Korea Open Government License.

Title: CASS: A distributed network clustering algorithm based on structure similarity for large-scale network

Abstract: As networks grow in size, analyzing large-scale network data is becoming increasingly
important, and network clustering algorithms are a useful tool for such analysis. Conventional
network clustering algorithms have been actively researched for single-machine rather than
parallel environments. However, these algorithms cannot analyze large-scale network data
because of memory limitations. As a solution, we propose a network clustering algorithm for
large-scale network data analysis using Apache Spark, which changes the paradigm of the
conventional clustering algorithm to improve its efficiency in the Apache Spark environment.
We also apply optimization approaches such as a Bloom filter and shuffle selection to reduce
memory usage and execution time. Evaluating the proposed algorithm by average normalized cut,
we confirmed that it can analyze diverse large-scale network datasets, including biological,
co-authorship, internet topology, and social networks. Experimental results show that the
proposed algorithm finds more accurate clusters than the comparative algorithms while using
less memory. Furthermore, we confirm the effectiveness of the proposed optimization approaches
and the scalability of the proposed algorithm. In addition, we validate that the clusters
found by the proposed algorithm can represent biologically meaningful functions.
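The title describes CASS as based on structure similarity, but the abstract does not define the measure. The sketch below assumes a SCAN-style cosine structural similarity between two adjacent nodes u and v, sigma(u, v) = |Gamma(u) ∩ Gamma(v)| / sqrt(|Gamma(u)| * |Gamma(v)|), where Gamma(x) is x's neighbourhood including x itself. This is an illustrative assumption, not the paper's confirmed formula, and the helper names are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple

# Adjacency-set representation of an undirected, unweighted graph
Graph = Dict[Hashable, Set[Hashable]]


def make_undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an adjacency-set graph from an edge list, adding both directions."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closed_neighborhood(graph: Graph, v: Hashable) -> Set[Hashable]:
    """Gamma(v): the neighbours of v plus v itself (the SCAN convention)."""
    return graph[v] | {v}


def structural_similarity(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Assumed SCAN-style cosine similarity |G(u) & G(v)| / sqrt(|G(u)| * |G(v)|)."""
    gu = closed_neighborhood(graph, u)
    gv = closed_neighborhood(graph, v)
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))


if __name__ == "__main__":
    g = make_undirected([(0, 1), (0, 2), (1, 2), (2, 3)])
    for u, v in [(0, 1), (2, 3)]:
        # Prints "0 1 1.0" then "2 3 0.707"
        print(u, v, round(structural_similarity(g, u, v), 3))
```

Because each edge's score depends only on its two endpoints' neighbourhoods, the computation is per-edge and parallelizes naturally, which is one reason such measures suit a distributed setting. How CASS actually distributes this work in Spark is not described in the abstract.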
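The abstract names a Bloom filter as one of the memory optimizations but does not say what it stores. The sketch below is a generic Bloom filter; using it for compact edge-membership tests via the hypothetical `edge_key` helper is an assumption for illustration only. Sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and target false-positive rate p.

```python
import hashlib
import math
from typing import Hashable, Iterator


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions vary
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def edge_key(u: Hashable, v: Hashable) -> str:
    """Hypothetical encoding: order endpoints so (u, v) and (v, u) share one key."""
    a, b = sorted((u, v))
    return f"{a}-{b}"


if __name__ == "__main__":
    bf = BloomFilter(expected_items=1000)
    for u, v in [(1, 2), (2, 3)]:
        bf.add(edge_key(u, v))
    print(edge_key(2, 1) in bf)  # True: inserted edges are never missed
```

With n = 1000 and p = 0.01 this allocates 9586 bits (about 1.2 KB) and 7 hash functions, independent of how large each key is; the cost is an occasional false positive on lookups of absent items.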
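The abstract evaluates clusters by average normalized cut without giving the formula. A common definition, assumed here, averages over the k clusters the fraction of each cluster's edge endpoints that leave it: ANCut = (1/k) Σ_i cut(C_i, V ∖ C_i) / vol(C_i), where vol(C_i) is the sum of node degrees in C_i. Under this definition lower is better. The function below is a minimal sketch for an undirected, unweighted edge list, not the paper's evaluation code.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def average_normalized_cut(edges: Iterable[Tuple[Hashable, Hashable]],
                           clusters: List[Set[Hashable]]) -> float:
    """Mean over clusters of cut(C, V - C) / vol(C) for an undirected, unweighted graph.

    Assumes every edge endpoint belongs to exactly one cluster.
    Lower values mean fewer edges leave each cluster relative to its size.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cut = [0] * len(clusters)  # edges with exactly one endpoint in cluster i
    vol = [0] * len(clusters)  # sum of degrees of the nodes in cluster i
    for u, v in edges:
        cu, cv = label[u], label[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu != cv:
            cut[cu] += 1
            cut[cv] += 1
    # Skip clusters with no incident edges (e.g. isolated nodes) to avoid 0/0
    scores = [c / w for c, w in zip(cut, vol) if w > 0]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3)
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(average_normalized_cut(edges, [{0, 1, 2}, {3, 4, 5}]))  # 1/7, about 0.1429
```

Each triangle has volume 2 + 2 + 3 = 7 and one outgoing edge, so both clusters score 1/7 and so does their average; putting all six nodes in one cluster gives 0, which is why normalized cut is usually read alongside the number of clusters.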

URI: https://repository.kisti.re.kr/handle/10580/14747
http://www.ndsl.kr/ndsl/search/detail/article/articleSearchResultDetail.do?cn=NART90641578

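KISTI National Science and Technology Data Division, Digital Curation Center, Data Standardization Team
Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea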
Tel 042) 869-1004,1234 FAX 042) 869-1091

The KISTI Institutional Repository was built as part of the National Library of Korea's OAK dissemination project.