
KOGL: This item is licensed under the Korea Open Government License

CASS: A distributed network clustering algorithm based on structure similarity for large-scale network
Publication Year
As networks grow in size, analyzing large-scale network data is becoming
increasingly important. Network clustering algorithms are useful for analyzing
such data. Most network clustering algorithms under active research target a
single-machine environment rather than a parallel one, and they cannot analyze
large-scale network data because of memory size limitations. As a solution, we
propose a network clustering algorithm for large-scale network data analysis on
Apache Spark, changing the paradigm of the conventional clustering algorithm to
improve its efficiency in the Apache Spark environment. We also apply
optimization approaches such as a Bloom filter and shuffle selection to reduce
memory usage and execution time. By evaluating the proposed algorithm with the
average normalized cut, we confirmed that it can analyze diverse large-scale
network datasets, including biological, co-authorship, internet topology, and
social networks. Experimental results show that the proposed algorithm produces
more accurate clusters than the comparative algorithms while using less memory.
Furthermore, we confirm the effectiveness of the proposed optimization
approaches and the scalability of the proposed algorithm. In addition, we
validate that the clusters found by the proposed algorithm can represent
biologically meaningful functions.
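The abstract names the average normalized cut as its evaluation measure without defining it. The following is a minimal sketch under an assumed, commonly used definition: for each cluster, the number of edges leaving it divided by the sum of its node degrees, averaged over clusters (lower values mean better separated clusters). The paper may use a variant. The function name and input format are hypothetical and are not taken from CASS.

```python
from collections import defaultdict


def average_normalized_cut(edges, labels):
    """Average of cut(C) / vol(C) over clusters C (assumed definition).

    edges  : iterable of (u, v) pairs of an undirected, unweighted graph
    labels : dict mapping each node to its cluster id
    Clusters whose nodes touch no edge have zero volume and are skipped.
    """
    cut = defaultdict(int)  # number of edges crossing each cluster boundary
    vol = defaultdict(int)  # sum of node degrees inside each cluster
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        # Each edge adds one to the degree of both of its endpoints.
        vol[cu] += 1
        vol[cv] += 1
        if cu != cv:
            cut[cu] += 1
            cut[cv] += 1
    scores = [cut[c] / vol[c] for c in vol if vol[c] > 0]
    return sum(scores) / len(scores) if scores else 0.0
```

For two triangles joined by a single edge, with each triangle as one cluster, every cluster has one outgoing edge and a degree sum of seven, so the score is 1/7.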
network analysis; algorithm; clustering
Journal Title
PLOS ONE
Citation Volume
Appears in Collections:
7. KISTI Research Outcomes > Journal Articles