
This item is licensed under the Korea Open Government License (공공누리, KOGL).

dc.contributor.author
이정훈
dc.contributor.author
이윤준
dc.date.accessioned
2019-08-28T07:41:37Z
dc.date.available
2019-08-28T07:41:37Z
dc.date.issued
2014-03-01
dc.identifier.issn
0219-1377
dc.identifier.uri
https://repository.kisti.re.kr/handle/10580/14345
dc.identifier.uri
http://www.ndsl.kr/ndsl/search/detail/article/articleSearchResultDetail.do?cn=NART68728514
dc.description.abstract
Clustering groups similar data to uncover hidden information about the characteristics of a dataset for further analysis. How dissimilarity between objects is defined is a decisive factor in the quality of clustering results. When the attributes of the data are not only numerical but also categorical and high-dimensional, it is not simple to measure the dissimilarity of objects that have synonymous values or unimportant attributes. We propose a method that quantifies the level of difference between categorical values and weighs the implicit influence of each attribute on the construction of a particular cluster. The method exploits distributional information of the data correlated with each categorical value so that intrinsic relationships between values can be discovered. In addition, it dynamically measures the significance of each attribute in constructing each cluster. Experiments on real datasets show the validity and effectiveness of the method, which improves results considerably even with simple clustering algorithms. The approach is not tightly coupled to any particular clustering algorithm and can be applied flexibly to various algorithms.
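The abstract does not give the paper's formulas, so the following is only a generic illustration of the idea of comparing categorical values through the data that co-occur with them. It is not the authors' measure. The function name `value_dissimilarity`, the toy rows, and the choice of averaged total-variation distance are all assumptions made for this sketch.

```python
from collections import Counter


def value_dissimilarity(rows, attr, a, b):
    """Hypothetical co-occurrence-based dissimilarity between two values
    `a` and `b` of the categorical attribute at index `attr`.

    For every other attribute, compare the distribution of its values among
    rows where rows[attr] == a with the distribution among rows where
    rows[attr] == b, using total-variation distance, then average.
    The result lies in [0, 1]; 0 means the two values co-occur with
    identical distributions (they behave like synonyms in this data).
    """
    rows_a = [r for r in rows if r[attr] == a]
    rows_b = [r for r in rows if r[attr] == b]
    if not rows_a or not rows_b:
        raise ValueError("both values must occur in the data")

    others = [j for j in range(len(rows[0])) if j != attr]
    if not others:
        # No context attributes: fall back to simple matching.
        return 0.0 if a == b else 1.0

    total = 0.0
    for j in others:
        count_a = Counter(r[j] for r in rows_a)
        count_b = Counter(r[j] for r in rows_b)
        support = set(count_a) | set(count_b)
        # Total-variation distance between the two conditional distributions.
        total += 0.5 * sum(
            abs(count_a[v] / len(rows_a) - count_b[v] / len(rows_b))
            for v in support
        )
    return total / len(others)


# Toy data (assumed for illustration): attribute 0 is a colour name,
# attribute 1 is a tone label that co-occurs with it.
rows = [
    ("red", "warm"),
    ("red", "warm"),
    ("crimson", "warm"),
    ("blue", "cool"),
    ("blue", "cool"),
]

d_synonyms = value_dissimilarity(rows, 0, "red", "crimson")  # 0.0
d_distinct = value_dissimilarity(rows, 0, "red", "blue")     # 1.0
```

In this toy data, "red" and "crimson" always appear with the same tone, so they are treated as near-synonyms, whereas plain value matching would count them as fully different. The per-cluster attribute weighting mentioned in the abstract is not covered by this sketch.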
dc.language
eng
dc.relation.ispartofseries
Knowledge and Information Systems
dc.title
An effective dissimilarity measure for clustering of high-dimensional categorical data
dc.citation.endPage
757
dc.citation.number
3
dc.citation.startPage
743
dc.citation.volume
38
dc.identifier.bibliographicCitation
vol. 38, no. 3, pp. 743-757
dc.subject.keyword
Similarity
dc.subject.keyword
Dissimilarity
dc.subject.keyword
Clustering
dc.subject.keyword
High-dimensional data
dc.subject.keyword
Categorical data
Appears in Collections:
7. KISTI Research Outputs > Journal Articles
Files in This Item:
There are no files associated with this item.
