A Clustering-Based Algorithm for Data Reduction

Yeh, Chi-Yuan; Ouyang, Jeng; Lee, Shie-Jue

Permalink : https://ousar.lib.okayama-u.ac.jp/19639

ID	19639
Eprint ID	19639
FullText URL	IWCIA2009_A1202.pdf 514 KB
Author	Yeh, Chi-Yuan; Ouyang, Jeng; Lee, Shie-Jue
Abstract	Finding an efficient data reduction method for large-scale problems is an imperative task. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm to sample instances for the classification task. Instances that are similar to each other are grouped into the same cluster. Once all the instances have been fed in, a number of clusters have been formed automatically. The statistical mean of each cluster is then taken to represent all the instances covered by that cluster. This approach has two advantages. One is that it can run faster and use less storage memory. The other is that the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method can run faster and obtain a better reduction rate than other methods.
Keywords	Large-scale dataset; fuzzy similarity; data reduction; prototype reduction; instance-filtering; instance-abstraction
Published Date	2009-11-12
Publication Title	Proceedings : Fifth International Workshop on Computational Intelligence & Applications
Volume	2009
Issue	1
Publisher	IEEE SMC Hiroshima Chapter
Start Page	65
End Page	70
ISSN	1883-3977
NCID	BB00577064
Content Type	Conference Paper
Language	English
Copyright Holders	IEEE SMC Hiroshima Chapter
Event Title	5th International Workshop on Computational Intelligence & Applications, IEEE SMC Hiroshima Chapter: IWCIA 2009
Event Location	東広島市
Event Location Alternative	Higashi-Hiroshima City
File Version	publisher
Refereed	True
Eprints Journal Name	IWCIA
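
The procedure outlined in the abstract (feed instances in one at a time, group similar ones into clusters that form automatically, then keep each cluster's mean as its representative) can be illustrated with a minimal sketch. This is not the authors' implementation. The Gaussian similarity measure, the fixed `sigma`, the `threshold` parameter, the choice to cluster each class label separately, and the function name `reduce_instances` are all assumptions made for illustration; the record above does not specify them.

```python
import math
from collections import defaultdict


def reduce_instances(X, y, threshold=0.5, sigma=1.0):
    """One-pass similarity-based clustering (illustrative sketch only).

    Returns (prototypes, labels), where each prototype is the mean of
    one cluster. threshold and sigma are hypothetical parameters.
    """
    # clusters[label] holds [sum_vector, count] pairs, one per cluster.
    # Clustering per class label is an assumption of this sketch.
    clusters = defaultdict(list)

    for x, label in zip(X, y):
        best, best_sim = None, 0.0
        for cluster in clusters[label]:
            total, count = cluster
            mean = [s / count for s in total]
            # Assumed similarity: Gaussian of the squared distance to the mean.
            d2 = sum((a - m) ** 2 for a, m in zip(x, mean))
            sim = math.exp(-d2 / (2 * sigma ** 2))
            if sim > best_sim:
                best, best_sim = cluster, sim

        if best is not None and best_sim >= threshold:
            # Similar enough: absorb the instance into the closest cluster.
            best[0] = [s + a for s, a in zip(best[0], x)]
            best[1] += 1
        else:
            # No cluster is similar enough: start a new one.
            clusters[label].append([list(x), 1])

    # Each cluster's mean stands in for all instances it covers.
    prototypes, labels = [], []
    for label, label_clusters in clusters.items():
        for total, count in label_clusters:
            prototypes.append([s / count for s in total])
            labels.append(label)
    return prototypes, labels


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
    y = [0, 0, 1, 1, 0]
    prototypes, labels = reduce_instances(X, y)
    print(len(X), "instances reduced to", len(prototypes), "prototypes")
```

In this sketch the number of prototypes is never passed in; it falls out of the threshold, which mirrors the abstract's point that the user need not specify the number of representative instances in advance.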