A Novel Method for Document Clustering Using Ant-fuzzy Algorithm
Authors
Javad Rajaie
- Department of Computer Engineering, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
Babak Fakhar
- Department of Computer Engineering, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
Abstract
The availability of large full-text document collections in electronic form has created a need for tools and techniques that help users organize them. Document clustering is one of the popular methods used for this purpose. Ant-based text clustering is a promising technique that has attracted considerable research attention. This paper attempts to improve the standard ant-based text-clustering algorithm by modifying the ant behavior model to pursue better algorithmic performance. A hybrid approach based on ant clustering and fuzzy clustering is used: first, ant-based clustering creates raw, imprecise clusters, and then these clusters are refined by means of the fuzzy C-means (FCM) algorithm. For large datasets, a single pass of these two stages does not suffice, and many small homogeneous clusters are formed. More iterations of the two stages are therefore usually required, with clusters from previous iterations used as building blocks in the following iterations to build finer and larger clusters.
The proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
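The refinement stage described in the abstract can be illustrated with a minimal sketch of the textbook fuzzy C-means update. This is not the authors' implementation: the function names `euclidean` and `fcm_refine`, the fuzzifier `m = 2.0`, the Euclidean distance, and the fixed iteration count are all assumptions made for illustration, and the initial centers are simply assumed to come from a coarse first stage such as ant-based clustering.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fcm_refine(docs, centers, m=2.0, iters=50, eps=1e-9):
    """Refine rough cluster centers with standard fuzzy C-means updates.

    docs:    list of document vectors (e.g. term weights), all the same length
    centers: initial centers, assumed here to come from a coarse first stage
    m:       fuzzifier (> 1); 2.0 is a common default, not a value from the paper
    Returns (centers, memberships), where memberships[i][j] is the degree to
    which document i belongs to cluster j (each row sums to 1).
    """
    centers = [list(c) for c in centers]
    power = 2.0 / (m - 1.0)
    dim = len(docs[0])
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships = []
        for d in docs:
            dists = [euclidean(d, c) + eps for c in centers]  # eps avoids 0/0
            memberships.append(
                [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
            )
        # Center update: mean of all documents weighted by u_ij^m
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centers[j] = [
                sum(w * d[t] for w, d in zip(weights, docs)) / total
                for t in range(dim)
            ]
    return centers, memberships

def hard_labels(memberships):
    """Assign each document to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Calling `fcm_refine(docs, rough_centers)` returns soft memberships; `hard_labels` turns them into crisp cluster assignments. In the iterative scheme the abstract describes, the refined clusters would be passed back to the first stage as building blocks, a step this sketch leaves out.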
Share and Cite
ISRP Style
Javad Rajaie, Babak Fakhar, A Novel Method for Document Clustering Using Ant-fuzzy Algorithm, Journal of Mathematics and Computer Science, 4 (2012), no. 2, 182--196
AMA Style
Rajaie Javad, Fakhar Babak, A Novel Method for Document Clustering Using Ant-fuzzy Algorithm. J Math Comput SCI-JM. (2012); 4(2):182--196
Chicago/Turabian Style
Rajaie, Javad, Fakhar, Babak. "A Novel Method for Document Clustering Using Ant-fuzzy Algorithm." Journal of Mathematics and Computer Science, 4, no. 2 (2012): 182--196
Keywords
- Ant colony optimization
- Ant-based clustering
- Text clustering
- Ant movement strategy