Farsi Font Recognition Based on the Fonts of Text Samples Extracted by SOM
- Engineering Faculty, Golestan University, Gorgan, Iran
A Farsi font recognition algorithm based on the fonts of frequently occurring text samples is proposed.
Features are extracted from the connected components of a text image, and the resulting feature vectors
are clustered with a Self-Organizing Map (SOM). The clusters with the most members identify
the most frequent connected components (MFCCs), and a number of members of these large clusters are
extracted from the input image. This procedure is applied to both training and test images. Because the
frequent samples in different Farsi texts are very similar, a large number of the MFCC samples detected
in a test image are expected to appear in the extracted training sample set. The font
type and font style of the extracted test samples are recognized by matching them against the
training samples, and the font recognized most often among the extracted samples is taken as the font of
the input text. To improve accuracy and lower complexity, the font size is determined in
a second phase, after font type and style recognition. A lexicon reduction
procedure further reduces complexity and processing time. Font size is estimated from
the size of a particular MFCC in the text image. Experiments show that the proposed method outperforms
other font recognition methods.
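The type and style recognition stage described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the connected-component feature vectors have already been computed, uses a small one-dimensional SOM with a Gaussian neighbourhood, and matches samples by nearest neighbour in Euclidean distance. The function names (`train_som`, `assign_clusters`, `most_frequent_components`, `vote_font`) and all parameter values are hypothetical. Feature extraction, lexicon reduction, and font size estimation are omitted because the abstract does not specify them.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, candidates):
    """Index of the candidate vector closest to vec."""
    return min(range(len(candidates)), key=lambda i: sq_dist(vec, candidates[i]))


def train_som(features, n_units=4, epochs=20, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM (assumed topology); returns the unit weight vectors."""
    rng = random.Random(seed)
    # Initialise the units from randomly chosen feature vectors.
    weights = [list(map(float, f)) for f in rng.sample(features, n_units)]
    sigma0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
        order = list(features)
        rng.shuffle(order)
        for x in order:
            bmu = nearest(x, weights)         # best-matching unit
            for j, w in enumerate(weights):
                # Gaussian neighbourhood around the best-matching unit.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def assign_clusters(features, weights):
    """Label each feature vector with the index of its nearest SOM unit."""
    return [nearest(f, weights) for f in features]


def most_frequent_components(labels, n_clusters=2, per_cluster=3):
    """Indices of a few members from each of the largest clusters (the MFCCs)."""
    biggest = [c for c, _ in Counter(labels).most_common(n_clusters)]
    picked = []
    for c in biggest:
        members = [i for i, lab in enumerate(labels) if lab == c]
        picked.extend(members[:per_cluster])
    return picked


def vote_font(test_samples, train_samples, train_fonts):
    """Match each test sample to its nearest training sample, then majority-vote."""
    votes = [train_fonts[nearest(x, train_samples)] for x in test_samples]
    return Counter(votes).most_common(1)[0][0]
```

In use, one would call `train_som` on the feature vectors of an image, label them with `assign_clusters`, keep the samples returned by `most_frequent_components`, and pass the kept test samples to `vote_font` together with the labelled training samples. Deferring font size to a later step keeps the label set in `vote_font` limited to type and style combinations, which is in the spirit of the two-phase design the abstract describes.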
- Farsi font recognition
- Most-frequent connected components