A Bagheri
PSA : a hybrid feature selection approach for Persian text classification
Bagheri, A; Saraee, MH; Nadi, S
Abstract
In recent decades, as enormous amounts of data have accumulated, the number of text documents has grown vastly. E-mails, web pages, texts, news and articles are only part of this growth. Thus the need for text mining techniques, including automatic text classification, is rising. In automatic text classification, feature selection appears to be the most important step. Since the feature space of textual data includes tens of thousands of words, feature selection is used for dimensionality reduction. Different feature selection techniques for text, ranging from statistical to machine learning approaches, have been reported in the literature, each with its own advantages and disadvantages. However, very little research so far has tried to exploit the advantages of both learning and statistical approaches together. This paper presents a new feature selection algorithm for text that aims to substantially improve classification performance. The proposed approach, PSA, is based on the simulated annealing algorithm and the document frequency method, so it can benefit from the advantages of both statistical and learning techniques. The simulated annealing algorithm requires an appropriate fitness evaluation function, and the document frequency method, used as that evaluation function, has a low computational cost. In addition, a new Persian text dataset, the Persian 7-NewsGroups Dataset, is introduced for evaluating the proposed approach. To justify and evaluate the approach, the performance of PSA is compared with well-known methods such as chi-square and the correlation coefficient on the Persian 7-NewsGroups dataset. The results show that PSA has better overall performance than the other methods.
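The abstract combines two ingredients: document frequency (DF), which counts how many documents contain a term, and simulated annealing (SA), which searches over feature subsets using a fitness function. The abstract does not specify PSA's exact fitness function, neighbour move or cooling schedule, so the following is only a minimal sketch under stated assumptions. The fitness used here (the number of documents containing at least one selected term, i.e. the DF of the selected set as a whole), the swap move, the geometric cooling and all function names (`document_frequency`, `coverage_fitness`, `anneal_features`) are hypothetical choices for illustration, not the authors' method.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def document_frequency(docs: List[Set[str]]) -> Dict[str, int]:
    """Count, for each term, the number of documents that contain it."""
    df: Dict[str, int] = {}
    for doc in docs:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    return df


def coverage_fitness(subset: Set[str], docs: List[Set[str]]) -> int:
    """Hypothetical fitness: number of documents containing at least one
    selected term (the document frequency of the whole selected set)."""
    return sum(1 for doc in docs if doc & subset)


def anneal_features(docs: List[Set[str]], k: int, iters: int = 5000,
                    t0: float = 5.0, cooling: float = 0.999,
                    seed: int = 0) -> Tuple[Set[str], int]:
    """Select k terms with simulated annealing (assumed swap move and
    geometric cooling); returns the best subset seen and its fitness."""
    rng = random.Random(seed)
    vocab = sorted(document_frequency(docs))
    if not 0 < k < len(vocab):
        raise ValueError("k must be between 1 and len(vocab) - 1")
    current = set(rng.sample(vocab, k))
    cur_fit = coverage_fitness(current, docs)
    best, best_fit = set(current), cur_fit
    temp = t0
    for _ in range(iters):
        # Neighbour move: swap one selected term for one unselected term.
        out_term = rng.choice(sorted(current))
        in_term = rng.choice([t for t in vocab if t not in current])
        candidate = (current - {out_term}) | {in_term}
        cand_fit = coverage_fitness(candidate, docs)
        delta = cand_fit - cur_fit
        # Metropolis criterion: always accept non-worsening moves,
        # accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = candidate, cand_fit
            if cur_fit > best_fit:
                best, best_fit = set(current), cur_fit
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, best_fit


if __name__ == "__main__":
    docs = [
        {"economy", "market", "bank"},
        {"economy", "market"},
        {"sport", "football"},
        {"sport", "team", "football"},
        {"health", "doctor"},
        {"economy", "bank"},
    ]
    selected, fit = anneal_features(docs, k=2)
    print(sorted(selected), fit)
```

On this toy corpus, ranking terms by DF alone could pick `economy` and `market` (both frequent), which together cover only 3 of the 6 documents; a set-level fitness lets the search prefer complementary terms such as `economy` plus `sport`. With a purely additive fitness (the sum of individual DF values), a plain top-k ranking would already be optimal and the annealing step would add nothing, which is why this sketch assumes a non-additive fitness.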
Citation
Bagheri, A., Saraee, M., & Nadi, S. (2015). PSA : a hybrid feature selection approach for Persian text classification. Journal of computing and security (Online), 1(4), 261-272
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Sep 22, 2014 |
Online Publication Date | Feb 27, 2015 |
Publication Date | Feb 27, 2015 |
Deposit Date | Jul 31, 2017 |
Journal | Journal of Computing and Security |
Print ISSN | 2322-4460 |
Electronic ISSN | 2383-0417 |
Volume | 1 |
Issue | 4 |
Pages | 261-272 |
Publisher URL | http://www.jcomsec.org/index.php/JCS/article/view/136 |
Related Public URLs | http://www.jcomsec.org/ |