Optimal Feature Subset Selection Based on Combining Document Frequency and Term Frequency for Text Classification

Thirumoorthy Karpagalingam; Muneeswaran Karuppaiah

doi:10.31577/cai_2020_5_881

Authors

Thirumoorthy Karpagalingam, Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
Muneeswaran Karuppaiah, Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India

DOI:

https://doi.org/10.31577/cai_2020_5_881

Keywords:

Feature selection, text classification, document frequency, term frequency

Abstract

Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimensionality of the feature space lowers the computational cost and improves the accuracy of the text classification system. Hence, a proper subset of the significant features of the text corpus must be identified in order to classify the data in less computational time and with higher accuracy. In this research, we propose a novel feature selection method that combines document frequency and term frequency (FS-DFTF) to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on several popular benchmark text corpora. The experimental results confirm that the proposed method achieves better classification accuracy than other feature selection techniques.
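The general idea of scoring a term by both its document frequency and its term frequency can be illustrated with a minimal Python sketch. The abstract does not state the FS-DFTF formula, so the combination used here (the fraction of documents containing the term, multiplied by the logarithm of one plus its total count) is purely an assumption for illustration and is not the authors' method. The function names `df_tf_scores` and `select_top_k`, the whitespace tokenizer, and the sample documents are likewise hypothetical.

```python
import math
from collections import Counter


def df_tf_scores(docs):
    """Score each term by combining document frequency and term frequency.

    Assumed combination (NOT the published FS-DFTF formula):
        score(t) = (df(t) / N) * log(1 + tf(t))
    where df(t) is the number of documents containing t, tf(t) is the
    total number of occurrences of t in the corpus, and N is the number
    of documents.
    """
    n_docs = len(docs)
    df = Counter()  # documents containing each term
    tf = Counter()  # total occurrences of each term
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        tf.update(tokens)
        df.update(set(tokens))
    return {t: (df[t] / n_docs) * math.log1p(tf[t]) for t in df}


def select_top_k(docs, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = df_tf_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    sample_docs = [
        "cheap pills cheap offer",
        "meeting agenda offer",
        "cheap cheap cheap deal",
    ]
    print(select_top_k(sample_docs, 2))
```

In a classification pipeline, the selected terms would restrict the vocabulary passed to a classifier such as Naive Bayes or a Support Vector Machine. This sketch ignores class labels entirely, which a supervised feature selection method would normally take into account.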
