Effect of Term Weighting on Keyword Extraction in Hierarchical Category Structure

Authors

  • Boonthida Chiraratanasopha, Institute of Informatics, Walailak University, Thailand
  • Salin Boonbrahm, Institute of Informatics, Walailak University, Thailand
  • Thanaruk Theeramunkong, Sirindhorn International Institute of Technology, Thammasat University, Thailand & Associate Fellow, The Royal Society of Thailand, Thailand

DOI:

https://doi.org/10.31577/cai_2021_1_57

Keywords:

Keyword extraction, text classification, term weighting, hierarchical category structure

Abstract

While several studies have examined the effect of term weighting on classification accuracy, relatively few have investigated how term weighting affects the quality of keywords extracted to characterize a document or a category (i.e., a document collection). Moreover, many tasks require a more complicated category structure, such as a hierarchical or network structure, rather than a flat one. This paper presents a qualitative and quantitative study of how term weighting affects keyword extraction in a hierarchical category structure, in comparison with a flat category structure. A hierarchical structure gives rise to special characteristics when assigning a set of keywords or tags to represent a document or a document collection, because statistics from across the hierarchy can be used, including those of the category itself, its parent category, its child categories, and its sibling categories. An enhancement of term weighting is proposed, in the form of a series of modified TFIDFs, to improve keyword extraction. A text collection of public-hearing opinions is used to evaluate variant TFs and IDFs and to identify which types of information in a hierarchical category structure are useful. Our experiments show that, for the most effective IDF family, namely TF-IDFr, the information types rank in the order identity > sibling > child > parent. TF-IDFr outperforms the vanilla version of TFIDF with a centroid-based classifier.
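
To make the idea of hierarchy-aware term weighting concrete, the following is a minimal Python sketch. It is not the paper's TF-IDFr formulation, which the abstract does not spell out. It only illustrates how the set of categories used for the document-frequency count might be switched between a flat view (all categories) and a hierarchical view (here, the category plus its siblings). The function name `tf_idf_keywords`, the `scope` parameter, the smoothed IDF formula, and the toy category names and tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_keywords(categories, scope, target, top_k=3):
    """Rank the terms of `target` by TF-IDF, with IDF computed over `scope`.

    categories: dict mapping a category name to its list of tokens.
    scope:      category names used for the document-frequency count
                (all categories for a flat view, or, for example, the
                target and its siblings for a hierarchical view).
    """
    vocab = {name: set(categories[name]) for name in scope}
    tf = Counter(categories[target])
    total = sum(tf.values())
    n = len(scope)

    scores = {}
    for term, count in tf.items():
        df = sum(1 for name in scope if term in vocab[name])
        # Smoothed IDF (an illustrative choice): a term that occurs in
        # every category in scope still receives a finite weight.
        idf = math.log((1 + n) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical toy hierarchy: two siblings under "transport", one other category.
categories = {
    "transport/road": "road traffic road toll budget".split(),
    "transport/rail": "rail station rail budget".split(),
    "health/clinic": "clinic nurse budget road".split(),
}

flat_view = tf_idf_keywords(categories, list(categories), "transport/road")
sibling_view = tf_idf_keywords(
    categories, ["transport/road", "transport/rail"], "transport/road"
)
print(flat_view)
print(sibling_view)
```

In this sketch, changing `scope` is the only difference between the flat and hierarchical views; a parent- or child-based variant would pass a different list of category names in the same way.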

Published

2021-08-03

How to Cite

Chiraratanasopha, B., Boonbrahm, S., & Theeramunkong, T. (2021). Effect of Term Weighting on Keyword Extraction in Hierarchical Category Structure. COMPUTING AND INFORMATICS, 40(1), 57–82. https://doi.org/10.31577/cai_2021_1_57