Danmaku Text Clustering Algorithm Based on Feature Extension and Word-Pair Filtering OBTM

Authors

  • Di Wu, Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China
  • Zhuyun Huang, Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China

DOI:

https://doi.org/10.31577/cai_2022_3_788

Keywords:

Danmaku text, short text clustering, feature extension, OBTM, new word discovery

Abstract

Danmaku text clustering is a hot topic in research on online video reviews. Because danmaku texts are short and contain many new words, clustering accuracy is often unsatisfactory. To address this problem, a danmaku text clustering algorithm based on feature extension and word-pair filtering OBTM is proposed. First, a new-word discovery algorithm based on weight optimization is proposed to retain the features of new words in the danmaku text. Then, the internal information and external knowledge of the new words are used to expand the features of the danmaku text, reducing feature sparsity. Next, an OBTM topic model based on word-pair filtering is designed to eliminate noise features. Finally, a Single-Pass algorithm based on cluster center iteration is proposed to obtain the clustering results of the topic feature words. Experimental results show that the clustering accuracy of the proposed algorithm is 13.33%, 8.52%, and 6.25% higher than that of the OBTM, Word2vec+BTM, and OurE.Drift* algorithms, respectively.
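
To illustrate the final step of the pipeline, the sketch below shows a generic Single-Pass clustering loop in Python. It is not the authors' implementation: the abstract does not specify how the cluster center is iterated, so the cosine similarity measure, the running-mean center update, the `threshold` parameter, and the function names `cosine` and `single_pass` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.8):
    """Generic Single-Pass clustering (illustrative, not the paper's exact variant).

    Each vector is compared with the current cluster centers in arrival order.
    It joins the most similar cluster if the similarity reaches `threshold`
    (an assumed parameter); otherwise it starts a new cluster.
    """
    centers = []  # one center vector per cluster
    counts = []   # number of members per cluster
    labels = []   # cluster index assigned to each input vector

    for vec in vectors:
        best, best_sim = -1, -1.0
        for i, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim > best_sim:
                best, best_sim = i, sim

        if best >= 0 and best_sim >= threshold:
            # Assumed center update: running mean of all members so far.
            n = counts[best]
            centers[best] = [(c * n + x) / (n + 1) for c, x in zip(centers[best], vec)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centers.append(list(vec))
            counts.append(1)
            labels.append(len(centers) - 1)

    return labels, centers


if __name__ == "__main__":
    # Toy topic-feature vectors (made-up values, two obvious groups).
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels, centers = single_pass(toy, threshold=0.8)
    print(labels, len(centers))
```

On the toy input, the first two vectors fall into one cluster and the last two into another. Because Single-Pass processes items in arrival order, its result depends on both the input order and the chosen threshold.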

Published

2022-09-08

How to Cite

Wu, D., & Huang, Z. (2022). Danmaku Text Clustering Algorithm Based on Feature Extension and Word-Pair Filtering OBTM. COMPUTING AND INFORMATICS, 41(3), 788–812. https://doi.org/10.31577/cai_2022_3_788