Danmaku Text Clustering Algorithm Based on Feature Extension and Word-Pair Filtering OBTM

Di Wu; Zhuyun Huang

doi:10.31577/cai_2022_3_788

Authors

Di Wu, Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China
Zhuyun Huang, Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China

DOI:

https://doi.org/10.31577/cai_2022_3_788

Keywords:

Danmaku text, short text clustering, feature extension, OBTM, new word discovery

Abstract

Danmaku text clustering is an active topic in the study of online video reviews. Danmaku texts are short and contain many new words, which leads to unsatisfactory clustering accuracy. To address this problem, a danmaku text clustering algorithm based on feature extension and word-pair filtering OBTM is proposed. First, a new-word discovery algorithm based on weight optimization is proposed to retain the features of new words in the danmaku text. Then, the internal information and external knowledge of the new words are used to extend the features of the danmaku text and reduce feature sparsity. Next, an OBTM topic model based on word-pair filtering is designed to eliminate noise features. Finally, a Single-Pass algorithm based on cluster center iteration is proposed to obtain the clustering results of the topic feature words. Experimental results show that the clustering accuracy of the proposed algorithm is 13.33 %, 8.52 %, and 6.25 % higher than that of the OBTM, Word2vec+BTM, and OurE.Drift* algorithms, respectively.
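
The final clustering step can be illustrated with a minimal sketch of a generic Single-Pass clusterer. The abstract does not specify the similarity measure, the threshold, or the exact cluster center iteration rule, so everything here is an assumption made for illustration: the function names `cosine` and `single_pass`, cosine similarity as the measure, the `threshold` value, and the running-mean center update. It should not be read as the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.8):
    """Assign each vector, in arrival order, to the most similar cluster center.

    A new cluster is opened when no center reaches `threshold`.
    Assumption: a center is the running mean of its members; the paper's
    own center iteration rule may differ.
    """
    centers = []  # one center vector per cluster
    sizes = []    # number of members per cluster
    labels = []   # cluster index assigned to each input vector
    for v in vectors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centers):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            n = sizes[best]
            # Update the center as the running mean of its members.
            centers[best] = [(c * n + x) / (n + 1) for c, x in zip(centers[best], v)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centers.append(list(v))
            sizes.append(1)
            labels.append(len(centers) - 1)
    return labels, centers
```

In a pipeline like the one described, the input vectors would be the topic feature representations produced by the topic model. Because Single-Pass reads each item once, its result depends on input order and on the chosen threshold.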

