A Lightweight Text Representation Method Integrating Topic Information
Keywords:
Lightweight, topic model, deep learning, text classification, feature representation

Abstract
Transformer models and their variants have shown significant advantages in natural language processing tasks, but their high computational requirements limit their deployment on resource-constrained devices. To balance computational cost and accuracy, the lightweight model pNLP-Mixer uses a parameter-free projection to generate text embeddings. However, the embeddings produced by this projection capture only shallow semantic information and leave implicit semantics unexplored. To overcome the high parameter cost and limited representational capacity of existing models, we propose a method that incorporates topic information to enrich the semantics of text representations. Our approach leverages the Latent Dirichlet Allocation (LDA) topic model to capture latent semantic relationships between words, improving the expressiveness of text representations for downstream tasks. Building on this method, we propose a lightweight model named TEP-Mixer, which integrates multiple feature extraction modules to further strengthen representational capability. Experimental results on multiple benchmark datasets demonstrate that TEP-Mixer outperforms other lightweight models in accuracy while maintaining a lower parameter count, making it suitable for resource-constrained devices.
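As a rough illustration of the topic-feature idea described above, the sketch below derives word-level topic vectors from an LDA model fit with scikit-learn. The helper `topic_features`, the toy corpus, and all variable names are hypothetical; how such vectors would actually be fused with the projection embeddings inside TEP-Mixer is not specified here and is left as an assumption.

```python
# Minimal sketch: word-level topic features from LDA (scikit-learn).
# Illustrative only -- not the paper's TEP-Mixer implementation.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the model learns latent topics from text",
    "lightweight models run on constrained devices",
    "topic features enrich shallow text embeddings",
]

# Build a document-term matrix and fit LDA on it.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=4, random_state=0)
lda.fit(dtm)

# components_ has shape (n_topics, vocab_size); normalizing each column
# yields, for every word, a distribution over topics -- a dense "topic
# vector" that encodes latent semantic relatedness between words.
topic_word = lda.components_                          # (n_topics, |V|)
word_topic = (topic_word / topic_word.sum(axis=0)).T  # (|V|, n_topics)

def topic_features(tokens, vocab=vectorizer.vocabulary_):
    """Look up a topic vector per token (zeros for out-of-vocabulary tokens)."""
    dim = word_topic.shape[1]
    return np.stack([
        word_topic[vocab[t]] if t in vocab else np.zeros(dim)
        for t in tokens
    ])

feats = topic_features("topic features enrich embeddings".split())
print(feats.shape)  # (4 tokens, 4 topics)
```

Under this sketch, the per-token topic vectors could, for example, be concatenated with the projection-based embeddings before the feature extraction modules; that fusion strategy is an assumption for illustration purposes.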