A Lightweight Text Representation Method Integrating Topic Information

Authors

  • Shuobin Zhang College of Computer Science and Technology, Shandong Technology and Business University, Yantai, 264005, China
  • Jianhua Sun Weihai Municipal Hospital, Cheeloo College of Medicine, Shandong University, Weihai, 264200, China
  • Huanling Tang College of Computer Science and Technology, Shandong Technology and Business University, Co-Innovation Center of Shandong Colleges and Universities, Yantai, 264005, China
  • Wenhao Duan College of Computer Science and Technology, Shandong Technology and Business University, Co-Innovation Center of Shandong Colleges and Universities, Yantai, 264005, China
  • Quansheng Dou College of Computer Science and Technology, Shandong Technology and Business University, Co-Innovation Center of Shandong Colleges and Universities, Yantai, 264005, China
  • Mingyu Lu Information Science and Technology College, Dalian Maritime University, Dalian, 116026, China

Keywords

Lightweight, topic model, deep learning, text classification, feature representation

Abstract

Transformer models and their variants have shown significant advantages in natural language processing tasks, but their high computational requirements limit deployment on resource-constrained devices. To balance computational cost and accuracy, the lightweight model pNLP-Mixer uses a parameter-free projection to generate text embeddings. However, the embeddings produced by projection capture only shallow semantics and leave implicit semantic information unexplored. To overcome the high parameter cost and limited representation capability of existing models, we propose a method that incorporates topic information to enrich semantics. Our approach leverages the Latent Dirichlet Allocation (LDA) topic model to capture latent semantic relationships between words, improving the expressiveness of text representations for downstream tasks. Building on this method, we propose a lightweight model named TEP-Mixer, which integrates multiple feature extraction modules to further strengthen representation capability. Experimental results on multiple benchmark datasets demonstrate that TEP-Mixer outperforms other lightweight models in accuracy while maintaining a lower parameter count, making it well suited to resource-constrained devices.
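The abstract's core idea of enriching a shallow text representation with LDA topic features can be illustrated with a minimal sketch. This is not the paper's TEP-Mixer implementation; it only shows the general pattern, using scikit-learn's `LatentDirichletAllocation` on a toy corpus and concatenating the resulting document-topic mixture with plain term-count features.

```python
# Illustrative sketch (not the paper's implementation): fuse shallow
# surface features with LDA document-topic features.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the model learns topics from text",
    "deep learning models classify text",
    "topic models capture latent semantics",
    "lightweight models run on small devices",
]

# Shallow surface features: raw term counts per document.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)      # shape: (n_docs, vocab_size)

# Latent semantic features: per-document topic mixture from LDA.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_mixture = lda.fit_transform(counts)      # shape: (n_docs, 3)

# Fused representation: surface counts concatenated with topic features,
# giving downstream classifiers access to both levels of information.
fused = np.hstack([counts.toarray(), topic_mixture])
```

In TEP-Mixer the topic features are integrated with projection-based embeddings inside the model rather than concatenated with raw counts, but the principle is the same: the LDA mixture adds latent semantic structure that the shallow representation alone lacks.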

Published

2026-02-13

How to Cite

Zhang, S., Sun, J., Tang, H., Duan, W., Dou, Q., & Lu, M. (2026). A Lightweight Text Representation Method Integrating Topic Information. Computing and Informatics, 44(6). Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/7311