Lightweight Dual-Stream Human Behavior Inference Network Based on Multi-Layer Perceptron
DOI: https://doi.org/10.31577/cai_2025_2_408
Keywords: Human-human interaction, human action recognition, skeleton joints
Abstract
Recognizing human behavior in videos is a critical task in human activity analysis. However, current behavior recognition methods, which are dominated by CNNs, GCNs, and LSTMs, rely on unduly complex architectures and mechanisms, resulting in high computational cost. These methods also tend to be poorly robust when recognizing behaviors across different environmental conditions and camera angles. To address these challenges, this paper introduces a lightweight network, based on a multi-layer perceptron, for inferring interaction behavior from human skeletons. The network uses human skeleton information and minimal prior knowledge to infer encodings of limb behavior. To reduce computational complexity, each video is divided into short segments that serve as the minimum computation units. The approach integrates three types of information: independent global information about each individual's posture, local interaction information between different limb parts, and temporal distance information. These three types of information are coupled through an LSTM, which incorporates temporal changes into the network for recognition and classification. Compared with previous similar methods, the proposed method is more lightweight, is more robust to interference, and supports behavior recognition across different environments and viewpoints.
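As a rough illustration of the segment-level inputs described in the abstract, the following Python sketch splits a two-person skeleton sequence into fixed-length segments and computes one simple stand-in for each of the three information types. The function names, the 2D joint format, and the specific feature definitions (centroid-relative joints, pairwise inter-person joint distances, first-to-last-frame displacement) are assumptions made for illustration only; the abstract does not specify them, and the MLP encoding and LSTM coupling stages are omitted.

```python
from math import dist


def split_into_segments(frames, segment_len):
    """Split frames into consecutive, non-overlapping segments.

    An incomplete trailing segment is dropped (an assumption of this sketch).
    """
    return [frames[i:i + segment_len]
            for i in range(0, len(frames) - segment_len + 1, segment_len)]


def global_posture(person):
    """Joints expressed relative to the person's centroid.

    Hypothetical stand-in for 'independent global posture information'.
    """
    n = len(person)
    cx = sum(x for x, _ in person) / n
    cy = sum(y for _, y in person) / n
    return [(x - cx, y - cy) for x, y in person]


def interaction_distances(person_a, person_b):
    """Distances between every joint of A and every joint of B.

    Hypothetical stand-in for 'local interaction information'.
    """
    return [dist(a, b) for a in person_a for b in person_b]


def temporal_displacement(person_frames):
    """Per-joint displacement between the first and last frame of a segment.

    Hypothetical stand-in for 'temporal distance information'.
    """
    first, last = person_frames[0], person_frames[-1]
    return [dist(p, q) for p, q in zip(first, last)]


def segment_features(segment):
    """Compute the three illustrative feature groups for one segment.

    Each frame is a pair (person_a, person_b); each person is a list of
    (x, y) joints. Posture and interaction use the middle frame, which is
    an arbitrary choice made for this sketch.
    """
    person_a, person_b = segment[len(segment) // 2]
    return {
        "global": (global_posture(person_a), global_posture(person_b)),
        "interaction": interaction_distances(person_a, person_b),
        "temporal": (
            temporal_displacement([frame[0] for frame in segment]),
            temporal_displacement([frame[1] for frame in segment]),
        ),
    }


# Toy data: person A drifts right by one unit per frame, person B stands still.
frames = [
    ([(0.0 + t, 0.0), (2.0 + t, 0.0)], [(0.0, 3.0), (2.0, 3.0)])
    for t in range(5)
]
segments = split_into_segments(frames, segment_len=2)
features = [segment_features(segment) for segment in segments]
```

In a full pipeline, each per-segment feature group would be encoded and the resulting sequence passed to a recurrent layer; that part depends on details not given here.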
