CF-YOLO: Towards Highly Effective Small Face Detection in Crowded Scenes
DOI:
https://doi.org/10.31577/cai_2026_1_182
Keywords:
Face detection, label assignment, NMS, EMA+
Abstract
To address the low recall of small-face detection in crowded scenes, this paper analyzes the primary causes of the problem and introduces a real-time face detector named CF-YOLO (Crowded-Face-YOLO). The analysis identifies one crucial factor: conventional face detectors provide too few positive samples for small faces during training. To overcome this limitation, a Sa-SimOTA strategy is proposed that increases the number of positive samples available for small targets. In the post-processing stage, the non-maximum suppression (NMS) algorithm selects the best bounding box for each detected face, but its traditional fixed threshold often discards small-face detection boxes in crowded scenes. To alleviate this issue, a Soft-Face-NMS algorithm is introduced, which incorporates facial feature variables into the Soft-NMS algorithm as weights, so that the face boxes with higher confidence are selected in overlapping regions. Furthermore, to strengthen the feature extraction of the YOLO backbone, an EMA+ attention module is proposed, and the network structure of YOLOv7 is modified to extract features that are more effective for small face detection. The proposed model achieves accuracies of 97.3 %, 96.4 %, and 92.8 % on the easy, medium, and hard subsets of the Wider-Face dataset, respectively. Notably, the accuracy on the hard subset approaches the state of the art, which further demonstrates the effectiveness of the proposed approach for face detection in crowded scenes.
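To illustrate why a fixed NMS threshold can discard valid boxes in crowded scenes, the following is a minimal sketch of the standard Gaussian Soft-NMS baseline on which Soft-Face-NMS builds. It is not the Soft-Face-NMS algorithm itself: the abstract does not specify how facial feature variables enter the weighting, so that part is omitted. The function names `iou` and `soft_nms`, the `(x1, y1, x2, y2)` box format, and the default values of `sigma` and `score_thresh` are assumptions chosen for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (baseline, without facial feature weighting).

    Rather than deleting every box whose IoU with the selected box exceeds
    a fixed threshold, each remaining score is decayed by exp(-IoU^2 / sigma).
    Returns a list of (box_index, rescored_confidence) in selection order.
    """
    candidates = [(i, float(s)) for i, s in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining box with the highest (possibly decayed) score.
        best_pos = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_idx, best_score = candidates.pop(best_pos)
        if best_score < score_thresh:
            break
        kept.append((best_idx, best_score))
        # Decay the scores of the other boxes according to their overlap.
        decayed = []
        for i, s in candidates:
            overlap = iou(boxes[best_idx], boxes[i])
            s *= math.exp(-(overlap ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((i, s))
        candidates = decayed
    return kept


# Two heavily overlapping faces plus one isolated face (made-up coordinates).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this toy input the first two boxes have an IoU of about 0.68, so hard NMS with a typical threshold of 0.5 would delete the second box outright. Soft-NMS instead keeps it with a reduced score, which is the behavior that helps retain adjacent small faces in crowded scenes.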