CF-YOLO: Towards Highly Effective Small Face Detection in Crowded Scenes

Hongbo Huang; Longfei Xu; Xiaoxu Yan; Linkai Huang

doi:10.31577/cai_2026_1_182

Authors

Hongbo Huang, Computer School, Beijing Information Science and Technology University, Beijing, 100000, China
Longfei Xu, Computer School, Beijing Information Science and Technology University, Beijing, 100000, China
Xiaoxu Yan, Computer School, Beijing Information Science and Technology University, Beijing, 100000, China
Linkai Huang, Computer School, Beijing Information Science and Technology University, Beijing, 100000, China

DOI:

https://doi.org/10.31577/cai_2026_1_182

Keywords:

Face detection, label assignment, NMS, EMA+

Abstract

Small faces in crowded scenes are often detected with low recall. This paper analyzes the main causes of this problem and introduces CF-YOLO (Crowded-Face-YOLO), a real-time face detection system. A key cause is that conventional face detectors assign too few positive samples to small faces during training. To address this limitation, a Sa-SimOTA strategy is proposed that increases the number of positive samples available for small targets. In the post-processing stage, non-maximum suppression (NMS) selects the best bounding box for each detected face, but its fixed decision threshold often discards the detection boxes of small faces in crowded scenes. To alleviate this issue, a Soft-Face-NMS algorithm is introduced, which incorporates facial feature variables into the Soft-NMS algorithm as weights so that the face boxes with higher confidence are selected in overlapping regions. Furthermore, to strengthen the feature extraction of the YOLO backbone, an EMA+ attention module is proposed and the network structure of YOLOv7 is modified to extract features that are more effective for small face detection. The proposed model achieves accuracy of 97.3 %, 96.4 %, and 92.8 % on the easy, medium, and hard subsets of the Wider-Face dataset, respectively. The result on the hard subset approaches the state of the art, which further supports the effectiveness of the proposed approach for face detection in crowded scenes.
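
To illustrate why a fixed NMS threshold can discard a heavily overlapped face while score decay retains it, the following is a minimal sketch of hard NMS next to the standard Gaussian Soft-NMS. It is not the Soft-Face-NMS of this paper: the abstract does not specify how facial feature variables enter the weighting, so that part is not reproduced. The function names, the `sigma` and `score_thresh` defaults, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration only.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Classic NMS: drop every box whose IoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of removing them.

    Returns (index, decayed_score) pairs in the order the boxes were selected.
    """
    scores = list(scores)  # work on a copy so the caller's scores are untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)  # Gaussian penalty
            if scores[i] >= score_thresh:
                survivors.append(i)
        remaining = survivors
    return keep


# Two heavily overlapping faces plus one isolated face (made-up coordinates).
demo_boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
demo_scores = [0.9, 0.8, 0.7]
print(hard_nms(demo_boxes, demo_scores))
print(soft_nms(demo_boxes, demo_scores))
```

In this toy input the second box overlaps the first with an IoU of about 0.67, so hard NMS at a threshold of 0.5 removes it, whereas Soft-NMS keeps it with a reduced score that a later confidence cut-off can still accept or reject.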

