From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection

Researchers propose a frequency-guided feature representation learner for small object detection that shifts from spatial to spectral processing. The method, instantiated via lightweight plug-and-play modules, achieves consistent gains on multi-domain benchmarks while requiring only 1/6 of the parameters of YOLOv11 models.

arXiv:2606.23825v1

Abstract: Efficient small object detection is bottlenecked by the inherent feature scarcity of tiny targets, which is further aggravated by operations in spatial-domain detectors that indiscriminately discard critical high-frequency details. Recovering these fragile cues within the spatial domain is notoriously difficult, as it often requires computationally expensive architectural upscaling that inadvertently amplifies background noise. To bridge this gap, we propose a paradigm shift from spatial to spectral feature processing, introducing a holistic solution with the following contributions: (1) a versatile Frequency-Guided Feature Representation framework that generalizes across diverse detector architectures (both CNN- and Transformer-based), offering a robust alternative to spatial-only feature extraction; (2) the unified Decompose-Enhance-Reconstruct (DER) operator, instantiated via three lightweight, plug-and-play modules, namely the Wavelet-Difference Gate (WDG), the Log-Gabor Enhancer (LGE), and the Frequency-Driven Head (FDHead), to systematically inject frequency-aware modulation into the backbone, neck, and head. This mechanism decouples feature modeling from resolution reduction, capturing discriminative high-frequency components to enable accurate localization with significantly reduced parameter redundancy; (3) extensive validation on multi-domain benchmarks (VisDrone2019, UAVDT, TinyPerson, DOTAv1), demonstrating consistent gains.
Notably, our proposed DERNet series outperforms YOLOv11 models at the same scale while requiring only 1/6 of the parameters, a result backed by rigorous spectral diagnostics and error decomposition analysis.
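
The abstract names a Decompose-Enhance-Reconstruct pattern but gives no implementation details, so the following is only an illustrative sketch of that general idea, not the authors' WDG, LGE, or FDHead modules. It assumes a one-level 2D Haar wavelet as the decomposition and a fixed scalar gain on the high-frequency subbands as the "enhance" step; the function names (`haar_decompose`, `haar_reconstruct`, `der`) and the `gain` parameter are hypothetical and chosen for this example only.

```python
import numpy as np


def haar_decompose(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W."""
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # differences along the width axis
    hl = (a + b - c - d) / 2.0  # differences along the height axis
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_reconstruct(ll, lh, hl, hh):
    """Exact inverse of haar_decompose."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def der(x, gain=1.5):
    """Decompose, scale the three detail subbands by `gain`, reconstruct.

    Assumption: a fixed scalar gain stands in for a learned,
    frequency-aware modulation.
    """
    ll, lh, hl, hh = haar_decompose(np.asarray(x, dtype=np.float64))
    return haar_reconstruct(ll, gain * lh, gain * hl, gain * hh)


if __name__ == "__main__":
    feat = np.zeros((4, 4))
    feat[0, 0] = 1.0  # a single-pixel "tiny target" on a flat background
    print(der(feat, gain=2.0))
```

With `gain=1.0` the operator is the identity, since the Haar transform is exactly invertible; with `gain > 1` isolated fine-scale structures are amplified while flat regions are left unchanged, and the output keeps the input's spatial resolution.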