
From Spatial to Spectral: An Efficient, Frequency-Guided Feature Representation Learner for Small Object Detection

Researchers propose a frequency-guided feature representation learner for small object detection that shifts feature processing from the spatial to the spectral domain. The method, instantiated as three lightweight plug-and-play modules, reports consistent gains on four multi-domain benchmarks (VisDrone2019, UAVDT, TinyPerson, DOTAv1), and the resulting DERNet series is reported to outperform same-scale YOLOv11 models with only 1/6 of the parameters.

Published Jun 24, 2026 · 1 min read

arXiv:2606.23825v1. Abstract: Efficient small object detection is bottlenecked by the inherent feature scarcity of tiny targets, which is further aggravated by operations of spatial-domain detectors that indiscriminately discard critical high-frequency details. Recovering these fragile cues within the spatial domain is notoriously difficult, as it often requires computationally expensive architectural upscaling that inadvertently amplifies background noise. To bridge this gap, we propose a paradigm shift from spatial to spectral feature processing, introducing a holistic solution with the following novelty: (1) A versatile Frequency-Guided Feature Representation framework that generalizes across diverse detector architectures (both CNN and Transformer-based), offering a robust alternative to spatial-only feature extraction; (2) The unified Decompose-Enhance-Reconstruct (DER) operator, instantiated via three lightweight, plug-and-play modules, namely the Wavelet-Difference Gate (WDG), Log-Gabor Enhancer (LGE), and Frequency-Driven Head (FDHead), to systematically inject frequency-aware modulation into the backbone, neck, and head. This mechanism decouples feature modeling from resolution reduction, capturing discriminative high-frequency components to enable accurate localization with significantly reduced parameter redundancy; (3) Extensive validation on multi-domain benchmarks (VisDrone2019, UAVDT, TinyPerson, DOTAv1) demonstrating consistent gains. Notably, our proposed DERNet series outperforms YOLOv11 models under the same scale while requiring only 1/6 of the parameters, backed by rigorous spectral diagnostics and error decomposition analysis.
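
The abstract names the Decompose-Enhance-Reconstruct pattern but gives no implementation details, so the following is only a minimal sketch of the general idea, not the paper's WDG, LGE, or FDHead modules. It assumes a single-level 2D Haar wavelet and a fixed scalar gain on the high-frequency subbands (a learned gate would presumably replace it in a real detector). The function names haar_dwt2, haar_idwt2, and der_enhance are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """Decompose: single-level 2D Haar transform (even height and width)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail across columns
    hl = (a + b - c - d) / 2.0  # detail across rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Reconstruct: exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def der_enhance(x, gain=1.5):
    """Enhance: scale the three high-frequency subbands by an assumed fixed gain."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(x, dtype=np.float64))
    return haar_idwt2(ll, gain * lh, gain * hl, gain * hh)


# A toy 8x8 feature map with one bright pixel standing in for a tiny target.
feature = np.zeros((8, 8))
feature[3, 3] = 1.0
enhanced = der_enhance(feature, gain=2.0)
```

With gain=1.0 the round trip returns the input unchanged; a gain above 1 raises the contrast of the isolated pixel while leaving the low-frequency subband, and therefore the overall sum of the map, untouched.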
