
VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

Researchers introduced VigilFormer, a video anomaly detection framework combining deformable spatio-temporal attention with causal temporal modeling, achieving state-of-the-art AUC scores of 87.83% on UCF-Crime, 97.21% on ShanghaiTech, and 89.74% on CUHK Avenue at 41.5 FPS on a single GPU. The model uses a Deformable Spatio-Temporal Encoder to reduce computational cost and an Adaptive Confidence Scheduler to skip low-information frames, outperforming existing weakly-supervised methods in both accuracy and speed.

Published Jun 16, 2026

arXiv:2606.14724v1

Abstract: Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. The proposed Deformable Spatio-Temporal Encoder (DSTE) attends to a sparse set of informative locations across frames, avoiding the quadratic cost of dense attention while retaining the ability to capture irregular motion patterns. A Causal Anomaly Classifier (CAC) applies dilated causal convolutions over snippet-level features and optimizes a contrastive multiple-instance learning objective that separates anomalous and normal representations without frame-level labels. To meet deployment constraints, an Adaptive Confidence Scheduler (ACS) dynamically skips low-information frames at inference time, reducing redundant computation in static scenes. Evaluated on UCF-Crime, ShanghaiTech, and CUHK Avenue, VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% respectively, at 41.5 FPS on a single GPU, outperforming recent weakly-supervised methods in both accuracy and speed.
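To make two of the ideas in the abstract concrete, here is a minimal pure-Python sketch of (a) a dilated causal 1D convolution over a sequence of snippet-level values and (b) a simple rule for skipping low-motion frames. It is not the authors' implementation. The abstract does not specify kernel sizes, dilation rates, or how the ACS decides what to skip, so the function names, the `motion` signal, the `threshold`, and the `max_skip` cap are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal 1D convolution with dilation.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ... and never on
    future steps. kernel[0] weights the current step; kernel[k] weights the
    step k * dilation in the past. Steps before the sequence start count as 0.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out


def skip_static_frames(
    motion: Sequence[float], threshold: float, max_skip: int
) -> List[bool]:
    """Return a mask where True means "run the model on this frame".

    Hypothetical rule (an assumption, not taken from the paper): motion[t] is
    a cheap change measure relative to the previous frame. A frame is skipped
    when motion is below threshold, but at most max_skip frames are skipped in
    a row so a slowly drifting scene is still re-checked periodically.
    """
    mask = []
    skipped = 0
    for t, m in enumerate(motion):
        process = t == 0 or m >= threshold or skipped >= max_skip
        mask.append(process)
        skipped = 0 if process else skipped + 1
    return mask


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    # With kernel [1, 1] and dilation 2, each output is x[t] + x[t - 2].
    print(causal_dilated_conv1d(scores, [1.0, 1.0], dilation=2))

    motion = [0.0, 0.01, 0.02, 0.5, 0.01, 0.01, 0.01]
    print(skip_static_frames(motion, threshold=0.1, max_skip=2))
```

Stacking several such convolutions with increasing dilation widens the temporal context without letting any output see future snippets, which is what allows a causal classifier to run online. In a real system the convolution would operate on feature vectors rather than scalars, and skipped frames would typically reuse the most recent anomaly score.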
