Hierarchical GRU with Input-Conditioned Slot Queries for Ball Action Anticipation

Researchers introduced a hierarchical model for ball action anticipation in football broadcast video, achieving 17.91% mAP on the SoccerNet benchmark. The system uses a GRU and a Transformer decoder with input-conditioned slot queries to predict actions from a 30-second observation window.

arXiv:2606.14730v1

Abstract: We present a hierarchical model for ball action anticipation in football broadcast video. Given a 30-second observation window, the system predicts actions occurring in the subsequent 5-second window across 10 classes. A shared local Transformer encodes clip-level features within each 5-second sub-window; a GRU then aggregates temporal context across all sub-windows; finally, a Transformer decoder with K input-conditioned event slots decodes the anticipation target via three decoupled heads (objectness, class, temporal offset). We introduce frequency-reweighted Hungarian matching that systematically favours rare action classes, and Gaussian soft targets for temporal bin supervision. On the SoccerNet Ball Action Anticipation benchmark, our method achieves 17.91% mAP on the test server.
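
The abstract mentions Gaussian soft targets for temporal bin supervision but does not give the bin count, the Gaussian width, or the loss. The following is a minimal sketch of how such targets are commonly built, not the authors' implementation. The function names and the values `num_bins=10` and `sigma_bins=1.0` are assumptions chosen for illustration; only the 5-second anticipation window comes from the abstract.

```python
import math


def gaussian_soft_target(offset_s, num_bins=10, window_s=5.0, sigma_bins=1.0):
    """Build a soft label over temporal bins for an event at offset_s seconds.

    num_bins and sigma_bins are hypothetical values, not taken from the paper.
    """
    if not 0.0 <= offset_s <= window_s:
        raise ValueError("offset_s must lie inside the anticipation window")
    bin_width = window_s / num_bins
    # Event position in bin units; bin i has its centre at i + 0.5.
    centre = offset_s / bin_width
    weights = [
        math.exp(-0.5 * ((i + 0.5 - centre) / sigma_bins) ** 2)
        for i in range(num_bins)
    ]
    total = sum(weights)
    # Normalise so the target is a probability distribution over bins.
    return [w / total for w in weights]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    peak = max(logits)
    # Numerically stable log of the softmax normaliser.
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target))


if __name__ == "__main__":
    target = gaussian_soft_target(2.25)
    print([round(p, 3) for p in target])
    print(soft_cross_entropy([0.0] * 10, target))
```

Compared with a one-hot label, a soft target of this kind penalises a prediction one bin away from the true offset less than a prediction several bins away, which is the usual motivation for smoothing temporal bin labels.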