
Structure over Pixels: Learning Variable-Length Visual Programs

Researchers have introduced STROP, a discrete visual tokenizer that produces variable-length structural scene representations and estimates each image's sequence length in a single forward pass. Training follows a four-phase curriculum supervised by frozen DINOv3 features rather than pixel reconstruction, so the codebook is shaped by the quality of higher-level representations instead of texture. The authors report that program length grows with scene complexity and that signs of compositional structure appear in the learned code vocabulary. Unlike earlier adaptive tokenizers, STROP needs neither post-hoc search nor a choice among a fixed set of pre-trained rates.

1 min read · Published May 28, 2026

arXiv:2605.27696v1 · Announce Type: new

Abstract: Discrete visual tokenizers translate images into ordered sequences of codes, providing a natural representation for structural description of scenes. Yet existing adaptive tokenizers either require post-hoc search or select among a discrete set of pre-trained rates, rather than learning a continuous per-image sequence length coupled to the model and scene, and they typically train against pixel reconstruction, emphasizing texture rather than structure. We propose STROP, a discrete visual tokenizer architecture that forms structural scene representations and simultaneously learns how long an image's visual program should be. Using a four-phase curriculum supervised by local rate-distortion probes against frozen DINOv3 features, STROP optimizes a dedicated length head that estimates the active prefix length in a single forward pass. By bypassing pixel-level reconstruction gradients, the codebook is shaped entirely by the quality of higher-level latent representations. Program length grows with scene complexity, and signs of compositional structure emerge both in downstream dense-prediction transfer and in direct inspection of the learned code vocabulary.
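The abstract does not spell out how the active prefix or the local rate-distortion probes are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a "program" is an ordered list of code indices, that the active prefix is simply its first k codes, and that a probe compares nearby prefix lengths by a cost of the form distortion plus a weighted length. The function names `active_prefix` and `probe_length`, the weight `lam`, and the per-length `distortion` list are all hypothetical; in the paper the distortion would be measured against frozen DINOv3 features, whereas here it is supplied as plain numbers.

```python
from typing import List, Sequence


def active_prefix(codes: Sequence[int], length: int) -> List[int]:
    """Keep only the first `length` codes, clamped to [1, len(codes)].

    Hypothetical helper: stands in for applying a predicted length
    to a fixed-capacity code sequence.
    """
    k = max(1, min(length, len(codes)))
    return list(codes[:k])


def probe_length(
    distortion: Sequence[float], center: int, radius: int, lam: float
) -> int:
    """Pick a prefix length near `center` by a rate-distortion trade-off.

    distortion[k - 1] is an assumed, precomputed feature-space error
    when only the first k codes are used. Among lengths within `radius`
    of `center`, return the one minimizing distortion + lam * length.
    """
    lo = max(1, center - radius)
    hi = min(len(distortion), center + radius)
    return min(range(lo, hi + 1), key=lambda k: distortion[k - 1] + lam * k)


if __name__ == "__main__":
    # Made-up numbers for illustration only: error falls as more codes are used.
    errors = [1.0, 0.5, 0.3, 0.25, 0.24, 0.24]
    k = probe_length(errors, center=3, radius=2, lam=0.1)
    print(k, active_prefix([7, 3, 9, 1, 4, 8], k))
```

With a larger `lam` the probe favors shorter programs, and with `lam = 0` it picks the lowest-error length inside the window. In this toy setting, that is how a length target could come to track scene complexity: scenes whose error keeps falling as codes are added justify longer prefixes.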
