04:00
2026-05-28
arxiv.org
computer-vision
Structure over Pixels: Learning Variable-Length Visual Programs
Researchers have developed STROP, a discrete visual tokenizer that learns to generate variable-length structural scene representations, determining the optimal sequence length for each image in a singβ¦