Meta-learning as a principle for human-like visual representations

In a new arXiv preprint, researchers propose that meta-learning, or learning to learn, may explain why human visual representations are more flexible than those of pretrained neural networks. They trained a sequence model across thousands of semantically rich tasks, with no supervision from human data, and found that the meta-learned representations predict human similarity judgments, semantic rule learning, and high-level visual cortex activity better than the pretrained base encoders they started from. The findings suggest that human visual flexibility arises from the need to learn new semantic relationships quickly.

arXiv:2606.28399v1. Abstract: The structure of human visual representations underpins our capacity for adaptive behaviour. While pretrained neural networks model human visual representations with unprecedented success, a large discrepancy remains. We propose one reason: these networks optimise a single fixed objective, whereas human representations must support open-ended tasks. We hypothesise this flexibility arises from meta-learning (learning to learn), a pressure shaping representations to acquire new tasks from few observations. To test this, we train a sequence model, without any supervision from human data, across thousands of semantically rich tasks mapping images to high-level concepts. Compared to their pretrained base encoders, meta-learned representations better predict human similarity judgements, semantic rule learning, and high-level visual cortex. Behavioural gains depend on disentangled, high-level task distributions, while brain alignment is driven primarily by the learning-to-learn pressure. Our results suggest the flexibility of human visual representations reflects the functional demand to learn new semantic relationships on the fly.
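
To make the "learn new tasks from few observations" setup concrete, here is a minimal sketch of how a single few-shot episode might be assembled for a sequence model. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not describe the episode format, so the function names (`sample_task`, `sample_episode`), the binary concept labels, the context size, and the toy random embeddings are all hypothetical.

```python
import random


def sample_task(concept_matrix, rng):
    """Pick one concept (column) at random; each column defines one binary task.

    concept_matrix[i][c] is 1 if image i has concept c, else 0 (hypothetical format).
    """
    n_concepts = len(concept_matrix[0])
    c = rng.randrange(n_concepts)
    return [row[c] for row in concept_matrix]


def sample_episode(features, labels, n_context=4, rng=None):
    """Build one episode: a few labelled context examples plus a held-out query.

    A sequence model would read the context pairs in order and then predict
    the label of the query, so it must infer the task from the context alone.
    """
    rng = rng or random.Random()
    # Draw distinct images: the first n_context form the context, the last is the query.
    idx = rng.sample(range(len(features)), n_context + 1)
    *context_idx, query_idx = idx
    context = [(features[i], labels[i]) for i in context_idx]
    return context, features[query_idx], labels[query_idx]


rng = random.Random(0)
# Toy stand-ins (assumption): 10 "images" as 3-d embeddings, 5 binary concepts.
features = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]
concept_matrix = [[rng.randint(0, 1) for _ in range(5)] for _ in range(10)]

labels = sample_task(concept_matrix, rng)
context, query, target = sample_episode(features, labels, n_context=4, rng=rng)
```

Repeating this sampling over many concepts yields a stream of distinct tasks. In a setup like this one, the model is rewarded only for using the context to label the query, which is one common way to impose a learning-to-learn pressure.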