04:00
2026-05-26
arxiv.org
computer-vision
Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments
A new dissertation proposes three novel architectures to improve machine intelligence across vision-language tasks, including image captioning, visual dialog, and interactive instruction following. Th…