Perceive, Interact, Reason: Building Tool-Augmented Visual Agents for Spatial Reasoning

wpnews.pro

Researchers introduced PERIA, a tool-augmented visual agent that improves spatial reasoning in vision-language models by actively acquiring evidence and performing multi-step visual interactions. The agent uses two lightweight tool families, one for perception and one for interaction, and is trained with a unified recipe that combines supervised tool-use trajectory synthesis, composite rewards, and Observation-Relaxed Group-in-Group Policy Optimization (OR-GIGPO). Across 13 benchmarks from 8 datasets, PERIA-8B improved over its Qwen3-8B backbone by 10.0% on in-distribution benchmarks and 4.4% on out-of-distribution benchmarks, and achieved performance comparable to much larger models such as Qwen3-VL-235B-A22B-Thinking and GPT-5.

Published Jun 12, 2026 · 1 min read

arXiv:2606.12830v1

Abstract: While recent vision-language models (VLMs) demonstrate strong multimodal understanding, they remain limited in spatial reasoning tasks that require active evidence acquisition and multi-step visual interaction. This limitation suggests that relying solely on implicit visual representations from vision encoders is insufficient for recovering fine-grained spatial evidence. We introduce PERception-Interaction-reason Agent (PERIA), a tool-augmented visual agent for spatial reasoning tasks across map reasoning, visual probing, and vision reconstruction. PERIA uses two lightweight tool families: vision perception tools for exposing textual, symbolic, and spatial evidence, and vision interaction tools for manipulating visual context, tracing paths, and verifying spatial relations. To train PERIA, we develop a unified recipe that combines supervised tool-use trajectory synthesis, composite rewards, and Observation-Relaxed Group-in-Group Policy Optimization (OR-GIGPO) for effective multi-tool behavior. Experiments on 13 benchmarks from 8 datasets show that PERIA-8B improves over the Qwen3-8B backbone by 10.0% on in-distribution benchmarks and 4.4% on out-of-distribution benchmarks, while outperforming previous state-of-the-art baselines of similar size by 7.0%-14.8%. It also achieves performance comparable to much larger models such as Qwen3-VL-235B-A22B-Thinking and GPT-5, demonstrating the effectiveness of PERIA in enhancing spatial reasoning capabilities.
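The split between perception tools and interaction tools described in the abstract can be illustrated with a toy loop. The sketch below is not PERIA's implementation: the grid map, the tool names `find_label` and `trace_path`, and the `scripted_policy` that stands in for the model are all hypothetical, chosen only to show how an agent might gather evidence with one tool family and then act on it with the other before answering.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy "map image": '#' is a wall, letters are landmarks.
GRID = [
    "A..#",
    ".#..",
    "...B",
]

def find_label(grid, label):
    """Perception tool (hypothetical): return the (row, col) of a landmark."""
    for r, row in enumerate(grid):
        c = row.find(label)
        if c != -1:
            return (r, c)
    return None

def trace_path(grid, start, goal):
    """Interaction tool (hypothetical): BFS shortest path that avoids walls."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

@dataclass
class ToolCall:
    name: str
    args: tuple

def scripted_policy(observations):
    """Stand-in for the model: choose the next tool call from evidence so far."""
    for label in ("A", "B"):
        if label not in observations:
            return ToolCall("find_label", (label,))
        if observations[label] is None:
            return None  # landmark missing, nothing more to do
    if "path" not in observations:
        return ToolCall("trace_path", (observations["A"], observations["B"]))
    return None  # enough evidence collected

def run_agent(grid, max_steps=8):
    """Alternate policy decisions and tool calls; return steps from A to B."""
    tools = {
        "find_label": lambda label: find_label(grid, label),
        "trace_path": lambda start, goal: trace_path(grid, start, goal),
    }
    observations = {}
    for _ in range(max_steps):
        call = scripted_policy(observations)
        if call is None:
            break
        result = tools[call.name](*call.args)
        key = call.args[0] if call.name == "find_label" else "path"
        observations[key] = result
    path = observations.get("path")
    return None if path is None else len(path) - 1

if __name__ == "__main__":
    print(run_agent(GRID))
```

In a trained agent the scripted policy would be replaced by the model's own tool-call decisions, and the observations would be fed back into its context; how PERIA does this is not specified in the abstract.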

source & further reading

Read the original on arXiv: arxiv.org/abs/2606.12830
