ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents


Source: arxiv.org · Published: 2026-06-03T04:00Z · Topic: artificial-intelligence

Researchers introduced ToolGate, a lightweight external controller that predicts whether to execute or skip each tool call proposed by a vision-language agent. It targets the pre-call control problem: executing every proposed call is costly and sometimes unnecessary. Across five benchmarks and two Qwen3-VL backbones, ToolGate reduced token cost to 64-69% of the unrestricted ReAct baseline while preserving average accuracy in cross-domain settings; with matched-domain trajectory training on Qwen3-VL-30B, it also improved average accuracy by 1.65 points. The findings suggest that tool-augmented agents benefit from explicit control over when tool outputs are worth paying for, not just from better perceptual tools.

1 min read · Published Jun 3, 2026

Abstract (arXiv:2606.03054v1): Tool-augmented vision-language agents can acquire external perceptual evidence through OCR, detection, segmentation, and other tools, but executing every proposed tool call is costly and sometimes unnecessary. We study the pre-call control problem: after a ReAct-style VLM agent proposes a perceptual tool call, should the call be executed, or skipped before its output enters the context? Across five benchmarks, we find that the baseline agent exhibits poor local selectivity: helpful and harmful calls occur at similar rates (11.8% vs. 9.9%), while most calls do not change the immediate forced-answer prediction. We introduce ToolGate, a lightweight external controller that predicts execute/skip decisions from trajectory text and simple structural features. Across two Qwen3-VL backbones, ToolGate reduces token cost to 64-69% of the unrestricted ReAct baseline while preserving average accuracy in cross-domain settings. With matched-domain trajectory training on Qwen3-VL-30B, it further improves average accuracy by 1.65 points. These results show that tool-augmented VLM agents benefit not only from better perceptual tools, but also from explicit control over when tool outputs are worth paying for.
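To make the execute/skip idea concrete, here is a minimal sketch of a pre-call gate in Python. It is not the authors' implementation: the `ProposedCall` structure, the four features, the fixed weights, and the 0.5 threshold are all hypothetical, chosen only to illustrate how a small external controller could score a proposed call from trajectory text and structural features before the tool output enters the context. The cost ratio below counts tool-output tokens only, which is a simplification of whatever token accounting the paper uses.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProposedCall:
    """A perceptual tool call proposed by the agent (hypothetical structure)."""
    tool: str            # e.g. "ocr", "detect", "segment"
    step: int            # index of this step in the trajectory
    thought: str         # the agent's reasoning text before the call
    output_tokens: int   # tokens the tool output would add to the context


def features(call: ProposedCall, prior_calls: int) -> List[float]:
    """Illustrative features: one crude text cue plus simple structure."""
    return [
        1.0,                                               # bias term
        float(call.step),                                  # trajectory depth
        float(prior_calls),                                # calls already run
        1.0 if "unsure" in call.thought.lower() else 0.0,  # text cue
    ]


# Hypothetical weights; a real controller would learn these from trajectories.
WEIGHTS = [0.5, -0.4, -0.6, 2.0]


def execute_probability(call: ProposedCall, prior_calls: int) -> float:
    """Logistic score for 'this call is worth executing'."""
    z = sum(w * x for w, x in zip(WEIGHTS, features(call, prior_calls)))
    return 1.0 / (1.0 + math.exp(-z))


def run_gated(calls: List[ProposedCall],
              threshold: float = 0.5) -> Tuple[List[ProposedCall], float]:
    """Execute only calls scored at or above the threshold.

    Returns the executed calls and the fraction of tool-output tokens spent
    relative to executing every proposed call.
    """
    executed: List[ProposedCall] = []
    spent = 0
    total = 0
    for call in calls:
        total += call.output_tokens
        if execute_probability(call, len(executed)) >= threshold:
            executed.append(call)   # output would enter the context here
            spent += call.output_tokens
        # Skipped calls never add their output to the context.
    return executed, (spent / total if total else 1.0)


calls = [
    ProposedCall("ocr", 0, "I am unsure what the sign says", 300),
    ProposedCall("detect", 1, "The answer is clearly visible", 500),
    ProposedCall("segment", 2, "Still unsure about the boundary", 200),
]
executed, ratio = run_gated(calls)
print([c.tool for c in executed], ratio)  # ['ocr', 'segment'] 0.5
```

In this toy run the gate skips the `detect` call, so half of the tool-output tokens are spent. The design point the sketch tries to capture is that the decision is made before execution, using only information available at proposal time.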

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.03054
