
An AI agent for treatment reasoning over a biomedical tool universe

Researchers introduced ATHENA-R1, an AI agent trained via reinforcement learning over 212 biomedical tools to perform treatment reasoning across all FDA-approved drugs since 1939. The agent achieved 94.7% accuracy on open-ended drug reasoning and 82.9% on treatment reasoning, outperforming GPT-5 by 17.8 and 10.7 percentage points respectively, and was preferred over reference models in blinded evaluations by experts from 28 rare disease organizations. Its adverse-event hypotheses, tested in electronic health records from 5.4 million patients, yielded adjusted odds ratios of 1.48-1.84 with no elevation among negative controls.

Published Jun 30, 2026

arXiv:2606.28692v1. Abstract: Treatment reasoning underpins every therapeutic decision, integrating disease context, comorbidities, medications, contraindications, and evolving biomedical knowledge to select an appropriate therapy. It is inherently iterative: candidates are weighed against many constraints, revised as evidence emerges, and grounded in verifiable sources. Here we introduce ATHENA-R1, an AI agent for treatment reasoning across all FDA-approved drugs since 1939, trained by reinforcement learning over a universe of 212 biomedical tools. At each step it identifies missing information, selects and runs relevant tools, and incorporates the evidence. To train it without human-annotated traces, we build a two-level self-learning framework: multi-agent systems construct the tools, tasks, and reasoning trajectories for supervised fine-tuning, then reinforcement learning with scientific feedback rewards reasoning quality (evidence gathering, grounded tool use, logical non-redundancy). Across five benchmarks of 3,168 drug reasoning tasks and 456 patient treatment cases, ATHENA-R1 outperforms language models and tool-use systems, reaching 94.7% accuracy on open-ended drug reasoning and 82.9% on treatment reasoning, 17.8 and 10.7 points above GPT-5. In blinded evaluations by experts from 28 rare disease organizations, it is preferred over reference models on all criteria, and physicians rated it favorably on complex hospitalized cardiovascular and infectious-disease cases. Adverse-event hypotheses it generated, tested in electronic health records from 5.4 million patients, reached adjusted odds ratios of 1.48-1.84, with no elevation among negative controls. Because it requires knowing what evidence to seek before concluding, treatment reasoning has long been hard for AI; we show it can be reframed as a learnable process of iterative evidence gathering that reinforcement learning can train AI to perform.
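
The abstract describes a per-step loop: identify missing information, select and run a relevant tool, incorporate the evidence. A minimal Python sketch of that control flow follows. It is only an illustration under stated assumptions, not the paper's implementation: every name here (AgentState, missing_information, run_agent, the toy tools) is hypothetical, the tool outputs are placeholder strings rather than real biomedical data, and a plain dictionary lookup stands in for the learned policy that would choose a tool in the actual agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout; this is not ATHENA-R1's actual interface.
ToolFn = Callable[[str], str]


@dataclass
class AgentState:
    question: str
    needed: List[str]                                   # facts the agent wants before concluding
    evidence: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)      # names of tools actually run, in order


def missing_information(state: AgentState) -> Optional[str]:
    """Return the first needed fact that has no evidence yet, or None."""
    for key in state.needed:
        if key not in state.evidence:
            return key
    return None


def run_agent(state: AgentState, tools: Dict[str, ToolFn], max_steps: int = 10) -> AgentState:
    """Iterate: find a gap, run a matching tool, record the evidence."""
    for _ in range(max_steps):
        gap = missing_information(state)
        if gap is None:
            break                                       # nothing left to look up
        tool = tools.get(gap)                           # assumption: lookup replaces a learned policy
        if tool is None:
            state.evidence[gap] = "unavailable"         # record the gap so the loop can move on
        else:
            state.evidence[gap] = tool(state.question)
            state.trace.append(gap)
    return state


if __name__ == "__main__":
    # Placeholder tools returning dummy strings, not real drug information.
    toy_tools: Dict[str, ToolFn] = {
        "indications": lambda q: f"placeholder indication record for {q}",
        "contraindications": lambda q: f"placeholder contraindication record for {q}",
    }
    start = AgentState(
        question="drug X",
        needed=["indications", "contraindications", "interactions"],
    )
    final = run_agent(start, toy_tools)
    print(final.trace)
    print(final.evidence)
```

The step cap (max_steps) and the explicit "unavailable" marker are design choices of this sketch that keep the loop from running forever when no tool covers a gap; the abstract does not say how the real agent handles that case.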
