{"slug": "lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool", "title": "LAI #131: A Tool Call Can Succeed and Still Be the Wrong Tool", "summary": "Microsoft released seven in-house AI models and a 100-page report detailing their refusal to use synthetic data and removal of AI-generated content before training. The report challenges other labs to prove similar transparency. Meanwhile, AI engineers are warned that a successful tool call by an agent does not guarantee it was the correct tool, highlighting a common debugging blind spot.", "body_md": "Good morning, AI enthusiasts!\n\nMicrosoft just released seven in-house AI models, but the interesting part isn’t the models, it’s the 100-page report. They refused to use synthetic data, actively hunted down AI-generated content before training, and essentially dared every other lab to prove they did the same. This week, we walk through the full release and what AI engineers can take from it. We also cover a debugging blind spot most teams miss: your agent’s tool call can succeed and still be the wrong tool entirely.\n\nLet’s get into it!\n\nEarlier this month, Microsoft AI announced seven MAI models built in-house across reasoning, coding, image, transcription, and voice. But what makes this release worth your attention is not the benchmark performance alone, but the report, which is more transparent than anything I’ve read from a major lab this year. Microsoft refused to use synthetic data to train their model, then actively hunted down AI-generated content and removed it before training. And they wrote a 100-page report daring every other lab to prove they did the same. This week, in What’s AI, I will walk through the entire model release, the training process, the RL steps, and share the recipe AI engineers can steal from this. 
[Read the full article here](https://www.louisbouchard.ai/mai-thinking/) or [watch the video on YouTube](https://youtu.be/Sl5O7KVVF6M).\n\nWhen an agent chooses a tool, don’t assume it understood the task.\n\nSometimes it only matches a word in the user’s request to a word in the tool name. For example, if the user asks for the “latest report,” the agent might still call the search tool even though the report has already been uploaded. Or it might call a database tool when the answer was already in the prompt.\n\nTo debug this, log three things side by side: the user’s request, the tool the agent picked, and the arguments it used. Then check whether the tool matched what the user actually wanted. Don’t only look for tool errors. A tool can run successfully and still be the wrong tool.\n\nIf you’re exploring agent engineering and want to go deeper into tool use and guardrails, our [Agent Engineering: Building Multi-Agent Systems](https://academy.towardsai.net/courses/agent-engineering?utm_source=Newsletter&utm_medium=email&utm_id=AItips) course is the cleanest path to building production agents.\n\n*— Louis-François Bouchard, Towards AI Co-founder & Head of Community*\n\n[Matteoturri_50413](https://discord.com/channels/702624558536065165/983037843532308500/1517661981048311890) built an open-source tool to reduce token waste when using AI on large codebases. FolioDux is a lightweight file-mapping standard that lets any AI tool navigate large codebases without reading every file. The AI reads the index first, picks only the relevant files, and ignores everything else. [Check it out on GitHub](https://github.com/matteo-turri/foliodux) and support a fellow community member. If you have any questions or feedback, [share them in the thread](https://discord.com/channels/702624558536065165/983037843532308500/1517661981048311890)!\n\nNo single option ran away with it; the votes are spread across the board, which tells us something in itself. 
The pattern from last week holds: career outcomes still rank highest when you add up jobs, interview prep, and getting unstuck on projects. But this time, staying current on tools and workflows tied for first, which suggests that for a lot of you, the value isn’t just “help me get a job”; it’s “help me stay relevant once I have one.”\n\nIf we launched this tomorrow, what’s the one thing that would make you cancel after month one if it wasn’t there? [Let us know in the thread](https://discord.com/channels/702624558536065165/833660976196354079/1518727592008614080)!\n\nThe Learn AI Together Discord community is flooding with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, [join the collaboration channel](https://discord.gg/rj6m9AF7eC)! Keep an eye on this section, too — we share cool opportunities every week!\n\n1. [Shaurya09272004](https://discord.com/channels/702624558536065165/784477688551178240/1517562785892929537) is building an agentic tool that explains Indian laws in plain language and is looking for people working on similar projects to share ideas, discuss debugging journeys, and go deeper into LLMs. If you are building something similar, [connect with them in the thread](https://discord.com/channels/702624558536065165/784477688551178240/1517562785892929537)!\n\n2. [Strikerleapgaming](https://discord.com/channels/702624558536065165/998978160605540454/1519045161600946327) is building their own startup and is looking for beta testers. If this interests you, [reach out to them in the thread](https://discord.com/channels/702624558536065165/998978160605540454/1519045161600946327)!\n\n3. [Vishacoplayz_27974](https://discord.com/channels/702624558536065165/998978160605540454/1516553865552592978) is recruiting founding board members for AIXelerate, a student-led AI nonprofit. 
If you’re a high school student interested in leadership, AI, marketing, operations, outreach, or event planning, [contact them in the thread](https://discord.com/channels/702624558536065165/998978160605540454/1516553865552592978)!\n\nMeme shared by [default_user2004](https://discord.com/channels/702624558536065165/830572933197201459/1517356376656187403)\n\n[The Prompt Cache Is Not Enough: Building a Full LLM Cost Optimization Strategy](https://pub.towardsai.net/the-prompt-cache-is-not-enough-building-a-full-llm-cost-optimization-strategy-a9c1992a0d7c?sk=846013555cf87dff30560169e61d869e) By [Rizwanhoda](https://rizwanhoda.medium.com/?source=post_page---byline--a9c1992a0d7c---------------------------------------)\n\nPrompt caching covers the easy 30% of LLM cost savings, but most teams stop there while their bills quietly climb back up. The author lays out a seven-layer optimization funnel that addresses every cost driver, including semantic caching to skip redundant LLM calls entirely; model routing to stop sending simple tasks to expensive models; prompt compression via LLMLingua; batching for async workloads at half the price; and hard output constraints to cut verbose token spend. Teams implementing three or more layers consistently reached 60–80% total cost reduction.\n\n1. [The Flow of Attention](https://pub.towardsai.net/the-flow-of-attention-1795b1d6aaf9) By [GSO1](https://gsokimoto1.medium.com/?source=post_page---byline--1795b1d6aaf9---------------------------------------)\n\nWhen transformer attention is reframed as a physics problem, token embeddings enter a language model as a cloud of points in high-dimensional space, and each layer redistributes them via a coupled transport step governed by two learned operators. 
The piece traces how the additive per-layer update defines a flow in Wasserstein space, connects it to an idealized gradient flow result from Geshkovski et al., and explains why the actual transformer falls short of that theorem without abandoning its structure. Clustering across depth, the prediction readout, and positional encoding all follow cleanly from this geometric picture.\n\n2. [Continuous Batching: How to Keep Your GPU Actually Busy](https://pub.towardsai.net/continuous-batching-how-to-keep-your-gpu-actually-busy-ffcddebd9ecb) By [Vedanti](https://medium.com/@vedanti220201?source=post_page---byline--ffcddebd9ecb---------------------------------------)\n\nStatic batching wastes GPU capacity because the slowest request forces every completed slot to sit idle until the entire batch finishes. Continuous batching fixes this by re-evaluating the batch at every forward pass and admitting new requests as soon as a slot becomes available. When combined with PagedAttention for dynamic KV cache memory management, the approach raises GPU utilization from 20–30% to nearly 100%. Every major inference framework, including vLLM, SGLang, and TensorRT-LLM, has adopted it.\n\n3. [LangGraph Memory: The Complete Practical Guide to Managing What Your Agent Remembers](https://pub.towardsai.net/langgraph-memory-the-complete-practical-guide-to-managing-what-your-agent-remembers-d865d505a59e) By [Bessie Delight Kekeli](https://medium.com/@bessiedelight?source=post_page---byline--d865d505a59e---------------------------------------)\n\nLangGraph agents face two silent production killers: conversations stored only in RAM vanish on every restart, and context windows grow expensive as message history accumulates. This article covers three memory management strategies, from simple message filtering and token-based trimming to LLM-powered rolling summarization, the most robust option for long-running agents. 
It also walks through swapping MemorySaver for SqliteSaver or PostgresSaver with a single-line change.\n\n4. [Practical Breakdown of the Value of the Semantic Layer for AI Agents: Results of A/B Testing](https://pub.towardsai.net/practical-breakdown-of-the-value-of-the-semantic-layer-for-ai-agents-results-of-a-b-testing-5a2c6b130e22) By [Sergey Gromov](https://medium.com/@grom_65116?source=post_page---byline--5a2c6b130e22---------------------------------------)\n\nWithout a semantic layer, models consistently make context errors, such as counting refunds as revenue, interpreting “Receipts” as a monetary sum, and miscalculating UPT by almost half. The semantic layer provides metric definitions, join logic, and business rules up front, eliminating the model’s need to infer corporate conventions. The experiment reframes semantic modeling as foundational infrastructure for enterprise AI agents, not legacy BI tooling.\n\nIf you are interested in publishing with Towards AI, [check our guidelines and sign up](https://contribute.towardsai.net/). 
We will publish your work to our network if it meets our editorial policies and standards.\n\n[LAI #131: A Tool Call Can Succeed and Still Be the Wrong Tool](https://pub.towardsai.net/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool-bf18827f5873) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.", "url": "https://wpnews.pro/news/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool", "canonical_source": "https://pub.towardsai.net/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool-bf18827f5873?source=rss----98111c9905da---4", "published_at": "2026-06-25 15:01:02+00:00", "updated_at": "2026-06-25 15:21:08.107884+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-research", "ai-agents", "ai-products", "ai-ethics"], "entities": ["Microsoft", "MAI", "Louis-François Bouchard", "Towards AI", "FolioDux", "Matteo Turri"], "alternates": {"html": "https://wpnews.pro/news/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool", "markdown": "https://wpnews.pro/news/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool.md", "text": "https://wpnews.pro/news/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool.txt", "jsonld": "https://wpnews.pro/news/lai-131-a-tool-call-can-succeed-and-still-be-the-wrong-tool.jsonld"}}