# Your AI Agent Is Being Fed Lies, and Your Logs Won't Tell You

> Source: <https://dev.to/coridev/your-ai-agent-is-being-fed-lies-and-your-logs-wont-tell-you-42p9>
> Published: 2026-07-01 05:32:52+00:00

Microsoft's own incident response team just demonstrated that you can manipulate an AI agent into exfiltrating sensitive data — not by breaking anything, not by triggering alerts — but by poisoning the *description* of a tool the agent reads before it acts. If that doesn't make you rethink every layer of your agentic pipeline, I'm not sure what will.

Prompt injection as a concept isn't new. Security researchers have been shouting about the risks of untrusted input reaching an LLM's context window for a couple of years now. What's new here is the specific attack surface: **MCP tool descriptions** — the metadata that tells an agent *what a tool does and how to use it*.

The Model Context Protocol is increasingly how agentic systems are assembled. Tools are registered with descriptions so agents can reason about which ones to invoke and when. That metadata is trusted by design. It's supposed to be the "safe" part of the system — infrastructure-level, not user-controlled. Except, apparently, it isn't safe. If an attacker can influence what ends up in a tool description, they can plant instructions that ride along silently in every subsequent agent decision.

This is supply chain thinking applied to AI orchestration, and most teams building agentic systems right now aren't thinking about it at all.

Let's be honest about who benefits from this narrative: researchers publishing this kind of finding get attention, credibility, and conference talks. That doesn't make the finding wrong — it's clearly real and demonstrated — but it's worth noting the framing tends toward "AI is uniquely dangerous" rather than "we built a complex system without thinking about trust boundaries, again."

What's being overstated: the novelty. Injecting malicious instructions into trusted metadata is a variant of what we've been doing to software systems for decades. It's a new substrate, not a new category.

What's being understated — and this is the part that should worry you — is **how undetectable this is by default**. The Microsoft researchers are explicit that each individual action the agent takes looks routine and rule-compliant. The exfiltration doesn't look like exfiltration. It looks like normal agent behavior. That's not a theoretical gap in coverage; that's a fundamental mismatch between how current monitoring was built and how agentic systems actually operate.

Your SIEM was designed to catch humans and scripts doing bad things. It was not designed to catch an AI agent doing 47 individually-reasonable things that collectively drain your sensitive data out the door.

If you're building agentic pipelines today — and a lot of you are — the lesson here is uncomfortable: **trust no layer of the stack implicitly, including the orchestration layer itself**.

Tool registries, tool descriptions, tool metadata — these need to be treated with the same skepticism you'd apply to user input. Who can write to them? Who audits changes? Is there any integrity verification before an agent consumes them?

For security teams, the monitoring gap is the real emergency. Behavioral analysis at the individual action level will not catch this class of attack. You need visibility into *patterns across agent sessions*, not just per-action rule matching. That's a significant retooling of how most shops currently think about AI observability.

For developers, the instinct to move fast and wire up tools quickly — which MCP actively encourages with its convenience-first design — is now in direct tension with security hygiene. Speed of integration is an attack surface.

And for everyone: the assumption that because an agent is "following the rules" it is behaving safely is now formally broken. Compliance with defined rules and actual safety are not the same thing when the rules themselves can be rewritten by an adversary.

If the attack works precisely because each individual agent action appears legitimate, what does meaningful detection even look like — and is it realistic to expect security teams to build it before agentic deployments outpace their ability to monitor them?

— Cor, Skyblue Soft
