Your AI Agent Knows Too Much

A developer built a tokenization guardrail for AI agents that prevents sensitive data from reaching large language models. The system replaces real customer data with opaque tokens, swaps them back only during tool execution, and rejects any non-token arguments. The approach, implemented as Microsoft Agent Framework middleware in about 150 lines, addresses data leakage risks from model traces, logs, and hallucinated tool calls.

Most AI agent examples make the same mistake. They show a nice prompt, a clean tool call, and then quietly pass raw real data straight through the model. That works for a demo, but it is a very bad idea in production.

A prompt is not just text anymore. It is part of the execution path. If you put real customer data into it (emails, addresses, real names), that data can leak through traces, tool calls, or the final answer.

TL;DR

- The model gets an opaque token, never the real value.
- A guardrail swaps the token back for the real value just before the tool runs, then scrubs it out of the result.
- The model only ever holds tokens, and anything that is not a live token is rejected.
- The whole thing is Microsoft Agent Framework middleware, around 150 lines.
- Demo repo: https://github.com/bgener/demo-maf-tokenization

A bare integer or a GUID is mostly harmless. Real data is not, and I do not just mean passwords. Think phone numbers, home addresses, someone's location.

Two things go wrong the moment the model holds it. First, it can leak: repeated in a reply, written to a log, or shown to the wrong user. Second, models make things up. A confused agent will invent arguments and call your tools with nonsense. If your tools trust whatever the model sends, that nonsense reaches your real systems.

Maybe you use Anthropic directly. Maybe Azure AI Foundry or Amazon Bedrock with good privacy terms. That helps, but it does not remove the problem, because "not used for training" is not the same as "never exposed anywhere". The data can still move through provider infrastructure, safety systems, logs, traces, tool calls, prompt history, evaluation runs, or the final answer.

With tokenization, the model never sees the real value. You hand it an opaque token instead, something like `tkn_loc_ab12...`. When the model calls a tool, it passes that token back, and you swap it for the real value just before the tool runs.
So when the agent invents a token, or fires ten calls with junk arguments, none of it lands. A made-up token matches nothing in the registry, and the call is rejected.

The layer that enforces this around a tool call is a guardrail, and the step that keeps the real value out of the model's reply is output redaction. Put them together and you get a tokenization guardrail with output redaction.

We need somewhere to keep the map between real values and tokens. The `TokenRegistry` holds it in memory. It maps each token to its real value, and each real value back to its token, so the same value never mints two tokens.

```csharp
public sealed class TokenRegistry : ITokenRegistry
{
    readonly ConcurrentDictionary<string, string> _tokenToReal = new();
    readonly ConcurrentDictionary<string, string> _realToToken = new();

    public string RegisterRealValue(string realValue, string prefix)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(realValue);

        // GetOrAdd so the same real value always maps to the same token.
        return _realToToken.GetOrAdd(realValue, real =>
        {
            var token = $"tkn_{prefix}_{Guid.NewGuid():N}";
            _tokenToReal[token] = real;
            return token;
        });
    }

    // Excerpt: the TryGetRealValue lookup used by the guardrail is not shown here.
}
```

A token is `tkn_`, a prefix, and a GUID. You cannot guess it. The only way to hold a valid one is to get it from a tool.

The guardrail runs before the tool and checks the identifier argument. If it is a live token from the registry, the guardrail swaps in the real value. Anything else is blocked: a real value, a made-up token, a raw GUID, a bare number.

```csharp
public static class TokenGuardrail
{
    public static GuardrailVerdict Inspect(string? value, ITokenRegistry registry)
    {
        if (value is not null && value.StartsWith("tkn_", StringComparison.OrdinalIgnoreCase))
        {
            // Looks like a token: allow it only if the registry actually issued it.
            return registry.TryGetRealValue(value, out var real)
                ? GuardrailVerdict.Allow(real)
                : GuardrailVerdict.Block("Rejected token pointer. This token is not in the registry.");
        }

        // Anything without the token prefix is rejected outright.
        return GuardrailVerdict.Block("Direct database id access is forbidden. Pass a token instead.");
    }
}
```

Why an allowlist? A denylist only stops the one format you predicted. If you block values that start with `STN_`, an agent that passes `ACC_SQL_11111`, a raw GUID, or a bare number sails straight through. The rule above flips it around. The only accepted value is a live token. Everything else is rejected by default. You do not have to guess every bad format, because there is only one good format.

Plugging the guardrail in is where Microsoft Agent Framework (MAF) does the work. It wraps a tool with middleware through `DelegatingAIFunction`: you pass the real tool to the base class and override `InvokeCoreAsync`. Your code runs before and after the wrapped tool, which is exactly where the guardrail belongs.

```csharp
public sealed class TokenGuardrailFunction : DelegatingAIFunction
{
    // Excerpt: the constructor (which passes the wrapped tool to the base class and
    // captures `registry`), the ArgName constant, and CoerceToString are not shown.

    protected override async ValueTask<object?> InvokeCoreAsync(
        AIFunctionArguments arguments, CancellationToken cancellationToken)
    {
        var locationRef = CoerceToString(arguments.TryGetValue(ArgName, out var value) ? value : null);
        var verdict = TokenGuardrail.Inspect(locationRef, registry);

        // Blocked: report the reason to the model and never call the real tool.
        if (!verdict.Allowed)
        {
            return JsonSerializer.Serialize(new { blocked = true, reason = verdict.Reason });
        }

        // Ingress: swap the token for the real value just before the tool runs.
        arguments[ArgName] = verdict.RealValue;
        var result = await base.InvokeCoreAsync(arguments, cancellationToken);

        // Egress: swap the real id back to its token so it never reaches the model.
        var json = result?.ToString() ?? string.Empty;
        return json.Replace(verdict.RealValue!, locationRef!);
    }
}
```

The middleware does three things:

1. It checks the `locationRef` argument. If the value is not a valid token from the registry, the call is blocked and the real tool is never executed.
2. If the token is valid, it replaces the token with the real value and calls the wrapped tool through `base.InvokeCoreAsync`.
3. It redacts the tool result before returning it to the model. Any real value coming back from the API is replaced with the original token, so the model never sees the raw value on the way in or on the way out.
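The middleware above calls a `CoerceToString` helper that the excerpt does not show. The following is a minimal sketch of what such a helper might look like, not the repo's actual implementation. It assumes an argument can arrive as a plain `string`, as a `System.Text.Json.JsonElement`, or as `null`; the class name `ArgumentCoercion` is hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper: the article names CoerceToString but does not show its body.
public static class ArgumentCoercion
{
    // Returns the argument as a string when it is a string or a JSON string.
    // Everything else (numbers, objects, arrays, null) becomes null,
    // which the guardrail then rejects as "not a token".
    public static string? CoerceToString(object? value) => value switch
    {
        null => null,
        string s => s,
        JsonElement { ValueKind: JsonValueKind.String } e => e.GetString(),
        _ => null,
    };
}
```

Mapping non-string JSON values to `null` rather than calling `ToString()` on them keeps the allowlist strict: a bare number never gets a chance to be interpreted as an identifier.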
One gotcha lives here. The agent passes tool arguments as `JsonElement`, not `string`. A naive `value as string` cast returns null and blocks every call, so the guardrail coerces the argument explicitly.

Now wire it together: a `ChatClientAgent` from MAF, backed by OpenAI, with two tools. `get_weather` calls the real WeatherApi over HTTP, and the guardrail wraps it. `find_location` stays unwrapped, since it only ever returns a token.

```csharp
AIFunction rawWeather = AIFunctionFactory.Create(
    async (string locationRef) =>
    {
        // By now the guardrail has swapped the token for the real value, so locationRef is real here.
        var forecast = await http.GetStringAsync($"{weatherApiUrl}/weatherforecast");
        return JsonSerializer.Serialize(new
        {
            location = locationRef,
            forecast = JsonSerializer.Deserialize<JsonElement>(forecast)
        });
    },
    name: "get_weather",
    description: "Get the 5 day weather forecast. Pass the locationRef returned by find_location.");

AIFunction findLocation = AIFunctionFactory.Create(
    (string city) =>
    {
        var locationRef = registry.RegisterRealValue(city, "loc");
        logger.LogDebug("find_location {City} minted {LocationRef}; the model only ever sees this token", city, locationRef);
        return JsonSerializer.Serialize(new { city, locationRef });
    },
    name: "find_location",
    description: "Look up a city by name and get an opaque locationRef that refers to it.");
```

`AIFunctionFactory.Create` turns a plain method into a callable tool. The agent only ever sees `guardedWeather` and cannot reach the raw one underneath.

```csharp
AIFunction guardedWeather = new TokenGuardrailFunction(
    rawWeather, registry, loggerFactory.CreateLogger<TokenGuardrailFunction>());
```

Both tools go on `ChatOptions`, the chat client points at OpenAI, and the session runs.

```csharp
IChatClient chatClient = new ChatClient(model, apiKey).AsIChatClient();

AIAgent agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "WeatherAgent",
    ChatOptions = new ChatOptions
    {
        Instructions =
            "You help users with weather. " +
            "A location is referenced by an opaque locationRef, never by its raw name. " +
            "To answer a weather question, first call find_location with the city to get a locationRef, " +
            "then call get_weather with that locationRef. " +
            "Never pass a raw city name to get_weather, the system will reject anything that is not a locationRef. " +
            "Report the forecast in plain language.",
        Tools = [findLocation, guardedWeather]
    }
});

AgentSession session = await agent.CreateSessionAsync();
AgentResponse response = await agent.RunAsync(input, session);
Console.WriteLine($"agent {response.Text}\n");
```

The agent starts with only the user's request, "weather in Barcelona?". A forecast needs a `locationRef`, and the single source of those is `find_location`. So the agent calls `find_location("Barcelona")`. The tool registers the value and returns the token for it. That is the only legitimate way a token reaches the model.

Now the agent holds a token and calls `get_weather` with it. The guardrail swaps in the real value, calls the API, scrubs the result, and returns a forecast where the location is still a token. The real value lives for a few milliseconds inside the tool and never reaches the model. One tool mints tokens, the other spends them, and the real value never crosses back to the model in either direction.

Now picture the agent going off script. It ignores the instructions and calls `get_weather("Barcelona")` with the raw name. There is no `tkn_` prefix, so the guardrail blocks it. Then it invents `tkn_loc_madeup`. That token is not in the registry, so it is blocked again. The only call that gets through carries a real token, and the only source of a real token is `find_location`.

The temperatures jump around because the stock API returns random data. That is the WeatherForecast dummy service, untouched.

This is not full AI security. It does not solve prompt injection, authorization, tenant isolation, audit, rate limits, or business validation.
But it does solve one very common mistake: passing real production values into prompts and hoping instructions will protect them. Do not give the model real values. It never truly needs them.

One caveat on the redaction step. The `string.Replace` above assumes the tool result arrives whole. If you stream output instead, a real value split across chunks (`bu`, then `yer`, then `1`) can slip past a per-chunk replace, because no single chunk holds the full value. A streaming setup needs a buffer that spans chunk boundaries, or better, a tool that never puts the real value in its result at all.

The pattern, start to finish: you never modified your existing API, and you never trusted the model with anything real. A confused agent inventing arguments cannot hurt you, because its made-up values map to nothing. None of this was a big rewrite, just a thin layer wrapped around what you already run.
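Returning to the streaming caveat: a buffer that spans chunk boundaries could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the demo repo. The `StreamingRedactor` class and its `Push` and `Flush` methods are hypothetical names, and the sketch handles a single real value and token per instance.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the article's repo.
public sealed class StreamingRedactor
{
    private readonly string _realValue;
    private readonly string _token;
    private readonly StringBuilder _pending = new();

    public StreamingRedactor(string realValue, string token)
    {
        ArgumentException.ThrowIfNullOrEmpty(realValue);
        _realValue = realValue;
        _token = token;
    }

    // Feed one chunk; returns the text that is safe to forward right now.
    public string Push(string chunk)
    {
        var text = _pending.Append(chunk).ToString();
        var output = new StringBuilder();

        // Replace every complete occurrence of the real value, left to right.
        int start = 0, idx;
        while ((idx = text.IndexOf(_realValue, start, StringComparison.Ordinal)) >= 0)
        {
            output.Append(text, start, idx - start).Append(_token);
            start = idx + _realValue.Length;
        }

        // Hold back the longest tail that could still grow into the real value.
        var hold = 0;
        for (var len = Math.Min(_realValue.Length - 1, text.Length - start); len > 0; len--)
        {
            if (string.CompareOrdinal(text, text.Length - len, _realValue, 0, len) == 0)
            {
                hold = len;
                break;
            }
        }

        output.Append(text, start, text.Length - start - hold);
        _pending.Clear().Append(text, text.Length - hold, hold);
        return output.ToString();
    }

    // Call once at end of stream to release whatever is still held back.
    public string Flush()
    {
        var rest = _pending.ToString();
        _pending.Clear();
        return rest;
    }
}
```

Each `Push` forwards only text that can no longer be part of the real value, so the held-back tail is never longer than the real value minus one character. The simpler option from the caveat still applies: a tool that never echoes the real value needs no redaction at all.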