Your AI Agent Knows Too Much

wpnews.pro

Most AI agent examples make the same mistake. They show a nice prompt, a clean tool call, and then quietly pass real data straight through the model.

That works for a demo, but it is a very bad idea in production. A prompt is not just text anymore. It is part of the execution path. If you put real customer data into it (emails, user addresses, their real names), that data can leak through traces, tool calls, or the final answer.

TL;DR The model gets an opaque token, never the real value. A guardrail swaps the token back for the real value just before the tool runs, then scrubs it out of the result. The model only ever holds tokens, and anything that is not a live token is rejected. The whole thing is Microsoft Agent Framework middleware, around 150 lines.

Demo repo:

github.com/bgener/demo-maf-tokenization

A bare integer or a GUID is mostly harmless. Real data is not, and I do not just mean passwords. Think phone numbers, home addresses, someone's location.

Two things go wrong the moment the model holds it. It can leak: repeated in a reply, written to a log, or shown to the wrong user. And models make things up. A confused agent will invent arguments and call your tools with nonsense. If your tools trust whatever the model sends, that nonsense reaches your real systems.

Maybe you use Anthropic directly. Maybe Azure AI Foundry or Amazon Bedrock with good privacy terms. That helps, but it does not remove the problem: "not used for training" is not the same as "never exposed anywhere". The data can still move through provider infrastructure, safety systems, logs, traces, tool calls, prompt history, evaluation runs, or the final answer.

With tokenization, the model never sees the real value. You hand it an opaque token instead, something like `tkn_loc_ab12...`. When the model calls a tool, it passes that token back, and you swap it for the real value just before the tool runs. So when the agent invents a token, or fires ten calls with junk arguments, none of it lands. A made-up token matches nothing in the registry, and the call is rejected.

The layer that enforces it around a tool call is a guardrail, and the step that keeps the real value out of the model's reply is output redaction. Put it together and you get a tokenization guardrail with output redaction.

We need somewhere to keep the map between real values and tokens. The `TokenRegistry` holds it in memory. It maps each token to its real value, and each real value back to its token, so the same value never mints two tokens.

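```csharp
using System.Collections.Concurrent;
using System.Diagnostics.CodeAnalysis;

public interface ITokenRegistry
{
    string RegisterRealValue(string realValue, string prefix);
    bool TryGetRealValue(string token, [NotNullWhen(true)] out string? realValue);
}

public sealed class TokenRegistry : ITokenRegistry
{
    readonly ConcurrentDictionary<string, string> _tokenToReal = new();
    readonly ConcurrentDictionary<string, string> _realToToken = new();

    public string RegisterRealValue(string realValue, string prefix)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(realValue);
        ArgumentException.ThrowIfNullOrWhiteSpace(prefix);

        // GetOrAdd stores exactly one token per real value. Under contention the
        // factory may run twice, but only the stored token is returned and registered.
        var token = _realToToken.GetOrAdd(realValue, _ => $"tkn_{prefix}_{Guid.NewGuid():N}");
        _tokenToReal.TryAdd(token, realValue);
        return token;
    }

    // Reverse lookup used by the guardrail: token -> real value.
    public bool TryGetRealValue(string token, [NotNullWhen(true)] out string? realValue) =>
        _tokenToReal.TryGetValue(token, out realValue);
}
```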

A token is `tkn_`, a short type prefix such as `loc`, and a GUID. You cannot realistically guess it. The only way to hold a valid one is to get it from a tool.
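How strong is "cannot guess" in practice? A version-4 GUID carries 122 random bits. If you would rather base that guarantee on an API that is explicitly a cryptographic generator, one option is the minimal sketch below. It is an alternative to the article's minting, not what the demo repo does, and it draws 128 bits from `RandomNumberGenerator`; the `GetHexString` overload needs .NET 8 or later. `MintToken` and `LooksLikeToken` are hypothetical helpers.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

// Hypothetical helper: tkn_<prefix>_<32 hex chars>, i.e. 128 bits from the OS
// cryptographic RNG. RandomNumberGenerator.GetHexString requires .NET 8+.
static string MintToken(string prefix) =>
    $"tkn_{prefix}_{RandomNumberGenerator.GetHexString(32, lowercase: true)}";

// Hypothetical shape check. It only rules out malformed input early;
// a well-formed token must still be found in the registry to be accepted.
static bool LooksLikeToken(string candidate) =>
    Regex.IsMatch(candidate, "^tkn_[a-z]+_[0-9a-f]{32}$");

var first = MintToken("loc");
var second = MintToken("loc");

Console.WriteLine(first);                            // random each run
Console.WriteLine(LooksLikeToken(first));            // True
Console.WriteLine(LooksLikeToken("tkn_loc_madeup")); // False
Console.WriteLine(first == second);                  // False, barring a 1-in-2^128 collision
```

The same shape check also accepts tokens built with `Guid.NewGuid():N`, because the `N` format is 32 lowercase hex digits. Either minting strategy therefore fits the same `tkn_<prefix>_<hex>` format.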

The guardrail runs before the tool and checks the identifier argument. If it is a live token from the registry, the guardrail swaps in the real value. Anything else is blocked: a real value, a made-up token, a raw GUID, a bare number.

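```csharp
// Minimal verdict type; its members match how the middleware below uses it.
public sealed record GuardrailVerdict(bool Allowed, string? RealValue, string? Reason)
{
    public static GuardrailVerdict Allow(string realValue) => new(true, realValue, null);
    public static GuardrailVerdict Block(string reason) => new(false, null, reason);
}

public static class TokenGuardrail
{
    public static GuardrailVerdict Inspect(string? value, ITokenRegistry registry)
    {
        // Token-shaped input is only allowed if the registry knows it.
        if (value is not null && value.StartsWith("tkn_", StringComparison.OrdinalIgnoreCase))
        {
            return registry.TryGetRealValue(value, out var real)
                ? GuardrailVerdict.Allow(real)
                : GuardrailVerdict.Block("Rejected token pointer. This token is not in the registry.");
        }

        // Everything else is blocked by default: raw values, GUIDs, bare numbers, null.
        return GuardrailVerdict.Block("Direct database id access is forbidden. Pass a token instead.");
    }
}
```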

Why an allowlist? A denylist only stops the one format you predicted. If you block values that start with `STN_`, an agent that passes `ACC_SQL_11111`, a raw GUID, or a bare number sails straight through. The rule above flips it around. The only accepted value is a live token. Everything else is rejected by default. You do not have to guess every bad format, because there is only one good format.
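To make the difference concrete, here is a small sketch that runs the same inputs through both rules. The denylist check and the hard-coded live token are hypothetical. The allowlist side mirrors the guardrail's rule that only a token present in the registry passes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry state: one live token, minted earlier by find_location.
const string LiveToken = "tkn_loc_0123456789abcdef0123456789abcdef";
var liveTokens = new Dictionary<string, string> { [LiveToken] = "Barcelona" };

// Denylist: blocks only the one format someone predicted.
static bool DenylistAllows(string input) => !input.StartsWith("STN_", StringComparison.Ordinal);

// Allowlist: the only accepted value is a live token from the registry.
bool AllowlistAllows(string input) => liveTokens.ContainsKey(input);

string[] attempts =
{
    "STN_42",                  // the predicted bad format
    "ACC_SQL_11111",           // an unpredicted format
    Guid.NewGuid().ToString(), // a raw GUID
    "12345",                   // a bare number
    "tkn_loc_madeup",          // a made-up token
    LiveToken,                 // the one legitimate value
};

foreach (var attempt in attempts)
    Console.WriteLine($"{attempt,-42} denylist: {DenylistAllows(attempt),-5} allowlist: {AllowlistAllows(attempt)}");
```

Only `STN_42` is stopped by the denylist. Only the live token gets past the allowlist.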

Plugging the guardrail in is where Microsoft Agent Framework (MAF) does the work. It wraps a tool with middleware through `DelegatingAIFunction` (from Microsoft.Extensions.AI, which MAF builds on): you pass the real tool to the base class and override `InvokeCoreAsync`. Your code runs before and after the wrapped tool, which is exactly where the guardrail belongs.

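```csharp
using System.Text.Json;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

public sealed class TokenGuardrailFunction(
    AIFunction innerFunction, ITokenRegistry registry, ILogger<TokenGuardrailFunction> logger)
    : DelegatingAIFunction(innerFunction)
{
    const string ArgName = "locationRef";

    protected override async ValueTask<object?> InvokeCoreAsync(
        AIFunctionArguments arguments, CancellationToken cancellationToken)
    {
        // Ingress: the argument usually arrives as a JsonElement, not a string.
        var locationRef = CoerceToString(arguments.TryGetValue(ArgName, out var value) ? value : null);

        var verdict = TokenGuardrail.Inspect(locationRef, registry);
        if (!verdict.Allowed)
        {
            logger.LogWarning("Blocked {Tool}: {Reason}", Name, verdict.Reason);
            return JsonSerializer.Serialize(new { blocked = true, reason = verdict.Reason });
        }

        arguments[ArgName] = verdict.RealValue;
        var result = await base.InvokeCoreAsync(arguments, cancellationToken);

        // Egress: swap the real value (raw and JSON-escaped, case-insensitively) back to its token.
        var real = verdict.RealValue!;
        return (result?.ToString() ?? string.Empty)
            .Replace(real, locationRef, StringComparison.OrdinalIgnoreCase)
            .Replace(JsonEncodedText.Encode(real).ToString(), locationRef, StringComparison.OrdinalIgnoreCase);
    }

    static string? CoerceToString(object? value) => value switch
    {
        string s => s,
        JsonElement { ValueKind: JsonValueKind.String } e => e.GetString(),
        _ => null, // numbers, objects, arrays: never a token, so the guardrail blocks them
    };
}
```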

The middleware does three things:

First, it checks the `locationRef` argument. If the value is not a valid token from the registry, the call is blocked and the real tool is never executed.

Second, if the token is valid, the middleware replaces it with the real value and calls the wrapped tool through `base.InvokeCoreAsync`.

Third, it redacts the tool result before returning it to the model. Every occurrence of the real value in the result is replaced with the original token, so the model never sees the raw value on the way in or on the way out.
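To see the three steps without the framework, here is a minimal sketch. It replaces MAF's `DelegatingAIFunction` with a plain `Func<string, string>` decorator. `Guard`, `Mint`, `RawWeather`, and the in-memory dictionary are hypothetical stand-ins for the middleware, `TokenRegistry`, and `get_weather`. The redaction step also scrubs the JSON-escaped form of the real value. `System.Text.Json` escapes non-ASCII characters by default, so a plain `Replace` on `Zürich` would miss the escaped copy (`Z\u00FCrich`) in the serialized result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for TokenRegistry: token -> real value.
var registry = new Dictionary<string, string>();
string Mint(string realValue)
{
    var minted = $"tkn_loc_{Guid.NewGuid():N}";
    registry[minted] = realValue;
    return minted;
}

var innerCalls = 0;

// The "real" tool: it echoes the real location back, like get_weather does.
string RawWeather(string location)
{
    innerCalls++;
    return JsonSerializer.Serialize(new { location, forecast = "sunny" });
}

// Decorator with the same three steps as the middleware.
Func<string, string> Guard(Func<string, string> tool) => arg =>
{
    // 1. Inspect: anything that is not a live token never reaches the tool.
    if (!registry.TryGetValue(arg, out var real))
        return JsonSerializer.Serialize(new { blocked = true, reason = "Pass a token instead." });

    // 2. Swap the token for the real value and call the wrapped tool.
    var result = tool(real);

    // 3. Redact both the raw and the JSON-escaped form of the real value.
    return result
        .Replace(real, arg)
        .Replace(JsonEncodedText.Encode(real).ToString(), arg);
};

var getWeather = Guard(RawWeather);
var token = Mint("Zürich");

Console.WriteLine(getWeather("Zürich"));         // blocked, RawWeather not called
Console.WriteLine(getWeather("tkn_loc_madeup")); // blocked, RawWeather not called
Console.WriteLine(getWeather(token));            // forecast with the token as location
Console.WriteLine(innerCalls);                   // 1
```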

One gotcha lives here. The agent passes tool arguments as `JsonElement`, not `string`. A naive `value as string` cast returns null and blocks every call, so the guardrail coerces the argument explicitly.
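A minimal sketch of the gotcha, assuming the argument reaches the tool as a boxed `JsonElement`, as it does when the arguments are parsed from the model's JSON. `CoerceToString` is a hypothetical helper that mirrors what the guardrail needs: it accepts a string or a JSON string and treats anything else as not-a-token.

```csharp
using System;
using System.Text.Json;

// The model's tool-call arguments arrive as JSON, so a string argument
// typically shows up as a boxed JsonElement rather than a .NET string.
object? rawArg = JsonSerializer.Deserialize<JsonElement>("{\"locationRef\":\"tkn_loc_abc\"}")
    .GetProperty("locationRef");

// Naive cast: a boxed JsonElement is not a string, so this yields null
// and the guardrail would block every call.
string? naive = rawArg as string;

// Hypothetical helper: accept a .NET string or a JSON string, nothing else.
static string? CoerceToString(object? candidate) => candidate switch
{
    string s => s,
    JsonElement { ValueKind: JsonValueKind.String } e => e.GetString(),
    _ => null, // numbers, objects, arrays: not a token, so the guardrail blocks them
};

Console.WriteLine(naive is null);          // True
Console.WriteLine(CoerceToString(rawArg)); // tkn_loc_abc
```

Returning null for non-string kinds is deliberate. A number or object can never be a token, so the guardrail's default-deny path handles it.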

Now wire it together: a `ChatClientAgent` from MAF, backed by OpenAI, with two tools. `get_weather` calls the real `WeatherApi` over HTTP, and the guardrail wraps it. `find_location` stays raw, since it only ever returns a token.

```csharp
AIFunction rawWeather = AIFunctionFactory.Create(
    async (string locationRef) =>
    {
        // By now the guardrail has swapped the token for the real value, so locationRef is real here.
        var forecast = await http.GetStringAsync($"{weatherApiUrl}/weatherforecast");
        return JsonSerializer.Serialize(new
        {
            location = locationRef,
            forecast = JsonSerializer.Deserialize<JsonElement>(forecast)
        });
    },
    name: "get_weather",
    description: "Get the 5 day weather forecast. Pass the locationRef returned by find_location.");

AIFunction findLocation = AIFunctionFactory.Create(
    (string city) =>
    {
        var locationRef = registry.RegisterRealValue(city, "loc");
        logger.LogDebug("find_location({City}) minted {LocationRef}; the model only ever sees this token", city, locationRef);
        return JsonSerializer.Serialize(new { city, locationRef });
    },
    name: "find_location",
    description: "Look up a city by name and get an opaque locationRef that refers to it.");
```

`AIFunctionFactory.Create` turns a plain method or lambda into a callable tool. The guardrail then wraps `rawWeather`, and the agent is only ever given the wrapped `guardedWeather`, so it cannot reach the raw function underneath.

```csharp
AIFunction guardedWeather = new TokenGuardrailFunction(
    rawWeather, registry, loggerFactory.CreateLogger<TokenGuardrailFunction>());
```

Both tools go on `ChatOptions`, the chat client points at OpenAI, and the session runs.

```csharp
IChatClient chatClient = new ChatClient(model, apiKey).AsIChatClient();

AIAgent agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Name = "WeatherAgent",
    ChatOptions = new ChatOptions
    {
        Instructions =
            "You help users with weather. A location is referenced by an opaque locationRef, never by its raw name. To answer a weather question, first call find_location with the city to get a locationRef, then call get_weather with that locationRef. Never pass a raw city name to get_weather, the system will reject anything that is not a locationRef. Report the forecast in plain language.",
        Tools = [findLocation, guardedWeather]
    }
});

AgentSession session = await agent.CreateSessionAsync();
AgentResponse response = await agent.RunAsync(input, session);
Console.WriteLine($"agent > {response.Text}\n");
```

The agent starts with only the user's request, "weather in Barcelona?". A forecast needs a `locationRef`, and the only source of those is `find_location`. So the agent calls `find_location("Barcelona")`. The tool registers the city and returns a `locationRef` token for it. That is the only legitimate way a token reaches the model.

Now the agent holds a token and calls `get_weather` with it. The guardrail swaps in the real value, calls the API, scrubs the result, and returns a forecast where the location is still a token. The real value exists only for the duration of the tool call and never reaches the model.

One tool mints tokens, the other spends them, and the real value never crosses back to the model in either direction.

Now picture the agent going off script. It ignores the instructions and calls `get_weather("Barcelona")` with the raw name. No `tkn_` prefix, so the guardrail blocks it. Then it invents `tkn_loc_madeup`. Not in the registry, blocked again. The only call that gets through carries a real token, and the only source of a real token is `find_location`.

The temperatures jump around because the stock `WeatherForecast` sample API returns random data; the demo uses that service untouched.

This is not full AI security. It does not solve prompt injection, authorization, tenant isolation, audit, rate limits, or business validation. But it does solve one very common mistake: passing real production values into prompts and hoping instructions will protect them. Do not give the model real values. It never truly needs them.

One caveat on the redaction step. The `string.Replace` above assumes the tool result arrives whole. If you stream output instead, a real value split across chunks (`bu`, then `yer`, then `1`) can slip past a per-chunk replace, because no single chunk holds the full value. A streaming setup needs a buffer that spans chunk boundaries, or better, a tool that never puts the real value in its result at all.
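Here is a minimal sketch of the buffer approach. It assumes the stream is a sequence of text chunks and there is a single real value to hide. `RedactStream`, the sample text, and the token `tkn_usr_demo` are hypothetical, and hooking this into a streaming agent response is left out. The idea: any match that is still incomplete must start within the last `real.Length - 1` raw characters, so only that tail is held back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical sketch: scrub one real value from a chunked text stream.
static IEnumerable<string> RedactStream(IEnumerable<string> source, string real, string token)
{
    ArgumentException.ThrowIfNullOrEmpty(real);
    var keep = real.Length - 1;
    var carry = "";
    foreach (var chunk in source)
    {
        var buffer = carry + chunk;
        var output = new StringBuilder();
        var pos = 0;
        int hit;
        // Replace every complete occurrence currently in the buffer.
        while ((hit = buffer.IndexOf(real, pos, StringComparison.Ordinal)) >= 0)
        {
            output.Append(buffer, pos, hit - pos).Append(token);
            pos = hit + real.Length;
        }
        // Emit the tail except the raw characters that could still begin a match.
        var safeEnd = Math.Max(pos, buffer.Length - keep);
        output.Append(buffer, pos, safeEnd - pos);
        carry = buffer[safeEnd..];
        if (output.Length > 0) yield return output.ToString();
    }
    // Whatever is left is shorter than the real value, so it is safe to flush.
    if (carry.Length > 0) yield return carry;
}

var parts = new[] { "Order for bu", "yer", "1 is ready" };

// Per-chunk replace: no single chunk holds "buyer1", so it leaks.
Console.WriteLine(string.Concat(parts.Select(p => p.Replace("buyer1", "tkn_usr_demo"))));
// Order for buyer1 is ready

// Buffered replace across chunk boundaries.
Console.WriteLine(string.Concat(RedactStream(parts, "buyer1", "tkn_usr_demo")));
// Order for tkn_usr_demo is ready
```

The held-back tail is always raw text, never an already inserted token. A token therefore cannot combine with the next chunk to form a false match.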

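The pattern, start to finish: one tool mints tokens, a guardrail accepts only live tokens and swaps in the real value just before the wrapped tool runs, and output redaction scrubs the real value from the result before the model sees it.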

You never modified your existing API, and you never trusted the model with anything real. A confused agent inventing arguments cannot hurt you, because its made-up values map to nothing. None of this was a big rewrite, just a thin layer wrapped around what you already run.

Source: original article on dev.to.
