Source: github.com · Topic: ai-safety

Gate – deterministic PII redaction for AI agent tool output (Rust)

A new open-source Rust tool called Gate intercepts AI agent query results to redact personally identifiable information (PII) before it reaches large language model contexts. Unlike LLM-based redaction systems that send data to models for classification, Gate uses deterministic regex and column heuristics with under 10 milliseconds of overhead per query, ensuring reproducible and auditable privacy boundaries. The tool covers both Bash commands and MCP server calls without requiring changes to existing agent workflows, though it cannot catch PII in unstructured free-text prose.

13 min read · Published Jun 4, 2026

A deterministic privacy boundary between your data and AI. Intercepts query results before the model sees them: rule-driven, reproducible, and audit-ready.


AI agents increasingly access internal databases and APIs through CLI tools, scripts, and MCP servers. Without safeguards, sensitive data such as emails, phone numbers, tax identifiers, and payment details can be unintentionally exposed to LLM context windows.

gate intercepts query results before they reach the model and automatically redacts detected PII fields without requiring changes to existing agent workflows or prompts. It covers both access paths agents use: Bash commands (via a harness hook) and MCP server calls (via a wrap-style stdio proxy), adding < 10 ms of overhead per query.

Most PII guardrails for AI agents are themselves LLMs — they send your data to a model to decide whether it's sensitive. Gate takes the opposite approach.

|  | gate | LLM-based redaction |
| --- | --- | --- |
| Decision method | Regex + column heuristics + Luhn | Model inference |
| Deterministic | ✅ Same input always produces the same output | ❌ Varies by run and model version |
| Data stays local | ✅ Never leaves your machine | ❌ Sent to a model API for classification |
| Latency | ✅ < 10 ms overhead | ❌ Adds an API round-trip |
| Auditable | ✅ Every decision traceable to an explicit rule | ❌ Model reasoning is opaque |
| Known gaps | ✅ Documented (free-text prose) | ❌ False-negative rate unknown |

The trade-off gate makes: rules can't catch PII in unstructured free-text prose. The threat model documents what gate doesn't cover.

Database-level masking is the right answer when you control the source. Gate fills the gap when you don't, and covers the paths masking can't reach.

|  | gate | Database masking |
| --- | --- | --- |
| Requires DB admin access | ✅ No changes to the database | ❌ Needs column-level config by a DBA |
| Works on vendor / external DBs | ✅ Wraps any JSON-returning tool | ❌ Only databases you administer |
| Covers MCP and API tools | ✅ Any tools/call response | ❌ No masking concept at this layer |
| Production data freshness | ✅ Works against live data | ❌ Static copies drift; dynamic data masking (DDM) may lag |
| Agent bypass resistance | ✅ Direct value exposure blocked in harness hook | ❌ Aggregate functions and CASE expressions can bypass DDM |
| Known gaps | ✅ Documented | ❌ DDM gaps are often silent |

They're complementary: if you have DDM configured, gate is the safety net for the paths and patterns DDM misses.

The demo walks through three steps:

- gate scan detecting PII columns across the schema before any query runs
- An agent querying the transactions table with gate disabled: card_number is fully visible
- The same queries with gate enabled: card_number is redacted across both MCP and Bash paths

Also works with OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, and Gemini CLI — see Supported AI Tools for the full compatibility matrix.

For the design rationale, threat-model walkthrough, and detection-pipeline deep dive, read Introducing gate.

Before installing the hook, use gate scan to assess how much PII your schema exposes. Pipe a TABLE_NAME, COLUMN_NAME query into it and gate prints a risk report across every table. gate scan itself needs no custom configuration, but it does expect a config file to exist; if you haven't created one yet, run gate config --init-only first.

psql -U <user> -h <host> -d <dbname> -c "SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'public' ORDER BY TABLE_NAME, ORDINAL_POSITION" | gate scan

See docs/scan.md for queries against MySQL, MS SQL Server (including native sqlcmd), Databricks, and toolkit-managed clients.

Risk level is weighted by category sensitivity: one SSN column matters more than twenty address columns. gate scan exits with code 1 if any PII columns are found, so it is scriptable in CI. Pass --verbose to show all detected columns, or --json for machine-readable output.

| Sensitivity | Categories | Risk floor |
| --- | --- | --- |
| Critical | Government IDs, Health & medical, Financial, Biometric | HIGH always; CRITICAL if ≥3 columns or >10% of schema |
| Elevated | Contact, Names, Date of birth, Location of birth, Family & relationships, Employment | HIGH if >5% of schema; CRITICAL if >25% |
| Standard | Address & location, Online & technical, Demographics | HIGH if >25% of schema |
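As an illustration of how these floors could combine into a single report level, here is a minimal Rust sketch. It assumes columns have already been classified into the three tiers and counted per tier. The `Risk` and `Tier` types are hypothetical, and so is the MEDIUM fallback for "PII present but no floor triggered". None of this is taken from gate's source.

```rust
// Hypothetical model of the gate scan risk floors. Tier thresholds follow the
// table; the MEDIUM fallback and the per-tier counting are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Risk {
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, Copy)]
enum Tier {
    Critical,
    Elevated,
    Standard,
}

/// `pii_columns` holds (tier, matching column count); `total_columns` is the schema size.
fn risk_level(pii_columns: &[(Tier, usize)], total_columns: usize) -> Risk {
    let mut level = Risk::Low;
    for &(tier, count) in pii_columns {
        if count == 0 || total_columns == 0 {
            continue;
        }
        // Share of the whole schema taken up by this tier, in percent.
        let pct = 100.0 * count as f64 / total_columns as f64;
        let tier_level = match tier {
            Tier::Critical if count >= 3 || pct > 10.0 => Risk::Critical,
            Tier::Critical => Risk::High,
            Tier::Elevated if pct > 25.0 => Risk::Critical,
            Tier::Elevated if pct > 5.0 => Risk::High,
            Tier::Standard if pct > 25.0 => Risk::High,
            _ => Risk::Medium, // assumption: PII found, but no floor triggered
        };
        // The overall level is the worst level seen across tiers.
        level = level.max(tier_level);
    }
    level
}

/// Mirrors the documented CI behaviour: exit code 1 if any PII column was found.
fn scan_exit_code(pii_columns: &[(Tier, usize)]) -> i32 {
    if pii_columns.iter().any(|&(_, n)| n > 0) { 1 } else { 0 }
}
```

Take a 100-column schema. A single Critical-tier column, such as an SSN, already yields HIGH. Twenty Standard-tier address columns make up 20% of the schema, which stays under the 25% floor. That outcome is consistent with the rule that one SSN column outweighs twenty address columns.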

Note: gate scan detects PII by column name only. A LOW result means your column names look clean; it does not mean the data is safe. Gate 2, the value-scanning stage that runs at query time, additionally inspects values, catching PII in free-text, JSON, and ambiguously-named columns that scan cannot see. In multi-row results, if any value in a column matches a PII pattern, the entire column is promoted and all rows are redacted, not just the matching row.
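The column-promotion rule fits in a few lines of Rust. This is an illustration under stated assumptions, not gate's implementation. Rows are modelled as column/value pairs rather than parsed JSON. `looks_like_email` is a hypothetical, deliberately crude stand-in for gate's regex set.

```rust
use std::collections::HashSet;

// Hypothetical, crude email check standing in for gate's real patterns.
fn looks_like_email(v: &str) -> bool {
    match v.split_once('@') {
        Some((local, domain)) => !local.is_empty() && domain.contains('.') && !v.contains(' '),
        None => false,
    }
}

/// Redacts every value in any column where at least one row matched.
/// Returns the set of promoted column names.
fn redact_with_promotion(rows: &mut [Vec<(String, String)>]) -> HashSet<String> {
    // Pass 1: find columns with at least one matching value.
    let mut promoted = HashSet::new();
    for row in rows.iter() {
        for (col, val) in row {
            if looks_like_email(val) {
                promoted.insert(col.clone());
            }
        }
    }
    // Pass 2: redact those columns in *all* rows, matching or not.
    for row in rows.iter_mut() {
        for (col, val) in row.iter_mut() {
            if promoted.contains(col.as_str()) {
                *val = "[PII:email]".to_string();
            }
        }
    }
    promoted
}
```

Two passes are needed because the decision is made per column across the whole result, not per cell. A row without a match is still redacted once another row in the same column has matched.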

For false positives (e.g. city in a products table), run gate scan --review to triage interactively and add columns to the allowlist. Allowlisted columns skip all redaction, both name-based and value-based. Only add a column to the allowlist when you are certain it contains no PII. Low-confidence pattern matches (below confidence_threshold) are redacted and flagged with a warning in _gate_summary; add the column to column_allowlist to suppress the warning (this also disables redaction for that column). Manage the list directly with gate allowlist add/remove/list.
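The allowlist and confidence rules reduce to a small per-column decision. Below is a hedged Rust sketch of it. The `Decision` enum, the 0.0 to 1.0 confidence scale, and the warning text are all assumptions. Only the ordering comes from the description: the allowlist is checked first, then the threshold.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Decision {
    Pass,
    Redact,
    RedactWithWarning(String),
}

/// Hypothetical per-column decision. `match_confidence` is None when neither the
/// column name nor any value matched; the 0.0..=1.0 scale is an assumption.
fn decide(
    column: &str,
    match_confidence: Option<f64>,
    confidence_threshold: f64,
    column_allowlist: &HashSet<&str>,
) -> Decision {
    if column_allowlist.contains(column) {
        return Decision::Pass; // allowlist overrides both name- and value-based matches
    }
    match match_confidence {
        None => Decision::Pass,
        Some(c) if c >= confidence_threshold => Decision::Redact,
        // Below threshold: still redacted, but surfaced as a warning.
        Some(c) => Decision::RedactWithWarning(format!(
            "low-confidence match ({c:.2}) in column `{column}`; \
             run `gate allowlist add {column}` if it holds no PII"
        )),
    }
}
```

Low-confidence matches fail closed in this sketch: they are redacted and flagged, never passed through. Only an explicit allowlist entry lets the value reach the model.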

Install gate

brew tap GaaraZhu/gate && brew install gate

or, with cargo-binstall:

cargo binstall gate

Create your config (opens ~/.config/gate/config.yaml in your editor):

gate config

Register the hook with your agent harness:

gate init

gate init --harness opencode

gate init --harness cursor

gate init --harness copilot-cli

gate init --harness codex

gate init --harness gemini

Add --scope project for project-only setup. Restart your OpenCode, Cursor, or Gemini CLI session after gate init to load the hook. For Codex CLI, restart the session, then review the hook in the Trust & Permissions UI, mark it as trusted, and enable it. For Copilot CLI, the generated .github/hooks/PreToolUse.json is gitignored by default, so each developer runs gate init --harness copilot-cli once in their local clone.

(Optional) Register MCP server proxies so tools/call responses also pass through gate:

gate init --wrap-mcp

gate init --harness opencode --wrap-mcp --yes

gate init --harness cursor --wrap-mcp --yes

gate init --harness copilot-cli --wrap-mcp --yes

gate init --harness codex --wrap-mcp --yes

gate init --harness gemini --wrap-mcp --yes

Add --scope project for project-level MCP config. For Cursor project-scoped MCP, re-enable the servers in Settings → Tools & MCPs after registration. See docs/mcp.md for --servers, per-harness paths, and manual single-server registration.

Start your AI session. gate intercepts query commands automatically; no changes to your prompts or tools are required.

Run gate validate to confirm your config is valid before the first session.

gate covers two access paths agents use to reach data. The blog post has the full walkthrough; the short version:

Every Bash command passes through gate hook first. Commands that match a configured tool are silently rewritten to gate run -- <original command>, which spawns the subprocess and pipes stdout through the two-gate detection pipeline. The rewrite happens in the harness's pre-tool-execution hook, so it is enforced in Claude Code, OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, and Gemini CLI, and the agent cannot bypass it. Humans and CI scripts running outside the harness are untouched.
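Here is a simplified Rust sketch of the rewrite step. It assumes a command matches when its program name appears in the configured tools: list, where the program name is the first whitespace-separated token minus any directory prefix. `rewrite_command` is a hypothetical name. gate's real matcher very likely handles quoting, environment-variable prefixes, and pipelines, all of which this sketch ignores.

```rust
/// Hypothetical hook rewrite: wrap commands whose program is a configured tool.
fn rewrite_command(command: &str, configured_tools: &[&str]) -> String {
    // First whitespace-separated token is treated as the program (simplification).
    let program = command.split_whitespace().next().unwrap_or("");
    // Drop any directory prefix so /usr/local/bin/tkpsql still matches tkpsql.
    let program = program.rsplit('/').next().unwrap_or(program);
    if configured_tools.contains(&program) {
        format!("gate run -- {}", command.trim_start())
    } else {
        // Unconfigured commands pass through untouched (see the threat model).
        command.to_string()
    }
}
```

The fallthrough branch is the gap that the threat model describes: any command not in the list is returned unchanged, and its output is never inspected.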

AI asks to run: tkpsql query --sql "SELECT * FROM users"
                        │
         harness hook fires (PreToolUse / tool.execute.before)
                        │
              gate hook rewrites to: gate run -- tkpsql query --sql "..."
                        │
         ┌──────────────┴──────────────┐
         │ Gate 1: SQL inspection      │  SELECT * → no column hints, defer to Gate 2
         │ Gate 2: Value scanning      │  regex + column-name heuristics + Luhn check
         └──────────────┬──────────────┘
                        │
         {"id": 1, "full_name": "[PII:name]", "email": "[PII:email]", ..., "_gate_summary": {...}}
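The Luhn check in Gate 2 is a standard algorithm, so it can be shown exactly. How gate wires it into its card-number regex is not documented here. This Rust sketch makes two assumptions: spaces and dashes count as separators, and only strings of 13 to 19 digits are candidates.

```rust
/// Luhn (mod-10) checksum over a candidate card number.
/// Separator handling and the 13..=19 length window are assumptions.
fn luhn_valid(input: &str) -> bool {
    // Strip separators; any other non-digit makes the whole input invalid.
    let digits: Vec<u32> = input
        .chars()
        .filter(|c| *c != ' ' && *c != '-')
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()
        .unwrap_or_default();
    if !(13..=19).contains(&digits.len()) {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                // Double every second digit from the right; fold two-digit results.
                let dd = d * 2;
                if dd > 9 { dd - 9 } else { dd }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

A checksum like this helps separate real card numbers from arbitrary long digit strings such as order IDs. It does not prove that a number belongs to an issued card.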

gate mcp is a transparent stdio proxy registered in the harness as the MCP server. It forwards all JSON-RPC traffic verbatim except tools/call responses, which pass through Gate 2 before reaching the model. No changes to the upstream server are required.

Note: only tools/call responses are redacted; resources/read, prompts/get, and other MCP message types are forwarded without inspection.

AI ──tools/call──> gate mcp ──forward──> upstream MCP server
                       │
                       │ <── tools/call response with PII
                       │
                       │ Gate 2 scan + redact
                       │
AI <───redacted result─┘
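The diagram leaves out one detail. A JSON-RPC response carries only an `id`, not the method that produced it. A proxy that scans only tools/call results therefore has to remember which request ids were tools/call. The Rust sketch below shows that bookkeeping. It assumes messages are already parsed into a small, hypothetical `Message` enum with numeric ids; real JSON-RPC ids may also be strings.

```rust
use std::collections::HashMap;

// Hypothetical pre-parsed message shapes (no JSON crate involved).
enum Message {
    Request { id: u64, method: String },
    Response { id: u64 },
    Notification,
}

#[derive(Debug, PartialEq)]
enum Route {
    Forward,         // pass through verbatim
    ScanThenForward, // run Gate 2 before the model sees it
}

#[derive(Default)]
struct Proxy {
    pending: HashMap<u64, String>, // in-flight request id -> method
}

impl Proxy {
    fn route(&mut self, msg: &Message) -> Route {
        match msg {
            Message::Request { id, method } => {
                self.pending.insert(*id, method.clone());
                Route::Forward
            }
            Message::Response { id } => match self.pending.remove(id) {
                Some(m) if m == "tools/call" => Route::ScanThenForward,
                // resources/read, prompts/get, unknown ids: forwarded unchanged.
                _ => Route::Forward,
            },
            Message::Notification => Route::Forward,
        }
    }
}
```

Removing each entry when its response arrives keeps the map bounded by the number of in-flight requests.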

Redacted output preserves the original JSON structure. PII values are replaced with [PII:<type>] placeholders, and a _gate_summary field reporting what was redacted is appended.

{
  "rows": [{"id": 1, "email": "[PII:email]", "ssn": "[PII:ssn]"}],
  "count": 1,
  "_gate_summary": {"redacted": 2, "types": ["email", "ssn"], "warnings": []}
}
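Here is a minimal Rust sketch of producing the placeholders and summary shown above. The `[PII:<type>]` format and the three summary fields come from the example. Several things are assumptions: the `Summary` type, the sorted and de-duplicated `types` list, and the hand-built JSON string, which is used only to avoid external crates.

```rust
use std::collections::BTreeSet;

fn placeholder(pii_type: &str) -> String {
    format!("[PII:{pii_type}]")
}

// Hypothetical accumulator for one response's _gate_summary.
#[derive(Default)]
struct Summary {
    redacted: usize,
    types: BTreeSet<String>, // sorted, de-duplicated (assumption)
    warnings: Vec<String>,
}

impl Summary {
    /// Count one redaction and return the placeholder that replaces the value.
    fn record(&mut self, pii_type: &str) -> String {
        self.redacted += 1;
        self.types.insert(pii_type.to_string());
        placeholder(pii_type)
    }

    /// Hand-rolled JSON; assumes type names and warnings need no escaping.
    fn to_json(&self) -> String {
        let quote = |s: &String| format!("\"{s}\"");
        let types: Vec<String> = self.types.iter().map(quote).collect();
        let warnings: Vec<String> = self.warnings.iter().map(quote).collect();
        format!(
            "{{\"redacted\": {}, \"types\": [{}], \"warnings\": [{}]}}",
            self.redacted,
            types.join(", "),
            warnings.join(", ")
        )
    }
}
```

Because every value is swapped in place, the surrounding JSON structure is left untouched. The summary is then attached as one extra key.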

With hash_values: true in config, each placeholder gains an 8-char hex suffix derived from the original value ([PII:email:7f83b165]). The same raw value always produces the same suffix, so the AI can join or deduplicate across rows without ever seeing the underlying data. Error responses from the underlying tool pass through unchanged.
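The property that matters is determinism: the same raw value must always map to the same suffix. gate's actual hash function is not stated here. This Rust sketch substitutes 32-bit FNV-1a purely because it fits exactly in 8 hex digits and needs no crates. It will not reproduce suffixes such as `7f83b165`.

```rust
/// 32-bit FNV-1a: a stand-in hash, not necessarily what gate uses.
fn fnv1a_32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &byte in data {
        hash ^= byte as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Builds a placeholder like [PII:email:xxxxxxxx] with a deterministic suffix.
fn hashed_placeholder(pii_type: &str, raw_value: &str) -> String {
    format!("[PII:{}:{:08x}]", pii_type, fnv1a_32(raw_value.as_bytes()))
}
```

FNV-1a is not a cryptographic hash. For low-entropy values such as phone numbers, any unkeyed 8-hex-digit suffix can be matched by brute force, so a keyed hash is the safer choice in practice.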

_gate_summary reports on a single response. gate retro aggregates across all of them: total queries seen, PII fields redacted, hit rate, plus a breakdown by tool and PII category. It is useful for periodic audits and for confirming the boundary is doing real work.

If any query produced a low-confidence redaction, gate retro surfaces a Low-confidence redactions section listing each unique warned column and the exact gate allowlist add <col> command to suppress it. Once a column is added to the allowlist, it disappears from this section automatically.

Stats are collected by default and written to a local JSONL log on disk; they never leave your machine. Disable with stats.enabled: false in config.
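Here is a Rust sketch of the kind of aggregation gate retro describes. `QueryRecord` stands in for one line of the JSONL log, whose real schema isn't documented here. "Hit rate" is assumed to mean the share of queries with at least one redaction. The per-category breakdown is left out for brevity.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for one JSONL log line.
struct QueryRecord<'a> {
    tool: &'a str,
    redacted: usize,
}

#[derive(Debug)]
struct Retro {
    queries: usize,
    pii_fields_redacted: usize,
    hit_rate: f64,                    // assumption: queries with >=1 redaction / all queries
    by_tool: BTreeMap<String, usize>, // tool -> PII fields redacted
}

fn retro(records: &[QueryRecord<'_>]) -> Retro {
    let mut by_tool: BTreeMap<String, usize> = BTreeMap::new();
    let mut hits = 0usize;
    for r in records {
        *by_tool.entry(r.tool.to_string()).or_insert(0) += r.redacted;
        if r.redacted > 0 {
            hits += 1;
        }
    }
    let queries = records.len();
    Retro {
        queries,
        pii_fields_redacted: by_tool.values().sum(),
        hit_rate: if queries == 0 { 0.0 } else { hits as f64 / queries as f64 },
        by_tool,
    }
}
```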

gate is a deterministic redaction layer, not a sandbox. It assumes the agent is non-adversarial and only inspects output from commands listed under tools: in config. The following are deliberately out of scope:

- Adversarial agents / prompt injection. Gate's threat model is an agent that inadvertently exfiltrates PII. gate protect (Unix) blocks the most direct bypass, a hijacked agent disabling gate via config edits, by transferring config ownership to root. But a determined attacker can still route around gate by invoking commands not in tools:, requesting non-JSON output formats, piping through encoders, or removing the hook entry from the harness settings file for the next session. Pair gate with a harness-level Bash allowlist to close the residual gap.
- Commands not in tools:. The AI can invoke them freely; their output is never inspected.
- Non-JSON tool output. Plain text, CSV, and other formats pass through unchanged. Configure tools to emit JSON.
- Encoded or obfuscated PII. Base64-encoded emails, URL-encoded values, or deliberately spaced strings (a l i c e @ e x a m p l e . c o m) are not detected.
- Non-US PII by value alone. The built-in SSN regex requires dashes. AU/NZ phone numbers are caught by value: mobile (04XX/02X local, +61 4XX/+64 2X international) and landline (0[2378]/0[34679] local, +61 [2378]/+64 [34679] international), including the common +610/+640 stray-leading-zero variant and arbitrary whitespace in the number. International-prefix numbers (+61/+64) auto-redact regardless of column name; local-format numbers require a PII-named column. Other AU/NZ identifiers are also covered at the value layer: ABN (mod-89 checksum), Medicare (mod-10 checksum), formatted TFN and IRD numbers (mod-11, separators required), NZ NHI (alpha-prefix regex), and NZ bank account numbers. Bare/unformatted TFN and IRD strings without separators are not detected by value alone; column-name matching remains the safety net for those. Other non-AU/NZ formats rely solely on column-name matching; extend pii.column_names or pii.patterns for your region.
- PII already in the model's context from prior turns, system prompts, file reads, or earlier summarisation. Gate filters what goes into the model from configured tools; what's already there stays there.
- Tool-side network exfiltration. If a configured tool sends data to an external service directly (rather than returning it via stdout), gate never sees it.
- Write operations. INSERT, UPDATE, and DELETE are not inspected or blocked.
- Credential exposure. Gate holds no credentials; that is the responsibility of the underlying tool. Prefer toolkit commands or MCP servers over raw clients that take credentials on the CLI.
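Two of the AU checksums named above, ABN and TFN, follow publicly documented ATO algorithms and are easy to sketch. This Rust version is illustrative only. It ignores gate's separator requirement for TFNs and the regexes that surround these checks. `digits_of` is a hypothetical helper that simply drops spaces and dashes.

```rust
/// Hypothetical helper: strip spaces/dashes, reject anything else non-numeric.
fn digits_of(s: &str) -> Option<Vec<u32>> {
    s.chars()
        .filter(|c| *c != ' ' && *c != '-')
        .map(|c| c.to_digit(10))
        .collect()
}

/// ABN: subtract 1 from the first digit, apply weights, sum must be divisible by 89.
fn abn_valid(s: &str) -> bool {
    const WEIGHTS: [u32; 11] = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
    match digits_of(s) {
        Some(mut d) if d.len() == 11 && d[0] > 0 => {
            d[0] -= 1;
            d.iter().zip(WEIGHTS).map(|(x, w)| x * w).sum::<u32>() % 89 == 0
        }
        _ => false,
    }
}

/// TFN (9-digit form): weighted sum must be divisible by 11.
fn tfn_valid(s: &str) -> bool {
    const WEIGHTS: [u32; 9] = [1, 4, 3, 7, 5, 8, 6, 9, 10];
    match digits_of(s) {
        Some(d) if d.len() == 9 => {
            d.iter().zip(WEIGHTS).map(|(x, w)| x * w).sum::<u32>() % 11 == 0
        }
        _ => false,
    }
}
```

Checksums like these cut false positives on arbitrary 9- and 11-digit strings, which is what lets such identifiers be matched by value at all.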

For a stronger boundary, combine gate with harness-level tool restrictions and database-level read-only roles. See THREAT-MODEL.md for the full attacker model and known bypasses.

Any command that returns JSON can be configured as a gate target: database clients, internal API calls via curl, or any other tool your AI agent uses to fetch data. The AI sees the same structured response it always did, with PII values replaced in place.

| Command | Type | Notes |
| --- | --- | --- |
| tkpsql | PostgreSQL (toolkit-managed) | sql_arg: "--sql" |
| tkmsql | MS SQL Server (toolkit-managed) | sql_arg: "--sql" |
| tkdbr | Databricks (toolkit-managed) | sql_arg: "--sql" |
| databricks | Databricks CLI (native) | sql_arg: "--json", json_sql_path: "statement" |
| curl | HTTP data sources | pipe: "jq -c ." |
| psql, mysql, mariadb | Raw DB clients | Not enabled by default (see …) |

Prefer toolkit commands or MCP servers over raw clients: raw clients typically require credentials on the command line, which then land in the agent's transcript, shell history, and process listing. Toolkit commands (tk*) inject credentials from a secrets store; MCP servers hide the connection string entirely.

gate works with any JSON-returning command; toolkit is not required.
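The Rust sketch below shows how a per-tool setting like sql_arg: "--sql" might be used to pull the SQL out of a command line. It also includes a very rough version of Gate 1's column-hint step, where `SELECT *` gives no hints and Gate 2 does all the work. Both functions are hypothetical. The json_sql_path case for the Databricks CLI is left out. Real SQL parsing must also handle aliases, subqueries, and quoting, none of which this sketch covers.

```rust
/// Hypothetical: return the argument that follows `sql_arg` in argv.
fn extract_sql<'a>(argv: &[&'a str], sql_arg: &str) -> Option<&'a str> {
    argv.iter()
        .position(|a| *a == sql_arg)
        .and_then(|i| argv.get(i + 1).copied())
}

/// Very rough column hints from the SELECT list; None means "no hints, defer to Gate 2".
fn column_hints(sql: &str) -> Option<Vec<String>> {
    // ASCII uppercasing keeps byte offsets identical, so indices map back onto `sql`.
    let upper = sql.to_ascii_uppercase();
    let start = upper.find("SELECT")? + "SELECT".len();
    let end = upper[start..].find(" FROM ").map(|i| start + i)?;
    let cols: Vec<String> = sql[start..end]
        .split(',')
        .map(|c| c.trim().to_string())
        .collect();
    if cols.iter().any(|c| c == "*" || c.ends_with(".*")) {
        None // SELECT * -> no column hints
    } else {
        Some(cols)
    }
}
```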

gate --help                    # full subcommand list
gate <subcommand> --help       # details for any subcommand

The ones you'll use most:

| Command | Purpose |
| --- | --- |
| gate init | Register the hook with your harness (see Quickstart) |
| gate config | Create and edit the YAML config |
| gate scan | PII risk report across your schema |
| gate allowlist add/remove/list | Manage column-name false positives |
| gate retro | Protection retrospective: total queries & PII fields redacted, breakdown by tool and PII type/category, hit rate with visual progress bar, and low-confidence warnings with allowlist hints |
| gate enable / gate disable | Toggle redaction without uninstalling |
| gate validate | Check config for errors before the first session |
| gate protect / gate unprotect (Unix only) | Transfer config ownership to root |
| gate uninstall | Remove everything gate added to your system |

See docs/commands.md for the full reference, including gate run, gate mcp, and the --wrap-mcp / --scope / --harness flags.

For a stronger guarantee, transfer ownership of the config to root so the agent cannot modify it:

sudo gate protect      # any future enable/disable/config/allowlist now needs sudo
sudo gate unprotect    # restore direct write access

Enforced at the OS level across all harnesses (Claude Code, OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, Gemini CLI). Not supported on Windows.
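Root ownership matters because an agent running as your normal user cannot write to a file owned by uid 0 unless the group or other write bits grant access. This Rust sketch only tests that condition. It does not describe how gate protect is implemented. A fuller check would also cover the containing directory, since a writable directory lets a user replace the file.

```rust
/// True if the file is owned by root and not group- or world-writable.
fn is_protected(owner_uid: u32, mode: u32) -> bool {
    const GROUP_OR_OTHER_WRITE: u32 = 0o022;
    owner_uid == 0 && (mode & GROUP_OR_OTHER_WRITE) == 0
}

/// Reads ownership and permission bits from the filesystem (Unix only).
#[cfg(unix)]
fn config_is_protected(path: &std::path::Path) -> std::io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let meta = std::fs::metadata(path)?;
    Ok(is_protected(meta.uid(), meta.mode()))
}
```

config_is_protected could be pointed at the default config path from the Quickstart (~/.config/gate/config.yaml), with ~ expanded first.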

| AI Tool | Notes |
| --- | --- |
| Cursor | Restart the session after gate init to load the hook |
| OpenCode | Restart the session after gate init to load the hook |
| GitHub Copilot CLI | Run gate init once (per local clone) |
| Codex CLI | After gate init, restart the session and trust + enable the hook in the Permissions UI |
| Gemini CLI | Restart the session after gate init to load the hook |

Further documentation:

- Configuration: full YAML schema and built-in PII detection rules
- Commands: full subcommand reference
- MCP setup: wrapping existing MCP servers and registering new ones
- Scan queries: schema-query examples for each database
- Config file locations: where each harness stores hooks and MCP settings
- Troubleshooting: common issues and fixes

gate uninstall
brew uninstall gate

gate uninstall removes gate hooks from all harnesses, the config directory at ~/.config/gate/, and any gate-generated plugin files. It shows what will be deleted and asks for confirmation.

Bug reports and pull requests are welcome. For significant changes, open an issue first to discuss the proposal. See CONTRIBUTING.md for the dev setup, pre-commit checklist, and safety rules for redaction changes.

MIT — see LICENSE.

See DISCLAIMER.md.
