I Built a PII Firewall for LLMs in a Weekend (and Caught My Own Leak)

A developer built an open-source PII firewall for LLMs after accidentally sending a customer's credit card number to OpenAI during a benchmark test. The tool, called LLM Governance Engine, intercepts prompts before they reach the model, using Microsoft's Presidio library to detect and block sensitive data based on configurable YAML policies. It runs locally with Docker and supports actions like blocking, warning, or alerting for GDPR and HIPAA compliance.

Three weeks ago I was benchmarking GPT-4o against a local Llama model, copying prompts from a real support ticket database to make the test realistic. Midway through the run I glanced at the terminal and saw this in the logs:

    prompt="Hi, my name is Sarah Johnson, my account number is 4532-1234-5678-9012..." provider=cloud model=gpt-4o

A real customer's name. A real credit card number. Already sent to OpenAI.

I had not noticed because the benchmark UI showed only a token count, not the actual prompt content. The PII was in the data, and I had forgotten to sanitise it.

OpenAI's API terms say they don't train on API data, but that's not the point: the data left my infrastructure. Under GDPR, that's a potential breach.

I spent the rest of that weekend building a firewall so it could never happen again. This post is the full story of what I built, how it works, and how you can run it in one command.

The code is at https://github.com/sochaty/llm-governance-engine (tag `governance-post-1`).

Every LLM observability tool I have used (LangSmith, Helicone, Arize Phoenix) works the same way: it records what happened after the fact. You get a dashboard, a trace, a cost breakdown. None of them stop the request.

That distinction matters enormously under GDPR, HIPAA, and the EU AI Act. "We logged that PII was sent" is not a compliance posture. "PII was blocked before it left the building" is.

By the end of this post you will have the whole thing running locally. Everything runs with `docker compose up`.

The key insight is where enforcement happens: before the model call, not after.

    User Prompt
        │
        ▼
    FastAPI /benchmark/stream
        │
        ├── enforce_governance_policy   ← Presidio scan + policy evaluation
        │       │
        │       ├── PII detected + cloud model → HTTP 403 (prompt never sent)
        │       ├── Safety score low           → warn + log + continue
        │       └── All rules pass             → verdict returned to endpoint
        │
        ├── LLMOrchestrator.get_streaming_response
        │       │
        │       ├── OpenAI / Groq / Google / Anthropic (cloud)
        │       └── Ollama (local)
        │
        └── AuditService → PostgreSQL

The `enforce_governance_policy` function is a FastAPI `Depends`, injected into the streaming endpoint. If a blocking rule fires, it raises HTTP 403 before the orchestrator is even called. The prompt never touches the wire.

The entire governance model is a YAML file. No code changes, no restarts: edit the file, `POST /api/v1/policies/reload`, and the rules are live.

    # policies/default.yaml
    version: "1.0"
    name: "default"
    rules:
      - id: pii-cloud-block
        name: "Block PII from cloud models"
        condition: pii_detected
        threshold: 0.7            # Presidio confidence ≥ 0.7 triggers this rule
        models: [cloud, gpt-4o]
        action: block             # returns HTTP 403
        severity: critical
        webhook_url: null         # set to your Slack URL to get alerted
      - id: low-safety-warn
        name: "Warn on low safety score"
        condition: safety_score_below
        threshold: 0.5
        action: warn              # logs + audits, passes through
        severity: medium
      - id: pii-local-alert
        name: "Alert on PII sent to local models"
        condition: pii_detected
        threshold: 0.85
        models: [local]
        action: alert             # fires webhook, does not block
        severity: high

Four conditions: `pii_detected`, `safety_score_below`, `cost_exceeds`, `model_is`. Three actions: `block` (HTTP 403), `warn` (audit + continue), `alert` (webhook + continue).

Starter templates are shipped in the repo for GDPR (`policies/gdpr.yaml`) and HIPAA (`policies/hipaa.yaml`).

Presidio is Microsoft's open-source PII detection library. It runs locally: no API call, no data leaving your machine. It detects 50+ entity types out of the box: `PERSON`, `EMAIL_ADDRESS`, `CREDIT_CARD`, `US_SSN`, `PHONE_NUMBER`, `IBAN_CODE`, `IP_ADDRESS`, and more.
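
To give a feel for what local, pattern-based detection of an entity such as `CREDIT_CARD` involves, here is a minimal stdlib-only sketch of a regex match followed by a Luhn checksum. It is not Presidio's code: the names `CARD_PATTERN`, `luhn_valid`, and `find_card_candidates`, and the two score values, are assumptions made up for illustration.

```python
import re
from typing import List, Tuple

# Candidate: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> List[Tuple[str, float]]:
    """Return (matched_text, score) pairs.

    The scores are hypothetical: a checksum pass raises confidence,
    a checksum failure leaves only the weaker pattern evidence.
    """
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        score = 0.95 if luhn_valid(digits) else 0.3
        hits.append((match.group(), score))
    return hits
```

Calling `find_card_candidates("card 4111 1111 1111 1111 please")` with that well-known test number yields one high-scoring hit, while a digit run that fails the checksum keeps a low score and would stay under a policy threshold of 0.7.
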
Presidio uses a combination of regex patterns, checksums, and a spaCy NLP model for name recognition. The scan returns a confidence score per entity. The policy engine compares that score against the rule's `threshold`. An entity with 0.95 confidence on `CREDIT_CARD` and a threshold of 0.7 triggers the `pii-cloud-block` rule.

    # backend/app/services/audit_service.py (simplified)
    from presidio_analyzer import AnalyzerEngine

    # ScanResult and EntityResult are project-defined result types
    # (their definitions are omitted from this simplified excerpt).

    class AuditService:
        def __init__(self):
            self.analyzer = AnalyzerEngine()

        def scan_for_pii_details(self, text: str) -> ScanResult:
            results = self.analyzer.analyze(text=text, language="en")
            detected = len(results) > 0
            entities = [
                EntityResult(
                    entity_type=r.entity_type,
                    confidence=r.score,
                    start=r.start,
                    end=r.end,
                )
                for r in results
            ]
            # default=0.0 covers the case where nothing was detected
            max_confidence = max((r.score for r in results), default=0.0)
            return ScanResult(
                detected=detected,
                entities=entities,
                max_confidence=max_confidence,
            )

The safety score is calculated separately: it is a 0.0–1.0 measure that combines PII confidence, entity density, and sensitive keyword presence. A score below 0.5 triggers the `low-safety-warn` rule.

The engine follows a Chain of Responsibility pattern. Each rule evaluates the `GovernanceContext` independently:

    # backend/app/governance/policy/schema.py
    from dataclasses import dataclass
    from typing import List, Optional

    from pydantic import BaseModel

    @dataclass
    class GovernanceContext:
        prompt: str
        provider: str
        model_id: str
        pii_detected: bool
        pii_entity_types: List[str]
        pii_max_confidence: float
        safety_score: float
        estimated_prompt_cost_usd: float

    class PolicyVerdict(BaseModel):
        passed: bool
        violated_rules: List[ViolatedRule] = []
        blocking_rule: Optional[ViolatedRule] = None
        warnings: List[str] = []

`DefaultPolicyEngine.evaluate` iterates all rules in order. Block rules short-circuit. Warn and alert rules accumulate into the verdict. The verdict is returned to the FastAPI dependency, which raises HTTP 403 if `blocking_rule` is set.

This is the part that makes everything composable.
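
The evaluation order described above (rules checked in sequence, block rules short-circuiting, warn and alert rules accumulating) can be sketched in a few lines. This is a simplified illustration, not the repository's `DefaultPolicyEngine`: the `Rule`, `Context`, and `Verdict` classes, the `rule_fires` helper, and the way a rule's `models` list is matched against provider or model id are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    id: str
    condition: str   # "pii_detected" or "safety_score_below" in this sketch
    threshold: float
    action: str      # "block", "warn", or "alert"
    models: List[str] = field(default_factory=list)  # empty = applies to all


@dataclass
class Context:
    provider: str
    model_id: str
    pii_max_confidence: float
    safety_score: float


@dataclass
class Verdict:
    passed: bool = True
    violated_rules: List[str] = field(default_factory=list)
    blocking_rule: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def rule_fires(rule: Rule, ctx: Context) -> bool:
    # Assumption: `models` may name either a provider or a specific model id.
    if rule.models and ctx.provider not in rule.models and ctx.model_id not in rule.models:
        return False
    if rule.condition == "pii_detected":
        return ctx.pii_max_confidence >= rule.threshold
    if rule.condition == "safety_score_below":
        return ctx.safety_score < rule.threshold
    return False  # conditions this sketch does not model never fire


def evaluate(rules: List[Rule], ctx: Context) -> Verdict:
    verdict = Verdict()
    for rule in rules:
        if not rule_fires(rule, ctx):
            continue
        verdict.violated_rules.append(rule.id)
        if rule.action == "block":
            verdict.passed = False
            verdict.blocking_rule = rule.id
            break  # block rules short-circuit: later rules are not evaluated
        verdict.warnings.append(f"{rule.action}: {rule.id}")
    return verdict


# The three rules from the default policy, in file order.
RULES = [
    Rule("pii-cloud-block", "pii_detected", 0.7, "block", ["cloud", "gpt-4o"]),
    Rule("low-safety-warn", "safety_score_below", 0.5, "warn"),
    Rule("pii-local-alert", "pii_detected", 0.85, "alert", ["local"]),
]
```

With these rules, a cloud request carrying a 0.95-confidence entity stops at the first rule, while the same prompt sent to the local provider passes but collects both the warn and the alert hit.
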
One line wires the entire governance stack into any endpoint:

    # backend/app/api/benchmark_router.py
    @router.get("/stream")
    async def stream_benchmark(
        verdict: PolicyVerdict = Depends(enforce_governance_policy),
        db: AsyncSession = Depends(get_db),
    ):
        # If we reach here, the prompt passed all blocking rules.
        # verdict.warnings contains any non-blocking rule hits.
        ...

The dependency itself:

    # backend/app/governance/policy/enforcement.py (simplified)
    async def enforce_governance_policy(
        prompt: Annotated[str, Query(min_length=1)],
        provider: Annotated[str, Query(pattern="^(cloud|local)$")] = "cloud",
        db: AsyncSession = Depends(get_db),
    ) -> PolicyVerdict:
        engine = get_policy_engine()
        audit = get_audit_service()

        scan = audit.scan_for_pii_details(prompt)
        context = GovernanceContext(
            prompt=prompt,
            provider=provider,
            model_id="gpt-4o" if provider == "cloud" else "llama3.2:latest",
            pii_detected=scan.detected,
            pii_entity_types=[e.entity_type for e in scan.entities],
            pii_max_confidence=scan.max_confidence,
            safety_score=audit.calculate_safety_score(prompt),
            estimated_prompt_cost_usd=(
                len(prompt.split()) * 0.00003 if provider == "cloud" else 0.0
            ),
        )

        verdict = engine.evaluate(context)

        for violation in verdict.violated_rules:
            webhook_url = get_webhook_url(engine, violation.rule_id)
            await record_violation(db, violation, context, webhook_url)

        if not verdict.passed and verdict.blocking_rule:
            br = verdict.blocking_rule
            raise HTTPException(
                status_code=403,
                detail={
                    "error": "governance_violation",
                    "rule_id": br.rule_id,
                    "rule_name": br.rule_name,
                    "severity": br.severity,
                    "message": br.message,
                },
            )

        return verdict

Every violation, blocked or not, is persisted to `policy_violations` in PostgreSQL before the function returns. Webhook delivery is fire-and-forget via `asyncio.create_task`, so it never adds latency to the response path.

When a rule fires with a `webhook_url`, a CloudEvents-compatible payload is POSTed:

    {
      "specversion": "1.0",
      "type": "com.governance.policy.violation",
      "source": "llm-governance-engine",
      "id": "uuid",
      "time": "2026-06-19T09:00:00Z",
      "data": {
        "rule_id": "pii-cloud-block",
        "rule_name": "Block PII from cloud models",
        "severity": "critical",
        "action": "block",
        "message": "PII detected (CREDIT_CARD, confidence=0.95) on cloud provider",
        "provider": "cloud",
        "model_id": "gpt-4o"
      }
    }

Delivery is attempted three times with exponential backoff. The payload is meant for incoming-webhook integrations such as Slack, Teams, and PagerDuty, though each of those expects its own message schema, so confirm that your endpoint accepts it as-is.

To run everything locally:

    git clone https://github.com/sochaty/llm-governance-engine
    cd llm-governance-engine
    git checkout governance-post-1
    cp .env.example .env
    # Add your OPENAI_API_KEY (or any provider key)
    docker compose up
    # Dashboard → http://localhost:4200
    # API docs  → http://localhost:8000/docs

Pull a local model to enable the side-by-side comparison:

    curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2:latest"}'

Trigger your first governance block: open the dashboard, type a prompt containing a fake SSN (`My SSN is 123-45-6789`), select the Cloud provider, and hit Run. You will get a red Governance Violation banner instead of a response. The prompt never reached GPT-4o. Open http://localhost:8000/api/v1/policies/violations to see the audit record of the block.

Every inference, blocked or not, is stored in PostgreSQL:

| Field | Example |
|---|---|
| `prompt_preview` | "My SSN is 123-45..." |
| `provider` | cloud |
| `model_name` | gpt-4o |
| `pii_detected` | true |
| `safety_score` | 0.12 |
| `latency_ms` | 0 (blocked before model) |
| `estimated_cost` | $0.0000 |
| `version_tag` | openai/gpt-4o |

The Audit Vault page in the dashboard is filterable by prompt, provider, and PII flag. Every row has a "Generate Report" button that exports a PDF, which is useful when a compliance officer asks for evidence.

The orchestrator supports five provider types with a single interface:

| Provider | How it connects |
|---|---|
| OpenAI | `AsyncOpenAI` (native) |
| Groq | `AsyncOpenAI(base_url="https://api.groq.com/openai/v1")` |
| Google Gemini | `AsyncOpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai")` |
| Anthropic | Lazy `import anthropic`, separate streaming path |
| Ollama (local) | `AsyncOpenAI(base_url="http://ollama-service:11434/v1", api_key="ollama")` |

API keys are stored in PostgreSQL (Fernet-encrypted) and resolved live on every request via `settings_service.get`. Change a key in the Settings UI and it takes effect on the next request, with no restart needed.

The codebase is production-ready for single-tenant use. One item on the roadmap from here: a faithfulness score that populates in the audit log 2–3 seconds after the benchmark completes.

The incident that started this (a real customer's credit card number sent to GPT-4o because I forgot to sanitise a test dataset) took about 30 seconds to happen and would have taken weeks to untangle from a compliance perspective. The fix took a weekend. It should have existed before the first prompt was ever sent.

Full code: https://github.com/sochaty/llm-governance-engine

Reproduce this post exactly: `git checkout governance-post-1`

PRs and issues welcome. If you build a custom Presidio recogniser for your domain (medical records, legal documents, financial instruments), I would love to include it in the default policy templates.

All my writing lives at blogs.sourishchakraborty.com; subscribe there for future posts.