{"slug": "i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak", "title": "I Built a PII Firewall for LLMs in a Weekend (and Caught My Own Leak)", "summary": "A developer built an open-source PII firewall for LLMs after accidentally sending a customer's credit card number to OpenAI during a benchmark test. The tool, called LLM Governance Engine, intercepts prompts before they reach the model, using Microsoft's Presidio library to detect and block sensitive data based on configurable YAML policies. It runs locally with Docker and supports actions like blocking, warning, or alerting for GDPR and HIPAA compliance.", "body_md": "Three weeks ago I was benchmarking GPT-4o against a local Llama model. I was copying prompts from a real support ticket database to make the test realistic. Midway through the run I glanced at the terminal and saw this in the logs:\n\n```\nprompt=\"Hi, my name is Sarah Johnson, my account number is 4532-1234-5678-9012...\"\nprovider=cloud\nmodel=gpt-4o\n```\n\nA real customer's name. A real credit card number. Already sent to OpenAI.\n\nI had not noticed because the benchmark UI just showed a token count, not the actual prompt content. The PII was in the data. I had forgotten to sanitise it. OpenAI's API terms say they don't train on API data, but that's not the point — the data left my infrastructure. Under GDPR, that's a potential breach.\n\nI spent the rest of that weekend building a firewall so it could never happen again. This post is the full story of what I built, how it works, and how you can run it in one command.\n\nThe code is at [github.com/sochaty/llm-governance-engine](https://github.com/sochaty/llm-governance-engine), tag `governance-post-1`.\n\nEvery LLM observability tool I have used — LangSmith, Helicone, Arize Phoenix — works the same way: it records what happened after the fact. You get a dashboard, a trace, a cost breakdown. 
None of them stop the request.\n\nThat distinction matters enormously under GDPR, HIPAA, and the EU AI Act. \"We logged that PII was sent\" is not a compliance posture. \"PII was blocked before it left the building\" is.\n\nBy the end of this post you will have:\n\nEverything runs with `docker compose up`.\n\nThe key insight is where enforcement happens: **before the model call, not after**.\n\n```\nUser Prompt\n    │\n    ▼\nFastAPI /benchmark/stream\n    │\n    ├── enforce_governance_policy()   ← Presidio scan + policy evaluation\n    │       │\n    │       ├── PII detected + cloud model → HTTP 403 (prompt never sent)\n    │       ├── Safety score low → warn + log + continue\n    │       └── All rules pass → verdict returned to endpoint\n    │\n    ├── LLMOrchestrator.get_streaming_response()\n    │       │\n    │       ├── OpenAI / Groq / Google / Anthropic (cloud)\n    │       └── Ollama (local)\n    │\n    └── AuditService → PostgreSQL\n```\n\nThe `enforce_governance_policy` function is a FastAPI dependency, injected into the streaming endpoint via `Depends()`. If a blocking rule fires, it raises `HTTP 403` before the orchestrator is even called. The prompt never touches the wire.\n\nThe entire governance model is a YAML file. 
No code changes, no restarts: edit the file, call `POST /api/v1/policies/reload`, and the rules are live.\n\n```\n# policies/default.yaml\nversion: \"1.0\"\nname: \"default\"\n\nrules:\n  - id: pii-cloud-block\n    name: \"Block PII from cloud models\"\n    condition: pii_detected\n    threshold: 0.7          # Presidio confidence ≥ 0.7 triggers this rule\n    models: [cloud, gpt-4o]\n    action: block           # returns HTTP 403\n    severity: critical\n    webhook_url: null       # set to your Slack URL to get alerted\n\n  - id: low-safety-warn\n    name: \"Warn on low safety score\"\n    condition: safety_score_below\n    threshold: 0.5\n    action: warn            # logs + audits, passes through\n    severity: medium\n\n  - id: pii-local-alert\n    name: \"Alert on PII sent to local models\"\n    condition: pii_detected\n    threshold: 0.85\n    models: [local]\n    action: alert           # fires webhook, does not block\n    severity: high\n```\n\nFour conditions: `pii_detected`, `safety_score_below`, `cost_exceeds`, `model_is`.\n\nThree actions: `block` (HTTP 403), `warn` (audit + continue), `alert` (webhook + continue).\n\nStarter templates are shipped in the repo for GDPR (`policies/gdpr.yaml`) and HIPAA (`policies/hipaa.yaml`).\n\nPresidio is Microsoft's open-source PII detection library. It runs locally — no API call, no data leaving your machine.\n\nIt detects 50+ entity types out of the box: `PERSON`, `EMAIL_ADDRESS`, `CREDIT_CARD`, `US_SSN`, `PHONE_NUMBER`, `IBAN_CODE`, `IP_ADDRESS`, and more. It uses a combination of regex patterns, checksums, and a spaCy NLP model for name recognition.\n\nThe scan returns a confidence score per entity. The policy engine compares that score against the rule's `threshold`. 
An entity with 0.95 confidence on `CREDIT_CARD` and a threshold of 0.7 triggers the `pii-cloud-block` rule.\n\n```\n# backend/app/services/audit_service.py (simplified)\nfrom presidio_analyzer import AnalyzerEngine\n\nclass AuditService:\n    def __init__(self):\n        self.analyzer = AnalyzerEngine()\n\n    def scan_for_pii_details(self, text: str) -> ScanResult:\n        results = self.analyzer.analyze(text=text, language=\"en\")\n        detected = len(results) > 0\n        entities = [\n            EntityResult(\n                entity_type=r.entity_type,\n                confidence=r.score,\n                start=r.start,\n                end=r.end,\n            )\n            for r in results\n        ]\n        max_confidence = max((r.score for r in results), default=0.0)\n        return ScanResult(\n            detected=detected,\n            entities=entities,\n            max_confidence=max_confidence,\n        )\n```\n\nThe safety score is calculated separately — it is a 0.0–1.0 measure that combines PII confidence, entity density, and sensitive keyword presence. A score below 0.5 triggers the `low-safety-warn` rule.\n\nThe engine follows a Chain of Responsibility pattern. Each rule evaluates the `GovernanceContext` independently:\n\n```\n# backend/app/governance/policy/schema.py\nfrom dataclasses import dataclass\nfrom typing import List, Optional\n\nfrom pydantic import BaseModel\n\n# ViolatedRule is defined elsewhere in the codebase (omitted here)\n\n@dataclass\nclass GovernanceContext:\n    prompt: str\n    provider: str\n    model_id: str\n    pii_detected: bool\n    pii_entity_types: List[str]\n    pii_max_confidence: float\n    safety_score: float\n    estimated_prompt_cost_usd: float\n\nclass PolicyVerdict(BaseModel):\n    passed: bool\n    violated_rules: List[ViolatedRule] = []\n    blocking_rule: Optional[ViolatedRule] = None\n    warnings: List[str] = []\n```\n\nThe `DefaultPolicyEngine.evaluate()` method iterates all rules in order. Block rules short-circuit. Warn and alert rules accumulate into the verdict. 
The verdict is returned to the FastAPI dependency, which raises `HTTP 403` if `blocking_rule` is set.\n\nThis is the part that makes everything composable. One line wires the entire governance stack into any endpoint:\n\n```\n# backend/app/api/benchmark_router.py\n@router.get(\"/stream\")\nasync def stream_benchmark(\n    verdict: PolicyVerdict = Depends(enforce_governance_policy),\n    db: AsyncSession = Depends(get_db),\n):\n    # If we reach here, the prompt passed all blocking rules.\n    # verdict.warnings contains any non-blocking rule hits.\n    ...\n```\n\nThe dependency itself:\n\n```\n# backend/app/governance/policy/enforcement.py (simplified)\nasync def enforce_governance_policy(\n    prompt: Annotated[str, Query(min_length=1)],\n    provider: Annotated[str, Query(pattern=\"^(cloud|local)$\")] = \"cloud\",\n    db: AsyncSession = Depends(get_db),\n) -> PolicyVerdict:\n    engine = get_policy_engine()\n    audit = _get_audit_service()\n\n    scan = audit.scan_for_pii_details(prompt)\n\n    context = GovernanceContext(\n        prompt=prompt,\n        provider=provider,\n        model_id=\"gpt-4o\" if provider == \"cloud\" else \"llama3.2:latest\",\n        pii_detected=scan.detected,\n        pii_entity_types=[e.entity_type for e in scan.entities],\n        pii_max_confidence=scan.max_confidence,\n        safety_score=audit.calculate_safety_score(prompt),\n        estimated_prompt_cost_usd=(len(prompt.split()) * 0.00003)\n        if provider == \"cloud\" else 0.0,\n    )\n\n    verdict = engine.evaluate(context)\n\n    for violation in verdict.violated_rules:\n        webhook_url = _get_webhook_url(engine, violation.rule_id)\n        await _record_violation(db, violation, context, webhook_url)\n\n    if not verdict.passed and verdict.blocking_rule:\n        br = verdict.blocking_rule\n        raise HTTPException(\n            status_code=403,\n            detail={\n                \"error\": \"governance_violation\",\n                \"rule_id\": 
br.rule_id,\n                \"rule_name\": br.rule_name,\n                \"severity\": br.severity,\n                \"message\": br.message,\n            },\n        )\n\n    return verdict\n```\n\nEvery violation, blocked or not, is persisted to `policy_violations` in PostgreSQL before the function returns. Webhook delivery is fire-and-forget via `asyncio.create_task()` so it never adds latency to the response path.\n\nWhen a rule fires with a `webhook_url`, a CloudEvents-compatible payload is POSTed:\n\n```\n{\n  \"specversion\": \"1.0\",\n  \"type\": \"com.governance.policy.violation\",\n  \"source\": \"llm-governance-engine\",\n  \"id\": \"uuid\",\n  \"time\": \"2026-06-19T09:00:00Z\",\n  \"data\": {\n    \"rule_id\": \"pii-cloud-block\",\n    \"rule_name\": \"Block PII from cloud models\",\n    \"severity\": \"critical\",\n    \"action\": \"block\",\n    \"message\": \"PII detected (CREDIT_CARD, confidence=0.95) on cloud provider\",\n    \"provider\": \"cloud\",\n    \"model_id\": \"gpt-4o\"\n  }\n}\n```\n\nDelivery is attempted three times with exponential backoff. Slack, Teams, and PagerDuty can receive it through their incoming webhook integrations, though each expects its own payload shape, so check whether you need a small adapter in front.\n\n```\ngit clone https://github.com/sochaty/llm-governance-engine\ncd llm-governance-engine\ngit checkout governance-post-1\ncp .env.example .env\n# Add your OPENAI_API_KEY (or any provider key)\ndocker compose up\n```\n\nDashboard → `http://localhost:4200`\n\nAPI docs → `http://localhost:8000/docs`\n\nPull a local model to enable the side-by-side comparison:\n\n```\ncurl -X POST http://localhost:11434/api/pull -d '{\"name\":\"llama3.2:latest\"}'\n```\n\n**Trigger your first governance block:**\n\nOpen the dashboard, type a prompt containing a fake SSN (`My SSN is 123-45-6789`), select the Cloud provider, and hit Run. You will get a red `Governance Violation` banner instead of a response. 
The prompt never reached GPT-4o.\n\nOpen `http://localhost:8000/api/v1/policies/violations` to see the audit record of the block.\n\nEvery inference — blocked or not — is stored in PostgreSQL:\n\n| Field | Example |\n|---|---|\n| `prompt` (preview) | \"My SSN is 123-45...\" |\n| `provider` | cloud |\n| `model_name` | gpt-4o |\n| `pii_detected` | true |\n| `safety_score` | 0.12 |\n| `latency_ms` | 0 (blocked before model) |\n| `estimated_cost` | $0.0000 |\n| `version_tag` | openai/gpt-4o |\n\nThe Audit Vault page in the dashboard is filterable by prompt, provider, and PII flag. Every row has a \"Generate Report\" button that exports a PDF — useful when a compliance officer asks for evidence.\n\nThe orchestrator supports five provider types with a single interface:\n\n| Provider | How it connects |\n|---|---|\n| OpenAI | `AsyncOpenAI` (native) |\n| Groq | `AsyncOpenAI(base_url=\"https://api.groq.com/openai/v1\")` |\n| Google Gemini | `AsyncOpenAI(base_url=\"https://generativelanguage.googleapis.com/v1beta/openai\")` |\n| Anthropic | Lazy `import anthropic` — separate streaming path |\n| Ollama (local) | `AsyncOpenAI(base_url=\"http://ollama-service:11434/v1\", api_key=\"ollama\")` |\n\nAPI keys are stored in PostgreSQL (Fernet-encrypted) and resolved live on every request via `settings_service.get()`. Change a key in the Settings UI — no restart needed, effective on the next request.\n\nThe codebase is production-ready for single-tenant use. The roadmap from here:\n\n`faithfulness_score` populates in the audit log 2–3 seconds after the benchmark completes.\n\nThe incident that started this — a real customer's credit card number sent to GPT-4o because I forgot to sanitise a test dataset — took about 30 seconds to happen and would have taken weeks to untangle from a compliance perspective.\n\nThe fix took a weekend. 
It should have existed before the first prompt was ever sent.\n\nFull code: [github.com/sochaty/llm-governance-engine](https://github.com/sochaty/llm-governance-engine)\n\nReproduce this post exactly: `git checkout governance-post-1`\n\nPRs and issues welcome. If you build a custom Presidio recogniser for your domain (medical records, legal documents, financial instruments), I would love to include it in the default policy templates.\n\nAll my writing lives at blogs.sourishchakraborty.com. Subscribe there for future posts.", "url": "https://wpnews.pro/news/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak", "canonical_source": "https://dev.to/sochaty/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak-1mh0", "published_at": "2026-06-19 02:03:37+00:00", "updated_at": "2026-06-19 03:00:04.072511+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "ai-policy", "developer-tools", "ai-infrastructure"], "entities": ["OpenAI", "GPT-4o", "Microsoft", "Presidio", "LangSmith", "Helicone", "Arize Phoenix", "GDPR"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak", "markdown": "https://wpnews.pro/news/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak.md", "text": "https://wpnews.pro/news/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak.txt", "jsonld": "https://wpnews.pro/news/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak.jsonld"}}