I Built a PII Firewall for LLMs in a Weekend (and Caught My Own Leak)

wpnews.pro

Three weeks ago I was benchmarking GPT-4o against a local Llama model. I was copying prompts from a real support ticket database to make the test realistic. Midway through the run I glanced at the terminal and saw this in the logs:

prompt="Hi, my name is Sarah Johnson, my account number is 4532-1234-5678-9012..."
provider=cloud
model=gpt-4o

A real customer's name. A real credit card number. Already sent to OpenAI.

I had not noticed because the benchmark UI just showed a token count, not the actual prompt content. The PII was in the data. I had forgotten to sanitise it. OpenAI's API terms say they don't train on API data, but that's not the point — the data left my infrastructure. Under GDPR, that's a potential breach.

I spent the rest of that weekend building a firewall so it could never happen again. This post is the full story of what I built, how it works, and how you can run it in one command.

The code is at github.com/sochaty/llm-governance-engine, tag governance-post-1.

Every LLM observability tool I have used — LangSmith, Helicone, Arize Phoenix — works the same way: it records what happened after the fact. You get a dashboard, a trace, a cost breakdown. None of them stop the request.

That distinction matters enormously under GDPR, HIPAA, and the EU AI Act. "We logged that PII was sent" is not a compliance posture. "PII was blocked before it left the building" is.

By the end of this post you will have:

Everything runs with `docker compose up`.

The key insight is where enforcement happens: before the model call, not after.

User Prompt
    │
    ▼
FastAPI /benchmark/stream
    │
    ├── enforce_governance_policy()   ← Presidio scan + policy evaluation
    │       │
    │       ├── PII detected + cloud model → HTTP 403 (prompt never sent)
    │       ├── Safety score low → warn + log + continue
    │       └── All rules pass → verdict returned to endpoint
    │
    ├── LLMOrchestrator.get_streaming_response()
    │       │
    │       ├── OpenAI / Groq / Google / Anthropic (cloud)
    │       └── Ollama (local)
    │
    └── AuditService → PostgreSQL

The `enforce_governance_policy` function is a FastAPI dependency, injected into the streaming endpoint with `Depends()`. If a blocking rule fires, it raises HTTP 403 before the orchestrator is even called. The prompt never touches the wire.

The entire governance model is a YAML file. No code changes, no restarts: edit the file, POST to `/api/v1/policies/reload`, and the rules are live.
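A reload endpoint like this is easiest to make safe if the new rules are parsed completely before anything is swapped, so a broken file never replaces a working one. The sketch below assumes a hypothetical `PolicyHolder` class whose `loader` callable stands in for reading and validating the YAML file; it illustrates the pattern and is not the repo's code.

```python
import threading
from typing import Any, Callable, List, Mapping

RuleSet = List[Mapping[str, Any]]


class PolicyHolder:
    """Holds the active rule set and swaps it in one step on reload (hypothetical helper)."""

    def __init__(self, loader: Callable[[], RuleSet]):
        self._loader = loader          # e.g. reads the YAML file and validates it
        self._lock = threading.Lock()  # serialises concurrent reloads
        self._rules: RuleSet = loader()

    @property
    def rules(self) -> RuleSet:
        # Readers take whatever reference is current; a reload never mutates it in place.
        return self._rules

    def reload(self) -> int:
        with self._lock:
            new_rules = self._loader()  # if this raises, the old rules stay active
            self._rules = new_rules
        return len(new_rules)
```

A handler for `POST /api/v1/policies/reload` would then only need to call `holder.reload()` and report the new rule count.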

version: "1.0"
name: "default"

rules:
  - id: pii-cloud-block
    name: "Block PII from cloud models"
    condition: pii_detected
    threshold: 0.7          # Presidio confidence ≥ 0.7 triggers this rule
    models: [cloud, gpt-4o]
    action: block           # returns HTTP 403
    severity: critical
    webhook_url: null       # set to your Slack URL to get alerted

  - id: low-safety-warn
    name: "Warn on low safety score"
    condition: safety_score_below
    threshold: 0.5
    action: warn            # logs + audits, passes through
    severity: medium

  - id: pii-local-alert
    name: "Alert on PII sent to local models"
    condition: pii_detected
    threshold: 0.85
    models: [local]
    action: alert           # fires webhook, does not block
    severity: high

Four conditions: `pii_detected`, `safety_score_below`, `cost_exceeds`, `model_is`.

Three actions: `block` (HTTP 403), `warn` (audit + continue), `alert` (webhook + continue).

Starter templates are shipped in the repo for GDPR (`policies/gdpr.yaml`) and HIPAA (`policies/hipaa.yaml`).
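Because the conditions and actions form closed sets, a loader can reject typos at reload time instead of at request time. This is a minimal sketch assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the `Rule` dataclass and `parse_policy` function are hypothetical names, not the engine's real types.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional

# Allowed values, taken from the conditions and actions listed above.
CONDITIONS = {"pii_detected", "safety_score_below", "cost_exceeds", "model_is"}
ACTIONS = {"block", "warn", "alert"}


@dataclass(frozen=True)
class Rule:
    id: str
    name: str
    condition: str
    action: str
    severity: str
    threshold: Optional[float] = None
    models: List[str] = field(default_factory=list)  # empty: assumed to mean "all models"
    webhook_url: Optional[str] = None


def parse_policy(doc: Mapping[str, Any]) -> List[Rule]:
    """Validate an already-parsed policy document and return its rules in file order."""
    rules: List[Rule] = []
    seen_ids = set()
    for raw in doc.get("rules", []):
        if raw["condition"] not in CONDITIONS:
            raise ValueError(f"{raw['id']}: unknown condition {raw['condition']!r}")
        if raw["action"] not in ACTIONS:
            raise ValueError(f"{raw['id']}: unknown action {raw['action']!r}")
        if raw["id"] in seen_ids:
            raise ValueError(f"duplicate rule id {raw['id']!r}")
        seen_ids.add(raw["id"])
        rules.append(
            Rule(
                id=raw["id"],
                name=raw["name"],
                condition=raw["condition"],
                action=raw["action"],
                severity=raw["severity"],
                threshold=raw.get("threshold"),
                models=list(raw.get("models", [])),
                webhook_url=raw.get("webhook_url"),
            )
        )
    return rules
```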

Presidio is Microsoft's open-source PII detection library. It runs locally — no API call, no data leaving your machine.

It detects 50+ entity types out of the box: `PERSON`, `EMAIL_ADDRESS`, `CREDIT_CARD`, `US_SSN`, `PHONE_NUMBER`, `IBAN_CODE`, `IP_ADDRESS`, and more. It uses a combination of regex patterns, checksums, and a spaCy NLP model for name recognition.

The scan returns a confidence score per entity, and the policy engine compares that score against the rule's `threshold`. For a request to a cloud model, a `CREDIT_CARD` entity detected with 0.95 confidence clears the 0.7 threshold and triggers the `pii-cloud-block` rule.

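from dataclasses import dataclass, field
from typing import List

from presidio_analyzer import AnalyzerEngine

# Minimal result types so the snippet is self-contained; the repo's own models may differ.
@dataclass
class EntityResult:
    entity_type: str
    confidence: float
    start: int
    end: int

@dataclass
class ScanResult:
    detected: bool
    entities: List[EntityResult] = field(default_factory=list)
    max_confidence: float = 0.0

class AuditService:
    def __init__(self):
        # Loads the spaCy model once; reuse this instance instead of rebuilding it per request.
        self.analyzer = AnalyzerEngine()

    def scan_for_pii_details(self, text: str) -> ScanResult:
        # One RecognizerResult per detected span, each with entity_type, score, start, end.
        results = self.analyzer.analyze(text=text, language="en")
        detected = len(results) > 0
        entities = [
            EntityResult(
                entity_type=r.entity_type,
                confidence=r.score,
                start=r.start,
                end=r.end,
            )
            for r in results
        ]
        max_confidence = max((r.score for r in results), default=0.0)
        return ScanResult(
            detected=detected,
            entities=entities,
            max_confidence=max_confidence,
        )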
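The threshold check itself is a plain comparison. In this sketch the `CREDIT_CARD` score comes from the example above, while the `PERSON` and `DATE_TIME` scores are made up for illustration; `entities_over_threshold` is a hypothetical helper, and `>=` follows the YAML comment ("confidence ≥ 0.7 triggers this rule").

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    entity_type: str
    confidence: float


def entities_over_threshold(entities: List[Entity], threshold: float) -> List[str]:
    """Entity types whose confidence meets or exceeds the rule threshold."""
    return [e.entity_type for e in entities if e.confidence >= threshold]


scan = [Entity("CREDIT_CARD", 0.95), Entity("PERSON", 0.8), Entity("DATE_TIME", 0.6)]
print(entities_over_threshold(scan, 0.7))   # ['CREDIT_CARD', 'PERSON'] (pii-cloud-block)
print(entities_over_threshold(scan, 0.85))  # ['CREDIT_CARD'] (pii-local-alert)
```

The same scan can therefore trip the cloud rule on two entities but the stricter local rule on only one.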

The safety score is calculated separately: it is a 0.0–1.0 measure that combines PII confidence, entity density, and sensitive keyword presence. A score below 0.5 triggers the `low-safety-warn` rule.
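The post does not give the exact formula, so the following is only one plausible way to fold the three signals into a 0.0–1.0 score where lower means riskier. The keyword list and the 0.6/0.25/0.15 weights are assumptions chosen for illustration, not values from the repo.

```python
import re
from typing import Sequence

# Hypothetical keywords and weights, chosen for illustration only (weights sum to 1.0).
SENSITIVE_KEYWORDS = {"ssn", "password", "diagnosis", "account", "iban"}
W_CONFIDENCE, W_DENSITY, W_KEYWORDS = 0.6, 0.25, 0.15


def safety_score(prompt: str, entity_confidences: Sequence[float]) -> float:
    """Return 0.0-1.0, where 1.0 looks safe and lower values look riskier."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    if not words:
        return 1.0
    max_conf = max(entity_confidences, default=0.0)
    # Entities per word, scaled so one entity every five words saturates at 1.0.
    density = min(1.0, 5 * len(entity_confidences) / len(words))
    keyword_hit = 1.0 if SENSITIVE_KEYWORDS.intersection(words) else 0.0
    risk = W_CONFIDENCE * max_conf + W_DENSITY * density + W_KEYWORDS * keyword_hit
    return round(max(0.0, 1.0 - risk), 2)
```

Weighting the maximum confidence most heavily keeps a single high-confidence entity decisive, while density and keywords only nudge borderline prompts.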

The engine follows a Chain of Responsibility pattern. Each rule evaluates the `GovernanceContext` independently:

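from dataclasses import dataclass
from typing import List, Optional

from pydantic import BaseModel

# Minimal stand-in inferred from the fields used in the 403 response and webhook payload.
class ViolatedRule(BaseModel):
    rule_id: str
    rule_name: str
    severity: str
    action: str
    message: str

@dataclass
class GovernanceContext:
    prompt: str
    provider: str
    model_id: str
    pii_detected: bool
    pii_entity_types: List[str]
    pii_max_confidence: float
    safety_score: float
    estimated_prompt_cost_usd: float

class PolicyVerdict(BaseModel):
    passed: bool
    violated_rules: List[ViolatedRule] = []
    blocking_rule: Optional[ViolatedRule] = None
    warnings: List[str] = []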

`DefaultPolicyEngine.evaluate()` iterates over all rules in order. Block rules short-circuit; warn and alert rules accumulate into the verdict. The verdict is returned to the FastAPI dependency, which raises HTTP 403 if `blocking_rule` is set.
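The short-circuit behaviour fits in a few lines. This is a simplified, hypothetical stand-in for `DefaultPolicyEngine.evaluate()`: the trimmed `Ctx`, `Rule`, and `Verdict` types, the reading of an empty `models` list as "all models", and the `model_is` semantics are assumptions, not the repo's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ctx:
    """Trimmed-down stand-in for GovernanceContext (only the fields the rules read)."""
    provider: str
    model_id: str
    pii_detected: bool
    pii_max_confidence: float
    safety_score: float
    estimated_prompt_cost_usd: float = 0.0


@dataclass
class Rule:
    id: str
    condition: str
    action: str  # "block" | "warn" | "alert"
    threshold: float = 0.0
    models: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    passed: bool = True
    violated: List[str] = field(default_factory=list)
    blocking_rule: Optional[str] = None


def applies_to(rule: Rule, ctx: Ctx) -> bool:
    # Assumption: an empty models list means the rule covers every model.
    return not rule.models or ctx.provider in rule.models or ctx.model_id in rule.models


def matches(rule: Rule, ctx: Ctx) -> bool:
    if rule.condition == "pii_detected":
        return ctx.pii_detected and ctx.pii_max_confidence >= rule.threshold
    if rule.condition == "safety_score_below":
        return ctx.safety_score < rule.threshold
    if rule.condition == "cost_exceeds":
        return ctx.estimated_prompt_cost_usd > rule.threshold
    if rule.condition == "model_is":
        # Assumption: fires when the request targets one of the listed models.
        return ctx.model_id in rule.models or ctx.provider in rule.models
    raise ValueError(f"unknown condition {rule.condition!r}")


def evaluate(rules: List[Rule], ctx: Ctx) -> Verdict:
    verdict = Verdict()
    for rule in rules:  # file order matters because a block stops the chain
        if not (applies_to(rule, ctx) and matches(rule, ctx)):
            continue
        verdict.violated.append(rule.id)
        if rule.action == "block":
            verdict.passed = False
            verdict.blocking_rule = rule.id
            break  # short-circuit: later rules are not evaluated
        # warn and alert are recorded and evaluation continues
    return verdict
```

With the three default rules in file order, a cloud request carrying a 0.95-confidence credit card stops at `pii-cloud-block`, while a local request with a low safety score and a 0.9-confidence entity collects both `low-safety-warn` and `pii-local-alert` and still passes.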

This is the part that makes everything composable. One line wires the entire governance stack into any endpoint:

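from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import AsyncSession

router = APIRouter()

@router.get("/stream")
async def stream_benchmark(
    # Resolved first: a blocking rule raises 403 before this body executes.
    verdict: PolicyVerdict = Depends(enforce_governance_policy),
    db: AsyncSession = Depends(get_db),
):
    ...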

The dependency itself:

from typing import Annotated

from fastapi import Depends, HTTPException, Query
from sqlalchemy.ext.asyncio import AsyncSession

async def enforce_governance_policy(
    prompt: Annotated[str, Query(min_length=1)],
    provider: Annotated[str, Query(pattern="^(cloud|local)$")] = "cloud",
    db: AsyncSession = Depends(get_db),
) -> PolicyVerdict:
    engine = get_policy_engine()
    audit = _get_audit_service()

    scan = audit.scan_for_pii_details(prompt)

    context = GovernanceContext(
        prompt=prompt,
        provider=provider,
        model_id="gpt-4o" if provider == "cloud" else "llama3.2:latest",
        pii_detected=scan.detected,
        pii_entity_types=[e.entity_type for e in scan.entities],
        pii_max_confidence=scan.max_confidence,
        safety_score=audit.calculate_safety_score(prompt),
        # Rough estimate: word count times a flat per-word rate (no tokenizer involved).
        estimated_prompt_cost_usd=(
            len(prompt.split()) * 0.00003 if provider == "cloud" else 0.0
        ),
    )

    verdict = engine.evaluate(context)

    # Persist every violation (and schedule any webhook) before deciding whether to block.
    for violation in verdict.violated_rules:
        webhook_url = _get_webhook_url(engine, violation.rule_id)
        await _record_violation(db, violation, context, webhook_url)

    if not verdict.passed and verdict.blocking_rule:
        br = verdict.blocking_rule
        raise HTTPException(
            status_code=403,
            detail={
                "error": "governance_violation",
                "rule_id": br.rule_id,
                "rule_name": br.rule_name,
                "severity": br.severity,
                "message": br.message,
            },
        )

    return verdict

Every violation, blocked or not, is persisted to `policy_violations` in PostgreSQL before the function returns. Webhook delivery is fire-and-forget via `asyncio.create_task()`, so it never adds latency to the response path.

When a rule with a `webhook_url` fires, a CloudEvents-compatible payload is POSTed to that URL:

{
  "specversion": "1.0",
  "type": "com.governance.policy.violation",
  "source": "llm-governance-engine",
  "id": "uuid",
  "time": "2026-06-19T09:00:00Z",
  "data": {
    "rule_id": "pii-cloud-block",
    "rule_name": "Block PII from cloud models",
    "severity": "critical",
    "action": "block",
    "message": "PII detected (CREDIT_CARD, confidence=0.95) on cloud provider",
    "provider": "cloud",
    "model_id": "gpt-4o"
  }
}

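Delivery is attempted three times with exponential backoff. Note that Slack, Teams, and PagerDuty incoming webhooks each expect their own payload schema (Slack, for example, requires a `text` or `blocks` field), so forwarding this event to them typically needs a small adapter.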
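The envelope and the retry loop can be sketched with the standard library alone. The `send` callable is injected (a hypothetical parameter) so the transport stays swappable; `cloud_event` and `deliver_with_backoff` are illustrative names, and the 0.5 s base delay is an assumption since the post does not state the backoff schedule.

```python
import asyncio
import uuid
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable, Dict

Sender = Callable[[str, Dict[str, Any]], Awaitable[None]]  # must raise on failure


def cloud_event(data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap violation data in a CloudEvents 1.0 JSON envelope shaped like the payload above."""
    return {
        "specversion": "1.0",
        "type": "com.governance.policy.violation",
        "source": "llm-governance-engine",
        "id": str(uuid.uuid4()),
        "time": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "data": data,
    }


async def deliver_with_backoff(
    send: Sender,
    url: str,
    event: Dict[str, Any],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Try up to `attempts` times, sleeping base_delay * 2**n between tries; return success."""
    for attempt in range(attempts):
        try:
            await send(url, event)
            return True
        except Exception:  # broad on purpose: any transport error counts as a failed attempt
            if attempt == attempts - 1:
                return False
            await asyncio.sleep(base_delay * 2 ** attempt)
    return False
```

In a real service, `send` could wrap `httpx.AsyncClient.post(url, json=event)` followed by `response.raise_for_status()`, and the `deliver_with_backoff(...)` call would be the coroutine handed to `asyncio.create_task()`.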

git clone https://github.com/sochaty/llm-governance-engine
cd llm-governance-engine
git checkout governance-post-1
cp .env.example .env
docker compose up

Dashboard → http://localhost:4200

API docs → http://localhost:8000/docs

Pull a local model to enable the side-by-side comparison:

curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2:latest"}'

Trigger your first governance block:

Open the dashboard, type a prompt containing a fake SSN (for example, `My SSN is 123-45-6789`), select the Cloud provider, and hit Run. Instead of a response, you will get a red Governance Violation banner. The prompt never reached GPT-4o.

Open http://localhost:8000/api/v1/policies/violations to see the audit record of the block.
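The same check can be scripted without the dashboard. This sketch uses only the standard library; the `/api/v1/benchmark/stream` path is an assumption pieced together from the `/stream` route and the `/api/v1` prefix used elsewhere, so confirm it at http://localhost:8000/docs. It relies on FastAPI returning an `HTTPException` as a JSON body of the form `{"detail": ...}`; `build_url`, `governance_error`, and `try_prompt` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint path; check http://localhost:8000/docs for the real route.
STREAM_URL = "http://localhost:8000/api/v1/benchmark/stream"


def build_url(prompt: str, provider: str = "cloud") -> str:
    query = urllib.parse.urlencode({"prompt": prompt, "provider": provider})
    return f"{STREAM_URL}?{query}"


def governance_error(body: bytes) -> Optional[dict]:
    """Return the violation detail from a FastAPI 403 body, or None for any other error."""
    detail = json.loads(body).get("detail")
    if isinstance(detail, dict) and detail.get("error") == "governance_violation":
        return detail
    return None


def try_prompt(prompt: str, provider: str = "cloud") -> None:
    try:
        with urllib.request.urlopen(build_url(prompt, provider)) as resp:
            print("allowed, status", resp.status)
    except urllib.error.HTTPError as err:
        detail = governance_error(err.read()) if err.code == 403 else None
        if detail is None:
            raise
        print(f"blocked by {detail['rule_id']} ({detail['severity']}): {detail['message']}")
```

With the stack running, `try_prompt("My SSN is 123-45-6789")` should print the blocking rule instead of starting a stream.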

Every inference — blocked or not — is stored in PostgreSQL:

| Field | Example |
|---|---|
| `prompt` (preview) | "My SSN is 123-45..." |
| `provider` | cloud |
| `model_name` | gpt-4o |
| `pii_detected` | true |
| `safety_score` | 0.12 |
| `latency_ms` | 0 (blocked before model) |
| `estimated_cost` | $0.0000 |
| `version_tag` | openai/gpt-4o |

The Audit Vault page in the dashboard is filterable by prompt, provider, and PII flag. Every row has a "Generate Report" button that exports a PDF — useful when a compliance officer asks for evidence.

The orchestrator supports five provider types with a single interface:

| Provider | How it connects |
|---|---|
| OpenAI | `AsyncOpenAI` (native) |
| Groq | `AsyncOpenAI(base_url="https://api.groq.com/openai/v1")` |
| Google Gemini | `AsyncOpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai")` |
| Anthropic | Lazy `import anthropic`; separate streaming path |
| Ollama (local) | `AsyncOpenAI(base_url="http://ollama-service:11434/v1", api_key="ollama")` |
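Because four of the five providers speak the OpenAI wire format, the single interface can largely come down to choosing constructor arguments. A hypothetical sketch of that resolution step, using the base URLs from the table; `client_kwargs` is an illustrative name, and its `api_key` argument stands in for whatever `settings_service.get()` returns.

```python
from typing import Dict, Optional

# base_url values copied from the table above; None means the SDK's default endpoint.
OPENAI_COMPATIBLE: Dict[str, Optional[str]] = {
    "openai": None,
    "groq": "https://api.groq.com/openai/v1",
    "google": "https://generativelanguage.googleapis.com/v1beta/openai",
    "ollama": "http://ollama-service:11434/v1",
}


def client_kwargs(provider: str, api_key: Optional[str]) -> Dict[str, str]:
    """Keyword arguments for AsyncOpenAI(**kwargs); Anthropic takes its own SDK path."""
    if provider == "anthropic":
        raise ValueError("anthropic uses the separate anthropic SDK streaming path")
    if provider not in OPENAI_COMPATIBLE:
        raise ValueError(f"unknown provider {provider!r}")
    kwargs: Dict[str, str] = {}
    base_url = OPENAI_COMPATIBLE[provider]
    if base_url is not None:
        kwargs["base_url"] = base_url
    if provider == "ollama":
        kwargs["api_key"] = "ollama"  # Ollama does not check it, but the client needs one
    elif api_key:
        kwargs["api_key"] = api_key
    else:
        raise ValueError(f"no API key configured for {provider}")
    return kwargs
```

The result would be passed straight to `AsyncOpenAI(**client_kwargs("groq", key))`. Resolving the key per request, rather than once at startup, is one way to get the no-restart behaviour described below.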

API keys are stored in PostgreSQL (Fernet-encrypted) and resolved live on every request via `settings_service.get()`. Change a key in the Settings UI and it takes effect on the next request, with no restart needed.

The codebase is production-ready for single-tenant use. The roadmap from here:

`faithfulness_score` populates in the audit log 2–3 seconds after the benchmark completes.

The incident that started this (a real customer's credit card number sent to GPT-4o because I forgot to sanitise a test dataset) took about 30 seconds to happen and would have taken weeks to untangle from a compliance perspective.

The fix took a weekend. It should have existed before the first prompt was ever sent.

Full code: github.com/sochaty/llm-governance-engine

Reproduce this post exactly: git checkout governance-post-1

PRs and issues welcome. If you build a custom Presidio recogniser for your domain (medical records, legal documents, financial instruments), I would love to include it in the default policy templates.

All my writing lives at blogs.sourishchakraborty.com; subscribe there for future posts.

Originally published on dev.to.
