# I Built a PII Firewall for LLMs in a Weekend (and Caught My Own Leak)

> Source: <https://dev.to/sochaty/i-built-a-pii-firewall-for-llms-in-a-weekend-and-caught-my-own-leak-1mh0>
> Published: 2026-06-19 02:03:37+00:00

Three weeks ago I was benchmarking GPT-4o against a local Llama model. I was copying prompts from a real support ticket database to make the test realistic. Midway through the run I glanced at the terminal and saw this in the logs:

```
prompt="Hi, my name is Sarah Johnson, my account number is 4532-1234-5678-9012..."
provider=cloud
model=gpt-4o
```

A real customer's name. A real credit card number. Already sent to OpenAI.

I had not noticed because the benchmark UI just showed a token count, not the actual prompt content. The PII was in the data. I had forgotten to sanitise it. OpenAI's API terms say they don't train on API data, but that's not the point — the data left my infrastructure. Under GDPR, that's a potential breach.

I spent the rest of that weekend building a firewall so it could never happen again. This post is the full story of what I built, how it works, and how you can run it in one command.

The code is at [github.com/sochaty/llm-governance-engine](https://github.com/sochaty/llm-governance-engine), tag `governance-post-1`.

Every LLM observability tool I have used — LangSmith, Helicone, Arize Phoenix — works the same way: it records what happened after the fact. You get a dashboard, a trace, a cost breakdown. None of them stop the request.

That distinction matters enormously under GDPR, HIPAA, and the EU AI Act. "We logged that PII was sent" is not a compliance posture. "PII was blocked before it left the building" is.

By the end of this post you will have:

Everything runs with `docker compose up`.

The key insight is where enforcement happens: **before the model call, not after**.

```
User Prompt
    │
    ▼
FastAPI /benchmark/stream
    │
    ├── enforce_governance_policy()   ← Presidio scan + policy evaluation
    │       │
    │       ├── PII detected + cloud model → HTTP 403 (prompt never sent)
    │       ├── Safety score low → warn + log + continue
    │       └── All rules pass → verdict returned to endpoint
    │
    ├── LLMOrchestrator.get_streaming_response()
    │       │
    │       ├── OpenAI / Groq / Google / Anthropic (cloud)
    │       └── Ollama (local)
    │
    └── AuditService → PostgreSQL
```

The `enforce_governance_policy` function is a FastAPI `Depends()` dependency injected into the streaming endpoint. If a blocking rule fires, it raises `HTTP 403` before the orchestrator is even called. The prompt never touches the wire.

The entire governance model is a YAML file. No code changes, no restarts: edit the file, `POST /api/v1/policies/reload`, and the rules are live.

```
# policies/default.yaml
version: "1.0"
name: "default"

rules:
  - id: pii-cloud-block
    name: "Block PII from cloud models"
    condition: pii_detected
    threshold: 0.7          # Presidio confidence ≥ 0.7 triggers this rule
    models: [cloud, gpt-4o]
    action: block           # returns HTTP 403
    severity: critical
    webhook_url: null       # set to your Slack URL to get alerted

  - id: low-safety-warn
    name: "Warn on low safety score"
    condition: safety_score_below
    threshold: 0.5
    action: warn            # logs + audits, passes through
    severity: medium

  - id: pii-local-alert
    name: "Alert on PII sent to local models"
    condition: pii_detected
    threshold: 0.85
    models: [local]
    action: alert           # fires webhook, does not block
    severity: high
```

Four conditions: `pii_detected`, `safety_score_below`, `cost_exceeds`, `model_is`.

Three actions: `block` (HTTP 403), `warn` (audit + continue), `alert` (webhook + continue).

Starter templates are shipped in the repo for GDPR (`policies/gdpr.yaml`) and HIPAA (`policies/hipaa.yaml`).
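Because the rules live in a file that can be reloaded at runtime, it helps to reject a malformed rule before it goes live. The following is a minimal sketch of such a check, not the repo's actual loader: `validate_rule` is a hypothetical helper, and only the condition and action names come from the policy file above.

```python
from typing import Any, Dict, List

# Condition and action names are taken from the post; the checks are assumptions.
CONDITIONS = {"pii_detected", "safety_score_below", "cost_exceeds", "model_is"}
ACTIONS = {"block", "warn", "alert"}
# Conditions whose threshold is a 0.0-1.0 score rather than, say, a cost in USD.
SCORE_CONDITIONS = {"pii_detected", "safety_score_below"}


def validate_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the rule looks usable."""
    problems: List[str] = []
    if not rule.get("id"):
        problems.append("missing id")
    condition = rule.get("condition")
    if condition not in CONDITIONS:
        problems.append(f"unknown condition: {condition!r}")
    if rule.get("action") not in ACTIONS:
        problems.append(f"unknown action: {rule.get('action')!r}")
    threshold = rule.get("threshold")
    if condition in SCORE_CONDITIONS and threshold is not None:
        if not 0.0 <= float(threshold) <= 1.0:
            problems.append(f"threshold out of range: {threshold}")
    return problems


rule = {"id": "pii-cloud-block", "condition": "pii_detected",
        "threshold": 0.7, "models": ["cloud", "gpt-4o"], "action": "block"}
print(validate_rule(rule))  # an empty list: nothing to complain about
```

Running a check like this inside the reload path would let a bad edit fail loudly instead of silently disabling a rule.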

Presidio is Microsoft's open-source PII detection library. It runs locally — no API call, no data leaving your machine.

It detects 50+ entity types out of the box: `PERSON`, `EMAIL_ADDRESS`, `CREDIT_CARD`, `US_SSN`, `PHONE_NUMBER`, `IBAN_CODE`, `IP_ADDRESS`, and more. It uses a combination of regex patterns, checksums, and a spaCy NLP model for name recognition.

The scan returns a confidence score per entity. The policy engine compares that score against the rule's `threshold`. An entity with 0.95 confidence on `CREDIT_CARD` and a threshold of 0.7 triggers the `pii-cloud-block` rule.
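The threshold comparison is simple enough to show in isolation. This sketch assumes the rule looks at the single most confident entity; `Entity` and `pii_rule_triggers` are illustrative names, not the repo's API, and the scores are made up rather than real Presidio output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    # Hypothetical stand-in for one detection result: entity type plus score.
    entity_type: str
    confidence: float


def pii_rule_triggers(entities: List[Entity], threshold: float) -> bool:
    """True when at least one entity was found and the best score meets the threshold."""
    if not entities:
        return False
    return max(e.confidence for e in entities) >= threshold


found = [Entity("CREDIT_CARD", 0.95), Entity("PERSON", 0.60)]
print(pii_rule_triggers(found, threshold=0.7))       # True: 0.95 >= 0.7
print(pii_rule_triggers(found[1:], threshold=0.7))   # False: 0.60 < 0.7
print(pii_rule_triggers([], threshold=0.7))          # False: nothing detected
```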

```
# backend/app/services/audit_service.py (simplified)
from presidio_analyzer import AnalyzerEngine

class AuditService:
    def __init__(self):
        self.analyzer = AnalyzerEngine()

    def scan_for_pii_details(self, text: str) -> ScanResult:
        results = self.analyzer.analyze(text=text, language="en")
        detected = len(results) > 0
        entities = [
            EntityResult(
                entity_type=r.entity_type,
                confidence=r.score,
                start=r.start,
                end=r.end,
            )
            for r in results
        ]
        max_confidence = max((r.score for r in results), default=0.0)
        return ScanResult(
            detected=detected,
            entities=entities,
            max_confidence=max_confidence,
        )
```

The safety score is calculated separately. It is a 0.0–1.0 measure that combines PII confidence, entity density, and sensitive keyword presence. A score below 0.5 triggers the `low-safety-warn` rule.
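The post does not give the exact formula, so the following is only one plausible way to combine the three signals. The weights, the keyword list, and the function name `safety_score` are all assumptions chosen for illustration; the real `calculate_safety_score` may differ.

```python
from typing import List

# Hypothetical keyword list and weights; the real values are not published in the post.
SENSITIVE_KEYWORDS = ("password", "ssn", "passport", "diagnosis")
W_CONFIDENCE, W_DENSITY, W_KEYWORDS = 0.6, 0.2, 0.2


def safety_score(prompt: str, entity_confidences: List[float]) -> float:
    """Return 1.0 for a clean prompt, falling toward 0.0 as risk signals accumulate."""
    lowered = prompt.lower()
    words = lowered.split()
    if not words:
        return 1.0
    max_confidence = max(entity_confidences, default=0.0)
    # Density: detected entities per word, capped at 1.0.
    density = min(len(entity_confidences) / len(words), 1.0)
    keyword_hit = 1.0 if any(k in lowered for k in SENSITIVE_KEYWORDS) else 0.0
    risk = (W_CONFIDENCE * max_confidence
            + W_DENSITY * density
            + W_KEYWORDS * keyword_hit)
    return round(max(0.0, 1.0 - risk), 4)


print(safety_score("What is the capital of France?", []))
print(safety_score("My SSN is 123-45-6789", [0.85]))
```

With these assumed weights, a prompt with no detections scores 1.0, while a single high-confidence entity plus a sensitive keyword pushes the score well under the 0.5 warning line.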

The engine follows a Chain of Responsibility pattern. Each rule evaluates the `GovernanceContext` independently:

```
# backend/app/governance/policy/schema.py
from dataclasses import dataclass
from typing import List, Optional

from pydantic import BaseModel


@dataclass
class GovernanceContext:
    prompt: str
    provider: str
    model_id: str
    pii_detected: bool
    pii_entity_types: List[str]
    pii_max_confidence: float
    safety_score: float
    estimated_prompt_cost_usd: float


# ViolatedRule is another model from the repo (not shown in this excerpt).
class PolicyVerdict(BaseModel):
    passed: bool
    violated_rules: List[ViolatedRule] = []
    blocking_rule: Optional[ViolatedRule] = None
    warnings: List[str] = []

The `DefaultPolicyEngine.evaluate()` method iterates all rules in order. Block rules short-circuit. Warn and alert rules accumulate into the verdict. The verdict is returned to the FastAPI dependency, which raises `HTTP 403` if `blocking_rule` is set.
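The evaluation loop described here can be sketched with plain dataclasses. This is a simplified illustration under stated assumptions, not the repo's `DefaultPolicyEngine`: `Rule`, `Context`, `Verdict`, `rule_matches`, and `evaluate` are hypothetical names, only two of the four conditions are handled, and an empty `models` list is assumed to mean "applies to every model".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    id: str
    condition: str            # only two conditions are sketched below
    threshold: float
    action: str               # "block", "warn", or "alert"
    models: List[str] = field(default_factory=list)  # assumed: empty = all models


@dataclass
class Context:
    provider: str
    model_id: str
    pii_detected: bool
    pii_max_confidence: float
    safety_score: float


@dataclass
class Verdict:
    passed: bool = True
    violated_rules: List[str] = field(default_factory=list)
    blocking_rule: Optional[str] = None


def rule_matches(rule: Rule, ctx: Context) -> bool:
    """Check the model filter first, then the rule's condition."""
    if rule.models and ctx.provider not in rule.models and ctx.model_id not in rule.models:
        return False
    if rule.condition == "pii_detected":
        return ctx.pii_detected and ctx.pii_max_confidence >= rule.threshold
    if rule.condition == "safety_score_below":
        return ctx.safety_score < rule.threshold
    return False


def evaluate(rules: List[Rule], ctx: Context) -> Verdict:
    verdict = Verdict()
    for rule in rules:
        if not rule_matches(rule, ctx):
            continue
        verdict.violated_rules.append(rule.id)
        if rule.action == "block":
            verdict.passed = False
            verdict.blocking_rule = rule.id
            break  # a block rule short-circuits the remaining rules
    return verdict


RULES = [
    Rule("pii-cloud-block", "pii_detected", 0.7, "block", ["cloud", "gpt-4o"]),
    Rule("low-safety-warn", "safety_score_below", 0.5, "warn"),
    Rule("pii-local-alert", "pii_detected", 0.85, "alert", ["local"]),
]
cloud = Context("cloud", "gpt-4o", True, 0.95, 0.12)
print(evaluate(RULES, cloud).blocking_rule)  # pii-cloud-block
```

The same prompt sent to the local provider passes, but the verdict still carries the warn and alert hits so they can be audited.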

This is the part that makes everything composable. One line wires the entire governance stack into any endpoint:

```
# backend/app/api/benchmark_router.py
@router.get("/stream")
async def stream_benchmark(
    verdict: PolicyVerdict = Depends(enforce_governance_policy),
    db: AsyncSession = Depends(get_db),
):
    # If we reach here, the prompt passed all blocking rules.
    # verdict.warnings contains any non-blocking rule hits.
    ...
```

The dependency itself:

```
# backend/app/governance/policy/enforcement.py (simplified)
async def enforce_governance_policy(
    prompt: Annotated[str, Query(min_length=1)],
    provider: Annotated[str, Query(pattern="^(cloud|local)$")] = "cloud",
    db: AsyncSession = Depends(get_db),
) -> PolicyVerdict:
    engine = get_policy_engine()
    audit = _get_audit_service()

    scan = audit.scan_for_pii_details(prompt)

    context = GovernanceContext(
        prompt=prompt,
        provider=provider,
        model_id="gpt-4o" if provider == "cloud" else "llama3.2:latest",
        pii_detected=scan.detected,
        pii_entity_types=[e.entity_type for e in scan.entities],
        pii_max_confidence=scan.max_confidence,
        safety_score=audit.calculate_safety_score(prompt),
        estimated_prompt_cost_usd=(len(prompt.split()) * 0.00003)
        if provider == "cloud" else 0.0,
    )

    verdict = engine.evaluate(context)

    for violation in verdict.violated_rules:
        webhook_url = _get_webhook_url(engine, violation.rule_id)
        await _record_violation(db, violation, context, webhook_url)

    if not verdict.passed and verdict.blocking_rule:
        br = verdict.blocking_rule
        raise HTTPException(
            status_code=403,
            detail={
                "error": "governance_violation",
                "rule_id": br.rule_id,
                "rule_name": br.rule_name,
                "severity": br.severity,
                "message": br.message,
            },
        )

    return verdict
```

Every violation, blocked or not, is persisted to `policy_violations` in PostgreSQL before the function returns. Webhook delivery is fire-and-forget via `asyncio.create_task()` so it never adds latency to the response path.

When a rule fires with a `webhook_url`, a CloudEvents-compatible payload is POSTed:

```
{
  "specversion": "1.0",
  "type": "com.governance.policy.violation",
  "source": "llm-governance-engine",
  "id": "uuid",
  "time": "2026-06-19T09:00:00Z",
  "data": {
    "rule_id": "pii-cloud-block",
    "rule_name": "Block PII from cloud models",
    "severity": "critical",
    "action": "block",
    "message": "PII detected (CREDIT_CARD, confidence=0.95) on cloud provider",
    "provider": "cloud",
    "model_id": "gpt-4o"
  }
}
```
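A payload of this shape can be assembled with the standard library alone. The sketch below mirrors the field names in the JSON above; `build_violation_event` is a hypothetical helper, not a function from the repo.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def build_violation_event(
    rule_id: str,
    rule_name: str,
    severity: str,
    action: str,
    message: str,
    provider: str,
    model_id: str,
) -> Dict[str, Any]:
    """Wrap one violation in a CloudEvents 1.0 style envelope."""
    return {
        "specversion": "1.0",
        "type": "com.governance.policy.violation",
        "source": "llm-governance-engine",
        "id": str(uuid.uuid4()),  # unique per event
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {
            "rule_id": rule_id,
            "rule_name": rule_name,
            "severity": severity,
            "action": action,
            "message": message,
            "provider": provider,
            "model_id": model_id,
        },
    }


event = build_violation_event(
    "pii-cloud-block", "Block PII from cloud models", "critical", "block",
    "PII detected (CREDIT_CARD, confidence=0.95) on cloud provider",
    "cloud", "gpt-4o",
)
print(json.dumps(event, indent=2))
```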

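Delivery is attempted three times with exponential backoff. The payload is intended for Slack, Teams, and PagerDuty incoming webhooks, though each of those services documents its own expected message shape (Slack, for example, expects a `text` or `blocks` field), so a thin adapter may be needed in front of them.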
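The retry and fire-and-forget behaviour can be sketched as follows. This is an illustration under assumptions, not the repo's delivery code: `deliver_with_retry` and `fire_and_forget` are hypothetical names, the base delay is arbitrary, and the HTTP call is abstracted behind an injected `send` coroutine so the example stays within the standard library.

```python
import asyncio
from typing import Any, Awaitable, Callable, Coroutine, Dict, Set

Sender = Callable[[str, Dict[str, Any]], Awaitable[None]]


async def deliver_with_retry(
    send: Sender,
    url: str,
    payload: Dict[str, Any],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Call `send` up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            await send(url, payload)
            return True
        except Exception:
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * (2 ** attempt))
    return False


_background_tasks: Set[asyncio.Task] = set()


def fire_and_forget(coro: Coroutine[Any, Any, bool]) -> asyncio.Task:
    """Schedule delivery without awaiting it.

    A reference is kept until the task finishes so it is not garbage-collected.
    """
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def demo() -> bool:
    seen = []

    async def flaky_send(url: str, payload: Dict[str, Any]) -> None:
        # Fails twice, then succeeds, to exercise the retry path.
        seen.append(url)
        if len(seen) < 3:
            raise ConnectionError("webhook unreachable")

    task = fire_and_forget(
        deliver_with_retry(flaky_send, "https://example.invalid/hook", {"id": "1"}, base_delay=0.01)
    )
    return await task  # awaited here only so the demo can print the result


print(asyncio.run(demo()))
```

Holding the task in a set matters: the event loop keeps only weak references to tasks, so a task with no other reference can disappear before it completes.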

```
git clone https://github.com/sochaty/llm-governance-engine
cd llm-governance-engine
git checkout governance-post-1
cp .env.example .env
# Add your OPENAI_API_KEY (or any provider key)
docker compose up
```

Dashboard → `http://localhost:4200`

API docs → `http://localhost:8000/docs`

Pull a local model to enable the side-by-side comparison:

```
curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2:latest"}'
```

**Trigger your first governance block:**

Open the dashboard, type a prompt containing a fake SSN (`My SSN is 123-45-6789`), select the Cloud provider and hit Run. You will get a red `Governance Violation` banner instead of a response. The prompt never reached GPT-4o.

Open `http://localhost:8000/api/v1/policies/violations` to see the audit record of the block.

Every inference — blocked or not — is stored in PostgreSQL:

| Field | Example |
|---|---|
| `prompt` (preview) | "My SSN is 123-45..." |
| `provider` | cloud |
| `model_name` | gpt-4o |
| `pii_detected` | true |
| `safety_score` | 0.12 |
| `latency_ms` | 0 (blocked before model) |
| `estimated_cost` | $0.0000 |
| `version_tag` | openai/gpt-4o |

The Audit Vault page in the dashboard is filterable by prompt, provider, and PII flag. Every row has a "Generate Report" button that exports a PDF — useful when a compliance officer asks for evidence.

The orchestrator supports five provider types with a single interface:

| Provider | How it connects |
|---|---|
| OpenAI | `AsyncOpenAI` (native) |
| Groq | `AsyncOpenAI(base_url="https://api.groq.com/openai/v1")` |
| Google Gemini | `AsyncOpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai")` |
| Anthropic | Lazy `import anthropic` — separate streaming path |
| Ollama (local) | `AsyncOpenAI(base_url="http://ollama-service:11434/v1", api_key="ollama")` |
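Since four of the five providers speak the OpenAI wire format, the routing can be reduced to a lookup table. The sketch below only builds the keyword arguments that would be passed to an `AsyncOpenAI`-style client; `OPENAI_COMPATIBLE` and `client_kwargs` are hypothetical names, and the base URLs are copied from the table above.

```python
from typing import Dict, Optional

# Base URLs copied from the provider table; None means "use the SDK default".
OPENAI_COMPATIBLE: Dict[str, Optional[str]] = {
    "openai": None,
    "groq": "https://api.groq.com/openai/v1",
    "google": "https://generativelanguage.googleapis.com/v1beta/openai",
    "ollama": "http://ollama-service:11434/v1",
}


def client_kwargs(provider: str, api_key: str) -> Dict[str, str]:
    """Build keyword arguments for an OpenAI-compatible client."""
    if provider not in OPENAI_COMPATIBLE:
        # Anthropic, for example, goes through its own SDK and streaming path.
        raise ValueError(f"{provider!r} needs its own client path")
    # Ollama ignores the key, so the table's placeholder value is used.
    kwargs = {"api_key": "ollama" if provider == "ollama" else api_key}
    base_url = OPENAI_COMPATIBLE[provider]
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


print(client_kwargs("groq", "example-key"))
print(client_kwargs("ollama", "unused"))
```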

API keys are stored in PostgreSQL (Fernet-encrypted) and resolved live on every request via `settings_service.get()`. Change a key in the Settings UI and it takes effect on the next request, with no restart needed.

The codebase is production-ready for single-tenant use. The roadmap from here:

`faithfulness_score` populates in the audit log 2–3 seconds after the benchmark completes.

The incident that started this, a real customer's credit card number sent to GPT-4o because I forgot to sanitise a test dataset, took about 30 seconds to happen and would have taken weeks to untangle from a compliance perspective.

The fix took a weekend. It should have existed before the first prompt was ever sent.

Full code: [github.com/sochaty/llm-governance-engine](https://github.com/sochaty/llm-governance-engine)

Reproduce this post exactly: `git checkout governance-post-1`

PRs and issues welcome. If you build a custom Presidio recogniser for your domain (medical records, legal documents, financial instruments), I would love to include it in the default policy templates.

All my writing lives at blogs.sourishchakraborty.com; subscribe there for future posts.
