How we built a PII masking layer for LLM APIs — local detection, reversible tokens, one line to integrate

wpnews.pro

cd /news/large-language-models/how-we-built-a-pii-masking-layer-for… · home › topics › large-language-models › article

[ARTICLE · art-13763] src=dev.to ↗ pub=2026-05-25T17:23Z topic=large-language-models verified=true sentiment=· neutral

How we built a PII masking layer for LLM APIs — local detection, reversible tokens, one line to integrate

Armos has released a local PII detection and masking layer for LLM APIs that runs entirely on the user's server, with no data sent to third parties during detection. The open-source library uses Microsoft Presidio and spaCy to detect 10 entity types with up to 100% accuracy, replaces sensitive values with reversible tokens before they reach the model, and restores the original text in the response automatically. The integration requires changing only one line of code, replacing the standard OpenAI client with an Armos-wrapped version.

read4 min views14 publishedMay 25, 2026

If you're building LLM features on top of OpenAI or Anthropic, you're almost certainly sending raw user data to a third-party model provider. Names, emails, phone numbers, tax IDs, health records — whatever your users type, it goes straight to the API.

Here's the uncomfortable part: every attempt to fix this problem seems to make it worse. The most obvious fix — sending your text to a cloud anonymisation service first — means you're solving a data privacy problem by sending your sensitive data to another third party.

I was talking to a healthtech team recently that had been blocked from using GPT-4 for clinical notes for months. Not because the engineers didn't want to — they did. Legal wouldn't sign off because every API call meant patient data leaving their infrastructure. The problem wasn't capability. It was the missing privacy boundary between their data and the LLM.

Armos is that boundary. A local detection and masking layer that sits between your application and the LLM API — PII never leaves your server, and real values are restored in the response automatically.

This is how it works under the hood.

Option 1: Regex scrubbing

Fast to write, breaks constantly. Email regexes miss edge cases. Names are impossible. You end up with a pile of patterns that need constant maintenance and still let things through.

Option 2: Send everything to a cloud anonymisation API

Same problem, different server. You haven't kept the data in-house — you've just added a hop.

Option 3: Build it yourself with Presidio

Microsoft's Presidio is excellent — it's what powers Armos's detection. But it's detection only. You still need to build the masking layer, the vault, the de-masking logic, and wire it into your SDK calls. That's a week of work for a first pass and months of edge cases.

Three steps, all local:

1. Detect

Presidio + spaCy runs on the text before it leaves your process. No network call. No data sent anywhere during detection.

2. Mask with reversible tokens

Detected entities are replaced with deterministic tokens:

"Patient John Smith, Aadhaar 2345 6789 0123"
→
"Patient [PII:NAME:c4587843], Aadhaar [PII:AADHAAR:473adcf3]"

The token format encodes the entity type and a hash of the original value. Same value always maps to the same token — so if "John Smith" appears twice, it gets the same token both times, and the LLM can reason about it consistently.

3. Restore

After the LLM responds, the library scans the output for tokens and swaps them back. Your application receives the original text. The model never saw the real values.

Tokens need to map back to real values. The library keeps a vault — a simple key-value store — inside the process by default, with an optional Redis backend for cross-process persistence.

client = ArmosOpenAI(OpenAI())

client = ArmosOpenAI(OpenAI(), store="redis", redis_url="redis://...")

The vault never leaves your infrastructure. Armos has no server. There's no telemetry, no cloud component.

This is the entire change to existing code:

from openai import OpenAI
client = OpenAI()

from openai import OpenAI
from armos import ArmosOpenAI
client = ArmosOpenAI(OpenAI())

Everything downstream works identically — same method signatures, same response objects. The masking and de-masking happen invisibly inside the privacy layer.

10 entity types out of the box:

I ran a 1,000-sample benchmark across all entity types:

Entity	Accuracy
100%
Aadhaar	100%
PAN	100%
SSN	100%
IBAN	100%
Credit card	100%
Phone	100%
API keys	100%
IP address	99.8%
Person name	96.4%

The 3.6% miss rate on names is entirely Indian names — en_core_web_lg

was trained predominantly on Western text. I'm working on a supplemental approach for this.

stream=True

currently passes through unmasked)AsyncOpenAI

, AsyncAnthropic

)The library is early and I'm actively looking for teams using LLMs on sensitive data who want to trial it and shape where it goes.

GitHub: github.com/armos-ai/armos-python

Docs: armos.dev

pip install armos

If you're hitting this problem or have thoughts on the approach, I'd love to hear from you in the comments.

source & further reading

dev.to — original article I Did the Math on GPT-5.6. The $2.50 Terra Tier Is the One I'd Ship First. Monetize any MCP server in 10 minutes — no billing code required AMD Had Zero Agent Skills. I Built the First 10.

~/api · this article 200

$curl api.wpnews.pro/v1/news/how-we-built-a-pii-maski…

Read original on dev.to → dev.to/dhroov7/how-we-built-a-pii-masking-layer-…

mentioned entities

OpenAI

Anthropic

Armos

GPT-4

metadata

slughow-we-built-a-pii-masking-layer-for-llm-apis-local-detection-reversible-tokens

topic#large-language-models

secondary4 topics

sentimentneutral

canonicaldev.to

navigation

← prevLearning Abstractions: A Convers…

next →What is an encyclical? Inside Po…

── more in #large-language-models 4 stories · sorted by recency

tensorsharp.ai · 10 Jul · #large-language-models

Show HN: TensorSharp: Open-Source Local LLM Inference Engine

dev.to · 10 Jul · #large-language-models

GPT-5.6 Arrived. Fable 5 Became Metered. The Missing Product Is Cost Control

dev.to · 10 Jul · #large-language-models

I Did the Math on GPT-5.6. The $2.50 Terra Tier Is the One I'd Ship First.

dissenter.com · 10 Jul · #large-language-models

OpenAI Unleashes ChatGPT Work: AI Replaces You, Elites Profit

── more on @openai 3 stories trending now

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 30 May · #ai-safety

Nightcord Security Analysis Report - Threat Investigation

wpnews · 8 Jul · #artificial-intelligence

Anthropic's "J-lens" reveals workspace in Claude mirrors theory of consciousness

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required