Source: contextwall.io

ContextWall – Context firewall for AI agents and RAG pipelines

ContextWall has launched a context-layer firewall that intercepts and screens content before it reaches an AI agent's context window, blocking prompt injection and credential leaks. The product targets attacks like EchoLeak and PoisonedRAG, which exploited the lack of trust boundaries in Microsoft 365 Copilot and in RAG pipelines. ContextWall enforces security policies per source and team without requiring code changes to agents, and it runs in the customer's own infrastructure.

10 min read · published Jun 2, 2026

Every web result, document, and API response your agent retrieves goes straight into the model's context window - unscreened. ContextWall intercepts it first, blocks prompt injection and credential leaks, and enforces your security policy before the LLM ever sees it.

No code changes to your agents · Runs in your infrastructure · No LLM in the screening path

Your agent trusts everything it reads #

LLMs have no built-in concept of source trust. Content retrieved from a web search and content from your system prompt look identical once they are both inside the context window. Attackers exploit this directly.

EchoLeak (CVE-2025-32711) · Microsoft 365 Copilot

An attacker sends a crafted email. Copilot reads it, interprets embedded instructions as commands, silently accesses internal SharePoint files, and sends them to the attacker. The user never clicks anything.

Copilot had no way to distinguish a trusted system instruction from untrusted email content. Both looked the same inside the context window.

PoisonedRAG (USENIX Security 2025) · RAG pipelines

Researchers planted five adversarial documents per target question in a knowledge base of millions. When users asked those questions, the model retrieved and repeated the false content as confident fact, with no jailbreak, no system prompt change, and no model access needed.

The RAG pipeline retrieved documents by relevance score and passed them straight to the model. There was no check on where the document came from or whether it should be trusted.

Both attacks exploited the same gap: no trust boundary at the context layer. ContextWall fixes this by tagging every context source with a trust tier and applying your policy rules before content reaches the model.

Who it's for #

ContextWall is built for teams shipping AI into production who need security guarantees - not just guidelines.

AI & Agent Engineers

You're shipping RAG pipelines and agentic systems that pull from the web, internal docs, and third-party APIs. Every retrieved document is a potential attack vector - and your agent has no way to tell a legitimate source from a poisoned one.

How ContextWall helps

  • One pip install or Docker image - no changes to your agent code
  • Screens every document before it enters the prompt
  • Blocks injections and credential leaks before the LLM sees them

Security Teams

AI systems bypass your existing perimeter controls. Agents make outbound calls, ingest untrusted content, and operate with broad permissions - all outside your traditional detection stack.

How ContextWall helps

  • Enforceable policy rules per source, team, and repo
  • Real-time enforcement feed and tamper-evident audit log
  • Fleet-wide visibility across all deployed agents

Compliance & Legal

HIPAA, SOC 2, and FedRAMP auditors are asking you to show that PHI can't leak through an AI agent's context window. You need evidence - not assurances.

How ContextWall helps

  • Every enforcement decision mapped to a compliance control ID
  • Cryptographically signed audit exports on demand
  • Documented data residency: context never leaves your infrastructure

What ContextWall stops - and what it doesn't #

Detection at the context layer. No LLM in the screening path. We're honest about the scope.

Detected & blocked

  • Direct instruction override (L1 + L2): "IGNORE ALL PREVIOUS INSTRUCTIONS…"
  • Bidi & zero-width obfuscation (L1): RTL override chars hidden in retrieved text
  • Spaced-letter injection (L1): "i g n o r e p r e v i o u s"
  • Semantic paraphrase injection (L3): "Your assignment has been superseded…"
  • Credential leakage (L2): AWS keys, GitHub PATs, bearer tokens
  • PII exfiltration via context (L2): Emails, SSNs in untrusted-tier documents

Out of scope

Model hallucinations

ContextWall filters what enters the context window - it cannot control what the model generates from clean inputs.

System prompt mistakes

If your system prompt grants excessive permissions, ContextWall cannot override that design decision.

Training-time poisoning

Attacks on model weights or fine-tuning data happen before inference. ContextWall operates at inference time only.

Novel zero-day patterns

L3 heuristics catch known semantic paraphrases. A sufficiently novel attack may score below the block threshold - you set that threshold.

Authorized access you've allowed

If your policy permits a source and the model uses that data, ContextWall enforces your policy - not a stricter one.

Honest scope beats false assurances. Defense in depth means ContextWall works alongside your model provider's safety filters, not instead of them.

How it works #

ContextWall intercepts every document before it enters the context window. Here's exactly what happens.

Your agent requests a document

A web search result, internal doc, API response, or user upload - any external content.

ContextWall intercepts it

The daemon receives the document before it enters the context window. The LLM hasn't seen anything yet.

Three detection layers run in sequence

Policy decision

BLOCKED

A 400 response is returned to your agent. The document never reaches the LLM, and the event is written to the tamper-evident audit log.

ALLOWED

Document forwarded to the LLM API. Clean context enters the prompt as normal.

No LLM in the screening path. No external calls. Your data stays on your host.
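
The flow above can be sketched in a few lines of Python. This is only an illustration of the intercept, screen, decide sequence, not ContextWall's actual code: the names `Verdict`, `screen`, and `toy_injection_layer` are hypothetical, and the single toy layer stands in for the three real detection layers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    allowed: bool
    status: Optional[int]     # 400 when blocked; None means "forward upstream"
    violation: Optional[str]  # label reported by the layer that blocked


# Hypothetical layer signature: return a violation label, or None if clean.
Layer = Callable[[str], Optional[str]]


def screen(document: str, layers: List[Layer], audit_log: List[dict]) -> Verdict:
    """Run layers in order and stop at the first violation."""
    for layer in layers:
        violation = layer(document)
        if violation is not None:
            audit_log.append({"decision": "blocked", "violation": violation})
            return Verdict(allowed=False, status=400, violation=violation)
    audit_log.append({"decision": "allowed", "violation": None})
    return Verdict(allowed=True, status=None, violation=None)


def toy_injection_layer(text: str) -> Optional[str]:
    # Stand-in for a real detection layer.
    if "ignore all previous instructions" in text.lower():
        return "injection"
    return None


log: List[dict] = []
blocked = screen("IGNORE ALL PREVIOUS INSTRUCTIONS", [toy_injection_layer], log)
clean = screen("Quarterly revenue summary", [toy_injection_layer], log)
```

Stopping at the first violation matches the cheapest-first ordering: a document that fails a cheap check never pays for the more thorough ones.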

Source trust tiers

You declare what each context source is. ContextWall applies the right level of scrutiny automatically based on that tier.

  • Internal: your code repos, internal wikis
  • External: vendor docs, partner APIs
  • Untrusted: public web, user-submitted input
  • Regulated: FHIR APIs, PHI data sources
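
As a rough sketch of how a declared tier might drive scrutiny, the Python below models the four tiers as an enum and maps each to a block threshold. The tier names come from this page; the threshold values and the `should_block` helper are assumptions for illustration only, since the real per-tier rules are whatever your policy declares.

```python
from enum import Enum


class TrustTier(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"
    UNTRUSTED = "untrusted"
    REGULATED = "regulated"


# Hypothetical thresholds: a lower value means content is blocked at a
# lower suspicion score, i.e. the tier is screened more strictly.
BLOCK_THRESHOLD = {
    TrustTier.INTERNAL: 0.9,
    TrustTier.EXTERNAL: 0.7,
    TrustTier.UNTRUSTED: 0.5,
    TrustTier.REGULATED: 0.5,
}


def should_block(tier: TrustTier, score: float) -> bool:
    """Block when the suspicion score reaches the tier's threshold."""
    return score >= BLOCK_THRESHOLD[tier]


# The same score is blocked for public web content but not for internal docs.
tier_from_config = TrustTier("untrusted")  # value as written in the YAML
web_blocked = should_block(tier_from_config, 0.6)
internal_blocked = should_block(TrustTier.INTERNAL, 0.6)
```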

Three detection layers

Applied in order from cheapest to most thorough. No external calls, no LLM inference.

Layer 1: Structural

Cheapest. Scans raw bytes for known obfuscation tricks: bidirectional control characters, zero-width characters, and spaced-letter keywords ("i g n o r e a l l"). The control and zero-width characters are invisible to the human eye but readable by the model.
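
A minimal sketch of what a structural check along these lines could look like, assuming a fixed set of Unicode code points and a small keyword list. The function name `structural_findings`, the keyword list, and the finding labels are hypothetical, not ContextWall's own.

```python
import re

# Unicode bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}
# Zero-width space, non-joiner, joiner, word joiner, and BOM.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Four or more single letters separated by single spaces, e.g. "i g n o r e".
SPACED_RUN = re.compile(r"\b(?:[A-Za-z] ){3,}[A-Za-z]\b")
KEYWORDS = ("ignore", "disregard", "override")  # illustrative list only


def structural_findings(text: str) -> list:
    """Return labels for the structural obfuscation tricks found in text."""
    findings = []
    if any(ch in BIDI_CONTROLS for ch in text):
        findings.append("bidi_control")
    if any(ch in ZERO_WIDTH for ch in text):
        findings.append("zero_width")
    for match in SPACED_RUN.finditer(text):
        # Collapse the spaces so "i g n o r e" reads as "ignore".
        collapsed = match.group(0).replace(" ", "").lower()
        if any(keyword in collapsed for keyword in KEYWORDS):
            findings.append("spaced_keyword")
            break
    return findings
```

For example, `structural_findings("please i g n o r e p r e v i o u s rules")` flags the spaced-out keyword, while an ordinary sentence yields an empty list.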

Layer 2: Pattern matching

Fast. Runs regex patterns against normalized text. Catches injection syntax, exposed API keys (AWS, GitHub, Anthropic), bearer tokens, and PII like emails, phone numbers, and SSNs.
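
To make the idea concrete, here is a hedged sketch of normalize-then-match in Python. The patterns are simplified approximations of common formats (AWS access key IDs, classic GitHub personal access tokens, US SSNs) chosen for illustration; they are not ContextWall's rule set, and the names `PATTERNS`, `normalize`, and `match_patterns` are assumptions.

```python
import re
import unicodedata

# Simplified, illustrative patterns; a production rule set would be broader.
PATTERNS = {
    "injection": re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.I),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def normalize(text: str) -> str:
    """Fold compatibility characters (NFKC) and collapse whitespace runs."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def match_patterns(text: str) -> list:
    """Return the sorted names of every pattern that matches the text."""
    normalized = normalize(text)
    return sorted(name for name, rx in PATTERNS.items() if rx.search(normalized))
```

Normalizing first means that tricks like extra whitespace or full-width letters do not slip past a pattern written for the plain form.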

Layer 3: Heuristic scoring

Most thorough. Scores each message for instruction-like intent, even when the wording avoids obvious keywords. Catches paraphrases like "your previous assignment has been superseded" that bypass regex entirely.
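
The page does not describe how the heuristic score is computed. As one possible shape, the sketch below sums weights for a handful of instruction-like cues and compares the total with a threshold you choose. Every cue, weight, and name here (`CUES`, `intent_score`, `is_blocked`) is an assumption made for illustration.

```python
import re

# Hypothetical cues with weights; each cue contributes at most once.
CUES = [
    # Addresses the model's task or role directly.
    (re.compile(r"\byour (?:previous |prior |current )?"
                r"(?:assignment|task|instructions?|role)\b", re.I), 0.4),
    # Claims that earlier instructions no longer hold.
    (re.compile(r"\b(?:superseded|overridden|revoked|no longer (?:valid|applies))\b",
                re.I), 0.4),
    # Issues a new directive.
    (re.compile(r"\b(?:you must|you should now|from now on)\b", re.I), 0.3),
    # Asks for data to be moved or disclosed.
    (re.compile(r"\b(?:send|forward|upload|reveal) (?:it|them|the|all|any)\b",
                re.I), 0.3),
]


def intent_score(message: str) -> float:
    """Sum the weights of matching cues, capped at 1.0."""
    score = sum(weight for pattern, weight in CUES if pattern.search(message))
    return min(score, 1.0)


def is_blocked(message: str, threshold: float = 0.7) -> bool:
    """Block when the score reaches the operator-chosen threshold."""
    return intent_score(message) >= threshold
```

With these weights, "Your previous assignment has been superseded" trips two cues and is blocked at the default threshold, even though it contains none of the keywords a regex for "ignore previous instructions" would look for.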

Your context stays yours #

ContextWall runs as a daemon inside your own infrastructure. Prompts, documents, and file contents are screened locally and never transmitted anywhere. The cloud control plane receives only counts and scores, never content.

All screening happens here. Nothing exits.

What crosses the boundary

Never transmitted

  • Prompt content and user messages
  • Retrieved documents and file contents
  • Source URLs and file paths
  • Model responses and completions
  • Personally identifiable information
  • Protected health information (PHI)

Control plane: sees counts and scores only. Never content.

  • Request counts (blocked / allowed)
  • Violation types detected (e.g. "pii")
  • Average latency in milliseconds
  • Active session count
  • Policy version acknowledgement
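
One way to picture the split is a summary step that reduces local events to aggregates before anything leaves the host. The sketch below is an assumption about shape only: the `TelemetrySummary` fields and the event dictionary keys are hypothetical and do not reflect ContextWall's wire format.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class TelemetrySummary:
    blocked: int
    allowed: int
    violation_types: dict   # e.g. {"pii": 3}
    avg_latency_ms: float


def summarize(events: list) -> TelemetrySummary:
    """Reduce local screening events to counts; content is never copied."""
    blocked = sum(1 for e in events if e["decision"] == "blocked")
    violations = Counter(e["violation"] for e in events if e.get("violation"))
    avg = sum(e["latency_ms"] for e in events) / len(events) if events else 0.0
    return TelemetrySummary(
        blocked=blocked,
        allowed=len(events) - blocked,
        violation_types=dict(violations),
        avg_latency_ms=avg,
    )


local_events = [
    {"decision": "blocked", "violation": "pii", "latency_ms": 2.0,
     "content": "SSN 123-45-6789"},
    {"decision": "allowed", "violation": None, "latency_ms": 4.0,
     "content": "Quarterly revenue summary"},
]
payload = asdict(summarize(local_events))
```

The `content` field exists only in the local events. The summary type has no field that could carry it, so the boundary is enforced by structure instead of by remembering to strip a value.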

Prefer fully offline? Leave control_plane.url empty and ContextWall runs entirely local with no cloud dependency.

Integrate in minutes #

The daemon installs with pip and proxies your AI SDK calls locally. The cloud dashboard is optional and sees only aggregated metadata, never content.

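# Install the daemon and start it with your config
pip install contextwall
ctxfw start --config ctxfw.yaml

# ctxfw.yaml: declare each context source and its trust tier
sources:
  - id: web-search
    type: web
    trust_tier: untrusted

  - id: internal-docs
    type: confluence
    trust_tier: internal

# Point your SDK at the local proxy
export ANTHROPIC_BASE_URL=http://localhost:8080/proxy/anthropic
export ANTHROPIC_API_KEY=sk-ant-your-key   # unchanged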
Works with the Anthropic and OpenAI SDKs in any language. The daemon proxies requests, screens context, then forwards clean content to the real API without ever storing or transmitting your prompts.

Security policy as config #

Everything is declared in YAML. Sources, rules, and thresholds all live in a file you commit to your repo, review in a pull request, and deploy alongside your other infrastructure config.

  • Sources declared in config at startup, no API calls or setup scripts
  • Four-layer policy: fleet-wide rules down to individual repo overrides
  • Starter policy templates for HIPAA, SOC 2, and FedRAMP included
  • Rules reload within 5 seconds of a file change, no restart needed
  • Every rule can map to a compliance control ID for audit evidence

sources:
  - id: brave-web-search
    type: web
    trust_tier: untrusted

  - id: internal-confluence
    type: confluence
    trust_tier: internal
    data_classification: sensitive

  - id: fhir-api
    type: api
    trust_tier: regulated
    data_classification: phi
    owner: clinical-data-team
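
The four-layer policy described in the list above runs from fleet-wide rules down to individual repo overrides. The page does not name the two middle layers or say how conflicts resolve, so the sketch below assumes a simple "most specific layer wins" merge, with hypothetical `team` and `source` layers and hypothetical rule keys.

```python
from typing import Dict, List


def resolve_policy(layers: List[Dict[str, object]]) -> Dict[str, object]:
    """Merge policy layers, broadest first; later layers override earlier ones."""
    resolved: Dict[str, object] = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


# Hypothetical layers and keys, ordered from broadest to most specific.
fleet = {"block_threshold": 0.7, "redact_pii": True}
team = {"block_threshold": 0.6}
source = {}                       # no overrides at this layer
repo = {"block_threshold": 0.5}   # a stricter threshold for one repo

effective = resolve_policy([fleet, team, source, repo])
```

A real engine would probably also limit which keys a lower layer may change, for instance by letting a repo tighten a threshold but not disable redaction. That restriction is left out here to keep the merge order easy to see.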

Compliance coverage #

ContextWall is designed so that compliance is a property of the architecture, not a checklist you complete afterwards. PHI, PII, and sensitive data are handled locally before they can ever be exposed.

HIPAA

PHI never leaves your network

  • Protected health information is screened locally. It never transits a third-party server.
  • Regulated source tier enforces that PHI can only flow between approved internal systems
  • Every enforcement decision, including each violation, is logged with a timestamp, source ID, and outcome for auditor review

SOC 2

Audit trail your reviewers can verify

  • Every context screening event is logged with source ID, trust tier, decision, and timestamp
  • Provenance chain is cryptographically linked; records cannot be altered without detection
  • Role-based access to the fleet dashboard; no raw context is stored or accessible anywhere
  • Policy rules are version-controlled; changes leave a full audit trail
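
To show why a cryptographically linked chain makes edits detectable, here is a minimal sketch using a plain SHA-256 hash chain. This is a simpler stand-in for the Merkle audit log the product describes, not its implementation; `append_record`, `verify`, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Serialize with sorted keys so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> None:
    """Append a record whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute every link; an edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append_record(audit_log, {"source_id": "web-search", "trust_tier": "untrusted",
                          "decision": "blocked", "ts": "2026-06-02T10:00:00Z"})
append_record(audit_log, {"source_id": "internal-docs", "trust_tier": "internal",
                          "decision": "allowed", "ts": "2026-06-02T10:00:01Z"})
intact = verify(audit_log)

# Flipping an earlier decision after the fact is caught on verification.
audit_log[0]["record"]["decision"] = "allowed"
tampered_ok = verify(audit_log)
```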

GDPR

Personal data stays in your jurisdiction by design

  • PII (email addresses, phone numbers, names) is detected and redacted before reaching the model
  • The daemon processes all data inside your own infrastructure, with no cross-border data transfer
  • Control plane receives only aggregated counts, not personal data
  • Supports data minimisation by design: the model sees the least data necessary to complete the task
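
Redaction before the model sees the text could look roughly like the following. The patterns cover only emails and one common phone format, name detection is left out because it needs more than regex, and the placeholder tokens and the `redact` function are assumptions for illustration.

```python
import re

# Illustrative patterns and placeholder tokens only.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[REDACTED_PHONE]"),
]


def redact(text: str) -> str:
    """Replace each detected PII span with a fixed placeholder."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


cleaned = redact("Mail alice@example.com or call 555-123-4567 for details.")
```

Replacing the span instead of dropping the whole document keeps the surrounding context usable, which fits the data-minimisation goal: the model still gets the sentence, just not the identifier.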

Offline deployments

Fully air-gapped. No external dependencies.

  • Daemon runs entirely within your air-gapped or VPC environment with no external dependencies
  • Leave control_plane.url empty for a fully offline deployment. No telemetry leaves the host.
  • Policy and configuration live in files you control; nothing is stored in the cloud
  • Enforcement continues locally even when the control plane is unreachable

Pricing #

All enforcement runs in your infrastructure - open source, free forever. The cloud adds fleet visibility and compliance tooling on top.

Open Source

Free forever. Self-host from GitHub.

  • Three-layer detection engine (structural, regex, heuristic)
  • Source trust tiers (internal / external / untrusted / regulated)
  • Policy-as-code in YAML, hot-reloaded in 5 s
  • Tamper-evident Merkle audit log
  • Python SDK - SafeAnthropic, SafeOpenAI
  • Pre-built policy packs for HIPAA, SOC 2, FedRAMP
  • Prometheus metrics + live WebSocket enforcement feed
  • Runs entirely in your infrastructure. No external calls.

Cloud

Free during early access. No credit card. Sign in to get started.

Everything in Open Source, plus:

  • Fleet dashboard - all daemons, status, block rates
  • Policy authoring UI - no YAML editing required
  • Compliance report generation (SOC 2, HIPAA, FedRAMP exports)
  • One-click audit trail exports, cryptographically signed
  • Email support

Enforcement stays local whether or not the cloud is reachable. Prompts and documents never leave your infrastructure.

The daemon, policy engine, and SDK are Apache 2.0 - free to use, modify, and embed in any product. The cloud dashboard is proprietary.

Get early access #

The cloud control plane is live. Sign in to connect your first daemon, set policies, and see real-time enforcement - free during early access.

No credit card required · Free during early access · Cancel any time
