If you work in a regulated organisation, you have probably seen this play out: leadership wants AI in production, security wants an audit trail, and the team in the middle has two options. Either ship something fast with no governance (shadow tools, no DLP, no audit log) or wait twelve to eighteen months for an enterprise platform to get procured and approved. Neither is good.
Most of the tools available to bridge that gap fall into one of three camps:
I have been working on a small Apache-2.0 project called Synapse AI Gateway that aims at the space between those options. Running docker compose up brings up the whole stack (postgres, backend, admin console), and you have it running in under five minutes. Governance controls run on every inference request before it ever reaches a model.
GitHub: synapse-ai-gateway/synapse-ai-gateway
The design hinges on one decision: every API key is bound at creation to a system prompt, a model allowlist, a team identity, and rate limits. The team that gets a key for an approved HR-assistant use case cannot quietly repurpose that key for something else. They need a new key, which means a new approval.
That is the difference between governance-as-policy (a wiki page nobody reads) and governance-as-infrastructure (the gateway refuses the request). Policies do not enforce themselves. Controls in the request path do.
client app
    │
    ▼
┌─────────────────────────────────────────┐
│ 1. auth + use-case scoping              │ ← inject system prompt, check model allowlist
├─────────────────────────────────────────┤
│ 2. prompt DLP                           │ ← block / redact / alert
├─────────────────────────────────────────┤
│ 3. hybrid routing (on-prem vs cloud)    │ ← classification decides backend
├─────────────────────────────────────────┤
│ 4. immutable audit log                  │ ← PostgreSQL append-only, SHA-256 hashes
├─────────────────────────────────────────┤
│ 5. response DLP + anomaly detection     │ ← webhook alerts
└─────────────────────────────────────────┘
    │
    ▼
LLM backend (Ollama, vLLM, OpenAI, Anthropic, Azure, Google)
Layer 1 validates the key, injects the bound system prompt, and checks the model allowlist. An invalid key or an unapproved model returns 403 immediately.
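To make the scoping idea concrete, here is a minimal sketch of what a layer like this could do. It is not the project's actual code: the KeyBinding fields, the Forbidden exception, and the in-memory bindings dictionary are all hypothetical, and rate limits are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyBinding:
    """Scope fixed when the key is created (field names are hypothetical)."""
    team: str
    system_prompt: str
    allowed_models: frozenset


class Forbidden(Exception):
    """Stands in for an HTTP 403 response in this sketch."""
    status = 403


def scope_request(bindings, api_key, model, messages):
    """Validate the key and model, then prepend the bound system prompt."""
    binding = bindings.get(api_key)
    if binding is None:
        raise Forbidden("unknown API key")
    if model not in binding.allowed_models:
        raise Forbidden(f"model {model!r} is not approved for team {binding.team!r}")
    # The system prompt comes from the binding, not from the caller.
    return [{"role": "system", "content": binding.system_prompt}, *messages]
```

Because the binding is frozen and looked up by key, the only way to change the use case in this sketch is to issue a different key, which mirrors the approval flow described above.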
Layer 2 is a built-in regex DLP engine. Three outcomes per category: block (HTTP 400), redact (sanitise and forward), alert (log and forward). Patterns live in a config file you can hot-reload. No external service is required, which matters if your data sovereignty rules say PII cannot leave your perimeter even for a scan.
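The three outcomes can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the category names, the regular expressions, the redaction placeholder, and the choice to evaluate block rules first. The project's real patterns live in its config file.

```python
import re

# Hypothetical rules: category -> (compiled pattern, action).
RULES = {
    "card_number": (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "block"),
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "redact"),
    "internal_codename": (re.compile(r"\bproject\s+falcon\b", re.IGNORECASE), "alert"),
}


def apply_dlp(text, rules=RULES):
    """Return (status, text_to_forward, categories_hit).

    status 400 means the prompt is blocked and nothing is forwarded;
    status 200 means the (possibly redacted) text is forwarded.
    """
    # Check block rules first so a blocked prompt is never partially processed.
    for category, (pattern, action) in rules.items():
        if action == "block" and pattern.search(text):
            return 400, "", [category]

    hits = []
    for category, (pattern, action) in rules.items():
        if action == "redact":
            text, count = pattern.subn(f"[REDACTED:{category}]", text)
            if count:
                hits.append(category)
        elif action == "alert" and pattern.search(text):
            hits.append(category)  # logged, but the text is forwarded unchanged
    return 200, text, hits
```

Regex matching of this kind runs entirely in-process, which is what lets a scan happen without the prompt leaving the perimeter.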
Layer 3 routes by data classification. A key tagged sensitive may reach only on-premises backends (Ollama, vLLM). A key tagged non_sensitive can go to a cloud provider for higher capability. Consuming applications do not change: they always speak the OpenAI API.
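A rough sketch of that routing rule follows. The backend identifiers and function names are hypothetical. Two behaviours are assumptions rather than anything stated above: non_sensitive keys may also use on-premises backends, and unknown tags fail closed.

```python
ON_PREM = frozenset({"ollama", "vllm"})
CLOUD = frozenset({"openai", "anthropic", "azure", "google"})


def eligible_backends(classification):
    """Return the set of backends a key with this tag may reach."""
    if classification == "sensitive":
        return ON_PREM
    if classification == "non_sensitive":
        # Assumption: non-sensitive traffic may use either kind of backend.
        return ON_PREM | CLOUD
    # Assumption: an unrecognised tag is rejected rather than routed.
    raise ValueError(f"unknown classification: {classification!r}")


def choose_backend(classification, preferred):
    """Return the preferred backend if the classification allows it."""
    if preferred not in eligible_backends(classification):
        raise PermissionError(
            f"{preferred!r} is not allowed for {classification!r} traffic"
        )
    return preferred
```

The client never sees this decision. It sends the same OpenAI-style request either way, and the tag on the key decides where the request can go.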
Layer 4 writes one row per request to PostgreSQL: timestamp, team, model, token count, latency, DLP outcome, HTTP status. Prompt and response are stored as SHA-256 hashes, never plaintext. That preserves forensic hash-matching while protecting staff privacy.
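The hash-instead-of-plaintext idea is easy to illustrate. In this sketch the column names and the audit_row helper are hypothetical, and the row is a plain dictionary rather than an actual PostgreSQL insert.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(text):
    """Hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_row(team, model, prompt, response, tokens, latency_ms, dlp_outcome, http_status):
    """Build one audit record; prompt and response are kept only as hashes."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "team": team,
        "model": model,
        "token_count": tokens,
        "latency_ms": latency_ms,
        "dlp_outcome": dlp_outcome,
        "http_status": http_status,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
    }


def matches_prompt(row, candidate_text):
    """Forensic check: was this exact text the prompt for this row?"""
    return row["prompt_sha256"] == sha256_hex(candidate_text)
```

An investigator who already holds a suspect document can confirm whether it was sent, but nobody can read prompts back out of the log. The match is exact: a single changed character or stray space produces a different hash.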
Layer 5 scans responses on the way back out and surfaces anomalies (usage spikes, repeated DLP blocks, off-hours bursts) via webhook.
git clone https://github.com/synapse-ai-gateway/synapse-ai-gateway
cd synapse-ai-gateway
docker compose up -d
Every setting has a working default, so that genuinely is the whole quick start for a local trial. Before exposing the stack beyond localhost, copy .env.example to .env and set real values for JWT_SECRET, ADMIN_PASSWORD, and POSTGRES_PASSWORD.
The admin console is at http://localhost:5173. Log in, create a team in the UI, copy the API key (shown once), and you're ready:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_TEAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:latest",
    "messages": [{"role": "user", "content": "hello"}]
  }'
It is OpenAI-API-compatible, so any OpenAI SDK works. Point base_url at http://localhost:8080/v1 and pass the team key. The backend is fully transparent to the client.
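If you would rather not pull in an SDK, the same call can be made with the Python standard library. This is a sketch mirroring the curl example above. The helper names are hypothetical, and the response parsing assumes the gateway returns the usual OpenAI chat-completions shape (choices[0].message.content).

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumes the standard OpenAI response layout.
    return payload["choices"][0]["message"]["content"]
```

With the stack running locally, send_chat(build_chat_request("http://localhost:8080/v1", "<YOUR_TEAM_API_KEY>", "llama3.2:latest", "hello")) would issue the same request as the curl command.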
LiteLLM is excellent. It is the right tool for a different problem: routing across 100+ providers with maximum flexibility, at scale, with a team that already has DevOps capacity. Its per-worker footprint is correspondingly larger, which is appropriate for the high-throughput case it is built for.
Synapse AI Gateway's full stack runs at ~113 MB at idle (backend 73 MB + postgres 32 MB + frontend 8 MB). The whole stack is three containers (postgres, backend, admin console) brought up by a single docker compose up. No Redis, no message broker, no Kubernetes required to get started. If you are just starting out and need a governance layer you can deploy in an afternoon, that footprint matters. If you are running millions of requests per day, it does not, and LiteLLM is the better choice.
The other meaningful difference is DLP. LiteLLM's DLP integrates with PromptGuard, Pangea, or Azure Content Safety, which are external services with their own pricing, accounts, and data flows. Synapse's DLP is built in. For an organisation whose data residency rules say PII does not leave the perimeter, "built in" is not a feature preference; it is a hard requirement.
Worth saying clearly:
Specific facts, all verified in the repo:
/v1/chat/completions
The README.md has a deployment checklist for production: rotate every default secret, terminate TLS at a reverse proxy, use managed PostgreSQL, restrict CORS, review the DLP patterns for your jurisdiction.
The repo is Apache-2.0. There is a CONTRIBUTING.md with DCO sign-off and a list of good first issues: DLP patterns for additional jurisdictions, additional backend adapters, a Helm chart. If there is a use case the current design does not cover, open an issue.
GitHub: synapse-ai-gateway/synapse-ai-gateway
If your organisation is staring at the gap between "ship AI now with no controls" and "wait two years for the enterprise platform," this is meant for you.