AI Gateway & Governance Platform
Open-source LLM proxy written in Go. Add quota enforcement, PII blocking, cost tracking, and compliance to any LLM in one line of code.
Quick Start · Integration Guide · Features · Architecture · Gateway Docs · Admin API · Discussions
ToTra is an open-source AI gateway and governance platform that sits in front of any LLM provider.
Point your existing apps at ToTra instead of OpenAI, Anthropic, or any other provider — and instantly get:
- **Quota enforcement**: per-user and per-team hard budget caps
- **PII blocking**: 18 language groups scanned at the edge before any data leaves your network
- **Cost tracking**: per-user, per-team, per-model token and USD spend with chargeback reports
- **Compliance**: GDPR workflows, EU AI Act checklist, hash-chained immutable audit log
- **Zero code changes**: 100% OpenAI-compatible; swap one line in your config
flowchart LR
A["🖥️ Your App\n(OpenAI SDK / curl\n/ LangChain)"] -->|"1 · API request"| B
subgraph B["ToTra Gateway :8080"]
direction TB
B1["🔑 Auth & API Key"]
B2["📊 Quota Check\n(per user / team)"]
B3["🔒 PII Scan\n(18 languages)"]
B4["⚡ Semantic Cache"]
B5["🔀 Route & Load Balance"]
B1 --> B2 --> B3 --> B4 --> B5
end
B -->|"2 · forward request"| C["☁️ LLM Providers\nOpenAI · Anthropic\nGemini · Mistral · Azure\nBedrock · Ollama"]
C -->|"3 · response"| A
B -->|"4 · usage events"| D
subgraph D["ToTra Admin :8081"]
direction TB
D1["💸 Cost Tracking"]
D2["📋 Compliance & Audit"]
D3["🔔 Budget Alerts"]
end
D --> E["📊 Dashboard :3000\nAdmin Console · Reports\nEmployee Self-Service"]
- 🚀 **Written in Go**: < 2 ms p95 overhead. Native binary, no Python runtime, no warm-up.
- 🔒 **PII blocked at the edge**: email, IDs, credit cards, health records across 18 language groups. Sensitive data is redacted before it ever reaches an LLM.
- 💸 **Hard budget caps**: requests over limit get `429` before touching any provider. Real-time Slack / webhook alerts.
- 📋 **Compliance out of the box**: GDPR data-subject workflows, EU AI Act checklist, and an immutable hash-chained audit log on every request.
- 📊 **Finance-ready reporting**: department chargeback CSV, budget forecasts, spend anomaly detection.
- 🏠 **Self-hosted**: your keys, your infrastructure, your data. No external dependency.
Prerequisites: Docker + Docker Compose
git clone https://github.com/SugaC-275/ToTra.git
cd ToTra
cp .env.example .env # fill in your provider API keys
docker-compose --profile app up -d --wait
Open **http://localhost:3000** and sign in:
| Field | Value |
|---|---|
| Email | admin@acme.com |
| Password | totra123 |
Change the default credentials immediately after first login via Settings → Security.
One line change. Every other line of code stays the same.
Python (OpenAI SDK)
import openai

client = openai.OpenAI(api_key="sk-...")  # before: direct OpenAI client, shown for comparison

client = openai.OpenAI(  # after: the same client pointed at ToTra
    api_key="your-totra-api-key",  # issued from the ToTra admin panel
    base_url="http://your-totra-host:8080/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Node.js / TypeScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-totra-api-key",
  baseURL: "http://your-totra-host:8080/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
curl
curl http://your-totra-host:8080/v1/chat/completions \
-H "Authorization: Bearer your-totra-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
openai_api_key="your-totra-api-key",
openai_api_base="http://your-totra-host:8080/v1",
)
response = llm.invoke("Hello!")
print(response.content)
Once connected, every request is automatically routed through quota enforcement, PII scanning, semantic caching, and cost tracking.
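Because the gateway can reject a request before it reaches a provider, client code may want to tell those cases apart. The following is a minimal sketch, assuming only the two status codes this README mentions (`429` for requests over a budget cap, `422` for requests blocked by the PII scan). The function name and messages are hypothetical, the other branches are generic HTTP conventions, and the format of the error body is not covered here.

```python
# Hypothetical helper: maps an HTTP status code returned by the gateway
# to a short explanation. Only 429 and 422 come from this README; the
# remaining branches follow generic HTTP conventions and are assumptions.

def explain_status(status_code: int) -> str:
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 429:
        # Budget cap (or rate limit) hit; the request never reached a provider.
        return "over budget or rate limited"
    if status_code == 422:
        # The PII scan blocked the request body.
        return "blocked by PII scan"
    if status_code in (401, 403):
        return "check the ToTra API key"
    if status_code >= 500:
        return "gateway or upstream error"
    return "client error"
```

A caller would typically surface the `429` and `422` cases to the user rather than retry them blindly, since neither is likely to succeed unchanged.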
🔒 PII Protection — 18 Language Groups
Every request body is scanned in real time before it reaches any LLM. Detected PII is redacted and the event is logged. Blocked requests return `422`.
| Language Group | Detected Types |
|---|---|
| Universal | Email, credit card, IBAN, SWIFT/BIC, ICD medical codes |
| Chinese | National ID, phone, bank account, unified credit code, securities account |
| English | US SSN, phone, NI number, passport, driver's license, medical record number |
| Japanese | My Number (個人番号), phone, postal code, health insurance number |
| Korean | RRN (주민등록번호), phone, passport, business registration number |
| EU (14 countries) | National IDs, tax numbers, social security — DE/FR/ES/IT/NL/PL/SE/PT/BE/CH/DK/FI/NO/AT |
| Arabic (GCC + MENA) | National ID, Iqama, Emirates ID, QID, CIN, NIN, phone |
Configure rules per team, per model, or globally in the admin panel.
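To make the idea of edge redaction concrete, here is a small illustrative sketch covering two of the "Universal" types from the table (email and credit card). It is not ToTra's scanner: the regular expressions, the Luhn checksum filter, and the placeholder strings are all assumptions chosen for the example.

```python
import re

# Illustrative patterns only; a production scanner needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def redact(text: str) -> tuple[str, list[str]]:
    """Replace detected PII with placeholders; return the text and the types found."""
    found: list[str] = []

    def replace_card(match: re.Match) -> str:
        if luhn_valid(match.group()):
            found.append("credit_card")
            return "[REDACTED_CARD]"
        return match.group()  # digit run that is not a valid card number

    def replace_email(match: re.Match) -> str:
        found.append("email")
        return "[REDACTED_EMAIL]"

    text = CARD_RE.sub(replace_card, text)
    text = EMAIL_RE.sub(replace_email, text)
    return text, found
```

The checksum step is there to cut false positives: a long digit run such as an order number is only redacted when it also passes the Luhn check.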
💸 Cost & Spend Management
- Per-user, per-team, per-model token and USD cost tracking
- **Hard budget caps**: requests over limit get `429` before touching the provider
- Configurable alert thresholds with Slack / Feishu / webhook notifications
- Monthly budget forecasts based on current burn rate
- Department chargeback reports with CSV export for finance
- Procurement analytics and ROI dashboards
- Spend anomaly detection with automatic alerts
Dashboard → Cost → Reports → Export CSV
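As a worked example of a burn-rate forecast, the sketch below projects month-end spend by linear extrapolation from spend to date. This is an assumed, simplest-possible model for illustration; the README does not say which forecasting method ToTra uses, and the function names are hypothetical.

```python
import calendar
from datetime import date


def forecast_month_end_spend(spend_to_date_usd: float, today: date) -> float:
    """Linear projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_burn = spend_to_date_usd / today.day
    return daily_burn * days_in_month


def projected_overrun(spend_to_date_usd: float, budget_usd: float, today: date) -> float:
    """Amount by which the projection exceeds the budget (0.0 if within budget)."""
    return max(0.0, forecast_month_end_spend(spend_to_date_usd, today) - budget_usd)
```

For instance, $300 spent by 10 June is a burn rate of $30 per day, which projects to $900 for the 30-day month; against a $750 budget that is a projected overrun of $150.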
📋 Compliance & Audit
- **GDPR**: data-subject export and deletion request workflows, configurable retention policies
- **EU AI Act**: compliance checklist with per-model status tracking
- **Immutable audit chain**: every request is hash-chained, so any later modification of the log is detectable
- **SIEM integration**: configurable webhook targets for security event forwarding
- Data residency controls: keep all data on-premises or in a specific region
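The hash-chained audit log can be illustrated with a short sketch. Each entry stores the hash of the previous entry, so changing any earlier record invalidates every hash after it. The field names, the SHA-256 choice, and the JSON canonicalisation below are assumptions for the example, not ToTra's actual record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a new record linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; return False at the first mismatch."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Verification only proves internal consistency. Detecting a wholesale rewrite of the chain additionally requires anchoring the latest hash somewhere an attacker cannot reach.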
⚡ Gateway & Routing
- **OpenAI-compatible**: drop-in replacement for the OpenAI API (`/v1/chat/completions`, `/v1/embeddings`, streaming)
- **Anthropic-compatible**: native Anthropic messages API support
- Multi-provider routing: automatic fallback across providers and models
- **Semantic cache**: SimHash LSH deduplication; repeated prompts skip the LLM entirely
- Prompt compression: reduce token spend on long context
- Streaming proxy: full `text/event-stream` support
- **File pipeline**: upload PDF / DOCX / PPTX → parse → chat in one API call
- Rate limiting, IP allowlist, API-key authentication
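For readers unfamiliar with SimHash, the sketch below shows the general technique behind a SimHash LSH cache key: similar prompts produce fingerprints that differ in only a few bits, and banding lets near-duplicates be found by exact lookup. The tokenisation, the 64-bit width, the BLAKE2b token hash, and the four-band split are assumptions for illustration and may differ from ToTra's implementation.

```python
import hashlib

BITS = 64  # fingerprint width (assumed)


def simhash(text: str) -> int:
    """64-bit SimHash over lowercase, whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:  # majority vote per bit position
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def band_keys(fingerprint: int, bands: int = 4) -> list[tuple[int, int]]:
    """Split the fingerprint into equal bands; each (index, value) is a lookup key."""
    width = BITS // bands
    mask = (1 << width) - 1
    return [(i, (fingerprint >> (i * width)) & mask) for i in range(bands)]
```

With four bands, any two fingerprints within Hamming distance 3 must agree on at least one band, so a cache can index entries by band key and then confirm candidates with `hamming`.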
🔐 Administration
- JWT authentication + OIDC / SSO integration
- Role-based access control (admin / employee)
- User and team management with quota request / approval workflow
- Model catalogue — enable, disable, and configure providers per team
- Bot notifications — Slack, Feishu, custom webhooks
- HR sync connector (CSV import)
- **Agent session tracking**: detects and terminates dead-loop agent sessions automatically
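One simple way to think about dead-loop detection is to flag a session when the same prompt keeps recurring within a short window. The sketch below does exactly that; the class name, window size, and repeat threshold are hypothetical, and the README does not describe the heuristic ToTra actually applies.

```python
import hashlib
from collections import deque


class LoopDetector:
    """Flags a session when one prompt repeats too often in a sliding window."""

    def __init__(self, window: int = 10, max_repeats: int = 5):
        self.recent = deque(maxlen=window)  # hashes of the most recent prompts
        self.max_repeats = max_repeats

    def observe(self, prompt: str) -> bool:
        """Record a prompt; return True when the session looks stuck."""
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        self.recent.append(digest)
        return self.recent.count(digest) >= self.max_repeats
```

A gateway using this kind of check would keep one detector per agent session and end the session once `observe` returns `True`.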
| Provider | Chat | Embeddings | Streaming | Files |
|---|---|---|---|---|
| OpenAI (GPT-4o, o1, o3, o4) | ✅ | ✅ | ✅ | ✅ |
| Anthropic (Claude 3.5, 4) | ✅ | — | ✅ | ✅ |
| Google Gemini | ✅ | ✅ | ✅ | — |
| Mistral AI | ✅ | ✅ | ✅ | — |
| Meta Llama (via Ollama) | ✅ | ✅ | ✅ | — |
| Cohere Command | ✅ | ✅ | ✅ | — |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ |
| AWS Bedrock | ✅ | ✅ | ✅ | — |
| Local / Ollama | ✅ | ✅ | ✅ | — |
| Any OpenAI-compatible endpoint | ✅ | ✅ | ✅ | — |
The ToTra gateway and admin services are written in Go. In the benchmark below, the gateway adds 2 ms of overhead at p95 with 10 concurrent virtual users, rising to 6 ms at 200.
| Concurrency | p50 | p95 | p99 |
|---|---|---|---|
| 10 VUs | < 1 ms | 2 ms | 4 ms |
| 50 VUs | 1 ms | 3 ms | 8 ms |
| 200 VUs | 2 ms | 6 ms | 15 ms |
Measured against a 100 ms mock upstream.
[Reproduce the benchmark →]
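When reading the table, it helps to be explicit about how such figures can be derived. The sketch below assumes overhead is the end-to-end latency minus the fixed 100 ms mock upstream delay, and uses nearest-rank percentiles. Both are assumptions about methodology, and the function names are illustrative rather than part of the benchmark script.

```python
import math


def overhead_ms(total_latencies_ms: list[float], upstream_ms: float = 100.0) -> list[float]:
    """Subtract the fixed mock-upstream delay from each end-to-end latency."""
    return [t - upstream_ms for t in total_latencies_ms]


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With ten samples, for example, p50 is the fifth-smallest value and both p95 and p99 are the largest, which is why tail percentiles need many more samples to be meaningful.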
Your Apps (OpenAI SDK / curl / LangChain / any HTTP client)
│
▼
ToTra Gateway :8080
auth · quota · PII scan · policy · semantic cache · routing
│
▼
OpenAI · Anthropic · Gemini · Mistral · Local Models
│
│ (usage events)
▼
ToTra Admin :8081
cost · compliance · budgets · audit trail · notifications
│
▼
Dashboard :3000
admin console · department reports · employee self-service
| Service | Stack | Port |
|---|---|---|
| `gateway` | Go 1.25 / Fiber | 8080 |
| `admin` | Go 1.25 / Fiber | 8081 |
| `parser` | Python 3.12 / FastAPI | 8090 |
| `dashboard` | React 19 / Vite | 3000 |
| `postgres` | PostgreSQL 16 | 5432 |
| `redis` | Redis 7 | 6379 |
Screenshots: Cost Dashboard · Department Reports · User Management · Employee Self-Service
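docker-compose up -d postgres redis  # start the backing services first
cd gateway && go run .  # run each of these four services in its own terminal, from the repo root
cd admin && go run .
cd parser && uvicorn main:app --port 8090
cd dashboard && npm install && npm run dev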
cd scripts/set-dev-passwords
POSTGRES_HOST=localhost POSTGRES_DB=totra \
POSTGRES_USER=totra POSTGRES_PASSWORD=totra_secret go run .
Default dev credentials: admin@acme.com / totra123
Copy `.env.example` to `.env`. Key variables:
| Variable | Description |
|---|---|
| `POSTGRES_HOST/PORT/DB/USER/PASSWORD` | PostgreSQL connection |
| `JWT_SECRET` | Shared secret for JWT signing |
| `ENCRYPTION_KEY` | 32-byte hex key for the admin credential store |
| `GATEWAY_ENCRYPTION_KEY` | 32-byte hex key for the gateway credential store |
| `OPENAI_API_KEY` | Your OpenAI key (set per provider) |
| `ANTHROPIC_API_KEY` | Your Anthropic key |
See .env.example for the full list including Redis, SMTP, and notification settings.
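The two encryption keys need random values. One way to produce them is Python's standard `secrets` module, as sketched below. This assumes "32-byte hex key" means 32 random bytes encoded as 64 hexadecimal characters; check `.env.example` if the expected length differs. The helper names are illustrative.

```python
import secrets


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return `num_bytes` of cryptographically secure randomness as a hex string."""
    return secrets.token_hex(num_bytes)


def is_valid_hex_key(value: str, num_bytes: int = 32) -> bool:
    """Check that `value` is exactly `num_bytes` bytes of hex (2 characters per byte)."""
    if len(value) != num_bytes * 2:
        return False
    try:
        bytes.fromhex(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Generate a separate key for each variable; do not reuse one value for both.
    print("ENCRYPTION_KEY=" + generate_hex_key())
    print("GATEWAY_ENCRYPTION_KEY=" + generate_hex_key())
```

Running the script prints two lines that can be pasted into `.env`.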
make test
cd gateway && go test ./...
cd admin && go test ./...
cd dashboard && npm run test:run
cd parser && pytest
We welcome contributions — bug fixes, new provider integrations, docs improvements, and feature requests.
git clone https://github.com/SugaC-275/ToTra.git
cd ToTra
make test
- Fork the repo and create a branch from `main`
- Make your change and add tests where relevant
- Ensure `make test` passes
- Open a pull request
For larger features, open a Discussion first to align on direction.
- 💬 GitHub Discussions: questions, ideas, show & tell
- 🐛 GitHub Issues: bug reports
MIT — free to use, self-host, fork, and modify.