GreyFox – Free self-hosted AI proxy, token quotas, and local cache

wpnews.pro

GreyFox Community Edition is a self-hosted AI traffic proxy and local operator console for teams that want to control LLM token usage, enforce per-user limits, reuse exact cached responses, and keep AI traffic visibility inside their own infrastructure.

GreyFox runs as a local Docker box. No GreyFox-hosted control plane is required.

OpenAI-compatible proxy endpoint at /v1/chat/completions
Local Admin UI served from the same container
Per-user token quota enforcement with X-App-User-Id
Mock mode for zero-cost onboarding and demos
Provider mode for OpenAI-compatible upstream APIs
Exact response cache for repeated non-streaming requests
Local SQLite storage for settings, users, logs, cache, and metrics
Traffic history, token analytics, manual cost calculator, and safe maintenance tools
Up to 5 active managed users
Token monitoring is the authoritative usage signal
Cost estimates are manual and informational only
No hosted GreyFox cloud control plane
No automatic update checks or automatic container updates
No request detail drawer, exports, deeper diagnostics, or live traffic metrics
Docker Desktop or Docker Engine with Docker Compose
One available host port, default 8080
A Provider API key only if you want to use live provider mode

You do not need Node.js, npm, Angular, Nx, or source code to run the Community Edition release.

Create a compose.yaml

file:

services:
  greyfox:
    image: ghcr.io/skillful-fox-studio/grey-fox-community:0.1.0
    container_name: greyfox-community
    environment:
      OPENAI_BASE_URL: ${OPENAI_BASE_URL:-https://api.openai.com/v1}
      GREYFOX_DB_PATH: ${GREYFOX_DB_PATH:-data/greyfox.db}
      PORT: 3000
      GREYFOX_STATIC_ROOT: /app/public/admin-ui
    ports:
      - "${GREYFOX_HTTP_PORT:-8080}:3000"
    volumes:
      - greyfox-data:/app/data
    restart: unless-stopped

volumes:
  greyfox-data:

Start GreyFox:

docker compose up -d

Open the Admin UI:

http://localhost:8080

Health check:

curl http://localhost:8080/api/health

Expected response:

{"status":"ok","service":"proxy-api"}

GreyFox is a proxy layer. It does not install browser extensions, intercept your personal ChatGPT usage, or automatically capture traffic from unrelated applications. Your AI application must send its provider requests to GreyFox instead of sending them directly to the upstream provider.

Typical direct setup:

Your application
      |
      | HTTPS request with provider API key
      v
OpenAI-compatible provider

GreyFox setup:

Your application
      |
      | OpenAI-compatible request
      | Base URL: http://<greyfox-host>:<port>/v1
      | Header: X-App-User-Id: <your-end-user-id>
      v
GreyFox Community Edition
      |
      | Local checks:
      | - user token quota
      | - exact response cache
      | - prompt injection guard
      | - traffic logging
      v
OpenAI-compatible provider

The application still decides when to call AI. GreyFox only sees requests that are explicitly routed through its proxy endpoint.

In your application configuration:

Change the AI provider base URL to GreyFox:

http://localhost:8080/v1

If GreyFox runs on another server, use that host instead:

http://greyfox.internal:8080/v1

Keep using the OpenAI-compatible chat completions path:

/chat/completions

Full URL:

http://localhost:8080/v1/chat/completions

Add the end-user identifier header to every AI request:

X-App-User-Id: user-123

This should be your application's own user id, tenant user id, account id, or another stable identifier that lets GreyFox enforce limits per real end user.

Configure Provider Settings in the GreyFox Admin UI:

use Mock mode

for first validation; - switch to OpenAI-compatible provider

when you are ready to forward real traffic; - enter your provider API key in the Admin UI.

use

Send a test request and verify it appears in Dashboard and Traffic.

Use this for local evaluation:

App or curl -> http://localhost:8080/v1/chat/completions -> GreyFox -> Provider

If your application also runs in Docker Compose, put both services on the same Compose network and call GreyFox by service name:

http://greyfox:3000/v1/chat/completions

Inside Docker, use the container port 3000

. From the host machine, use the published port, usually 8080

.

For a team environment, run GreyFox on an internal host and point your application to it:

http://greyfox.internal:8080/v1/chat/completions

Keep the Admin UI and proxy endpoint reachable only inside your trusted network unless you intentionally place your own authentication, VPN, or gateway in front of it.

Most OpenAI-compatible SDKs let you override the base URL.

Conceptually, change this:

baseURL = "https://api.openai.com/v1"

to this:

baseURL = "http://localhost:8080/v1"

Then include:

X-App-User-Id: user-123

The exact SDK option name depends on your application stack. Look for settings such as baseURL

, baseUrl

, apiBase

, base_url

, or OPENAI_BASE_URL

.

GreyFox uses one stable internal container port:

The host port is controlled by Docker port mapping. To run GreyFox on another host port:

GREYFOX_HTTP_PORT=9090 docker compose up -d

Then open:

http://localhost:9090

Open the Admin UI and go to Provider Settings.

Use:

Mock mode

for zero-cost local demos and onboardingOpenAI-compatible provider

for live traffic forwarding

GreyFox expects an OpenAI-compatible upstream API in live provider mode. Other compatible providers such as OpenRouter, Groq, Together, DeepSeek, Mistral, Ollama, or LocalAI may connect successfully, but provider billing remains the source of truth for final accounting.

GreyFox stores provider settings locally in the container database volume. Saved provider keys are not shown again in full inside the UI.

After enabling Mock mode in the Admin UI, send a test request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-App-User-Id: demo-user-1" \
  -d "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Reply with GreyFox OK\"}]}"

Refresh the Admin UI to see the request in Traffic and Dashboard.

GreyFox does not auto-update.

To check releases manually, use About -> Check for updates

in the Admin UI or visit the public release page.

To update the Docker image:

docker compose pull
docker compose up -d

Your local SQLite data is stored in the greyfox-data

Docker volume and is not removed by a normal image update.

GreyFox Community Edition is designed to run inside your own infrastructure.

Prompts, completions, logs, settings, provider keys, and metrics stay in your local deployment unless you send them elsewhere.
Manual update checks make one browser request to GitHub Releases.
GreyFox does not require a hosted GreyFox control plane.
Connected upstream providers still process any traffic you send to them.

Public issues and Community releases:

https://github.com/skillful-fox-studio/grey-fox-community

Direct operator inquiries:

support@skilful-fox.com

GreyFox is currently maintained by a solo indie developer. Email replies may take up to 3 days.

GreyFox Community Edition is proprietary commercial software made available as a free-to-use Community Edition. It is not open-source software.

See LICENSE.md

and THIRD_PARTY_NOTICES.md

.

source & further reading

github.com — original article

GreyFox – Free self-hosted AI proxy, token quotas, and local cache

Run your AI side-project on zahid.host