{"slug": "greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache", "title": "GreyFox – Free self-hosted AI proxy, token quotas, and local cache", "summary": "GreyFox Community Edition, a free self-hosted AI traffic proxy, was released by Skillful Fox Studio. The Docker-based tool allows teams to control LLM token usage, enforce per-user quotas, cache responses, and monitor AI traffic locally without a cloud control plane. It supports up to five managed users and provides an OpenAI-compatible endpoint.", "body_md": "GreyFox Community Edition is a self-hosted AI traffic proxy and local operator console for teams that want to control LLM token usage, enforce per-user limits, reuse exact cached responses, and keep AI traffic visibility inside their own infrastructure.\n\nGreyFox runs as a local Docker container. No GreyFox-hosted control plane is required.\n\nFeatures:\n\n- OpenAI-compatible proxy endpoint at `/v1/chat/completions`\n- Local Admin UI served from the same container\n- Per-user token quota enforcement with the `X-App-User-Id` header\n- Mock mode for zero-cost onboarding and demos\n- Provider mode for OpenAI-compatible upstream APIs\n- Exact response cache for repeated non-streaming requests\n- Local SQLite storage for settings, users, logs, cache, and metrics\n- Traffic history, token analytics, manual cost calculator, and safe maintenance tools\n\nCommunity Edition limits:\n\n- Up to 5 active managed users\n- Token monitoring is the authoritative usage signal\n- Cost estimates are manual and informational only\n- No hosted GreyFox cloud control plane\n- No automatic update checks or automatic container updates\n- No request detail drawer, exports, deeper diagnostics, or live traffic metrics\n\nRequirements:\n\n- Docker Desktop or Docker Engine with Docker Compose\n- One available host port, default `8080`\n- A provider API key, only if you want to use live provider mode\n\nYou do not need Node.js, npm, Angular, Nx, or source code to run the Community Edition release.\n\nCreate a 
`compose.yaml` file:\n\n```\nservices:\n  greyfox:\n    image: ghcr.io/skillful-fox-studio/grey-fox-community:0.1.0\n    container_name: greyfox-community\n    environment:\n      OPENAI_BASE_URL: ${OPENAI_BASE_URL:-https://api.openai.com/v1}\n      GREYFOX_DB_PATH: ${GREYFOX_DB_PATH:-data/greyfox.db}\n      PORT: 3000\n      GREYFOX_STATIC_ROOT: /app/public/admin-ui\n    ports:\n      - \"${GREYFOX_HTTP_PORT:-8080}:3000\"\n    volumes:\n      - greyfox-data:/app/data\n    restart: unless-stopped\n\nvolumes:\n  greyfox-data:\n```\n\nStart GreyFox:\n\n```\ndocker compose up -d\n```\n\nOpen the Admin UI:\n\n```\nhttp://localhost:8080\n```\n\nHealth check:\n\n```\ncurl http://localhost:8080/api/health\n```\n\nExpected response:\n\n```\n{\"status\":\"ok\",\"service\":\"proxy-api\"}\n```\n\nGreyFox is a proxy layer. It does not install browser extensions, intercept your personal ChatGPT usage, or automatically capture traffic from unrelated applications. Your AI application must send its provider requests to GreyFox instead of sending them directly to the upstream provider.\n\nTypical direct setup:\n\n```\nYour application\n      |\n      | HTTPS request with provider API key\n      v\nOpenAI-compatible provider\n```\n\nGreyFox setup:\n\n```\nYour application\n      |\n      | OpenAI-compatible request\n      | Base URL: http://<greyfox-host>:<port>/v1\n      | Header: X-App-User-Id: <your-end-user-id>\n      v\nGreyFox Community Edition\n      |\n      | Local checks:\n      | - user token quota\n      | - exact response cache\n      | - prompt injection guard\n      | - traffic logging\n      v\nOpenAI-compatible provider\n```\n\nThe application still decides when to call AI. 
GreyFox only sees requests that are explicitly routed through its proxy endpoint.\n\nIn your application configuration:\n\n- Change the AI provider base URL to GreyFox:\n\n  ```\n  http://localhost:8080/v1\n  ```\n\n  If GreyFox runs on another server, use that host instead:\n\n  ```\n  http://greyfox.internal:8080/v1\n  ```\n\n- Keep using the OpenAI-compatible chat completions path:\n\n  ```\n  /chat/completions\n  ```\n\n  Full URL:\n\n  ```\n  http://localhost:8080/v1/chat/completions\n  ```\n\n- Add the end-user identifier header to every AI request:\n\n  ```\n  X-App-User-Id: user-123\n  ```\n\n  This should be your application's own user id, tenant user id, account id, or another stable identifier that lets GreyFox enforce limits per real end user.\n\n- Configure Provider Settings in the GreyFox Admin UI:\n\n  - use `Mock mode` for first validation;\n  - switch to `OpenAI-compatible provider` when you are ready to forward real traffic;\n  - enter your provider API key in the Admin UI.\n\n- Send a test request and verify it appears in Dashboard and Traffic.\n\nUse this for local evaluation:\n\n```\nApp or curl -> http://localhost:8080/v1/chat/completions -> GreyFox -> Provider\n```\n\nIf your application also runs in Docker Compose, put both services on the same Compose network and call GreyFox by service name:\n\n```\nhttp://greyfox:3000/v1/chat/completions\n```\n\nInside Docker, use the container port `3000`. 
From the host machine, use the published port, usually `8080`.\n\nFor a team environment, run GreyFox on an internal host and point your application to it:\n\n```\nhttp://greyfox.internal:8080/v1/chat/completions\n```\n\nKeep the Admin UI and proxy endpoint reachable only inside your trusted network unless you intentionally place your own authentication, VPN, or gateway in front of it.\n\nMost OpenAI-compatible SDKs let you override the base URL.\n\nConceptually, change this:\n\n```\nbaseURL = \"https://api.openai.com/v1\"\n```\n\nto this:\n\n```\nbaseURL = \"http://localhost:8080/v1\"\n```\n\nThen include:\n\n```\nX-App-User-Id: user-123\n```\n\nThe exact SDK option name depends on your application stack. Look for settings such as `baseURL`, `baseUrl`, `apiBase`, `base_url`, or `OPENAI_BASE_URL`.\n\nGreyFox uses one stable internal container port:\n\n```\n3000\n```\n\nThe host port is controlled by Docker port mapping. To run GreyFox on another host port:\n\n```\nGREYFOX_HTTP_PORT=9090 docker compose up -d\n```\n\nThen open:\n\n```\nhttp://localhost:9090\n```\n\nOpen the Admin UI and go to Provider Settings.\n\nUse:\n\n- `Mock mode` for zero-cost local demos and onboarding\n- `OpenAI-compatible provider` for live traffic forwarding\n\nGreyFox expects an OpenAI-compatible upstream API in live provider mode. Other compatible providers such as OpenRouter, Groq, Together, DeepSeek, Mistral, Ollama, or LocalAI may connect successfully, but provider billing remains the source of truth for final accounting.\n\nGreyFox stores provider settings locally in the container database volume. 
Saved provider keys are not shown again in full inside the UI.\n\nAfter enabling Mock mode in the Admin UI, send a test request:\n\n```\ncurl http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-App-User-Id: demo-user-1\" \\\n  -d \"{\\\"model\\\":\\\"gpt-4o-mini\\\",\\\"messages\\\":[{\\\"role\\\":\\\"user\\\",\\\"content\\\":\\\"Reply with GreyFox OK\\\"}]}\"\n```\n\nRefresh the Admin UI to see the request in Traffic and Dashboard.\n\nGreyFox does not auto-update.\n\nTo check releases manually, use `About -> Check for updates` in the Admin UI or visit the public release page.\n\nTo update the Docker image:\n\n```\ndocker compose pull\ndocker compose up -d\n```\n\nYour local SQLite data is stored in the `greyfox-data` Docker volume and is not removed by a normal image update.\n\nGreyFox Community Edition is designed to run inside your own infrastructure.\n\n- Prompts, completions, logs, settings, provider keys, and metrics stay in your local deployment unless you send them elsewhere.\n- Manual update checks make one browser request to GitHub Releases.\n- GreyFox does not require a hosted GreyFox control plane.\n- Connected upstream providers still process any traffic you send to them.\n\nPublic issues and Community releases:\n\n```\nhttps://github.com/skillful-fox-studio/grey-fox-community\n```\n\nDirect operator inquiries:\n\n```\nsupport@skilful-fox.com\n```\n\nGreyFox is currently maintained by a solo indie developer. Email replies may take up to 3 days.\n\nGreyFox Community Edition is proprietary commercial software made available as a free-to-use Community Edition. 
It is not open-source software.\n\nSee `LICENSE.md` and `THIRD_PARTY_NOTICES.md`.", "url": "https://wpnews.pro/news/greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache", "canonical_source": "https://github.com/skillful-fox-studio/grey-fox-community", "published_at": "2026-06-21 19:09:52+00:00", "updated_at": "2026-06-21 19:34:54.586712+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools", "developer-tools"], "entities": ["GreyFox", "Skillful Fox Studio", "OpenAI", "Docker"], "alternates": {"html": "https://wpnews.pro/news/greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache", "markdown": "https://wpnews.pro/news/greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache.md", "text": "https://wpnews.pro/news/greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache.txt", "jsonld": "https://wpnews.pro/news/greyfox-free-self-hosted-ai-proxy-token-quotas-and-local-cache.jsonld"}}