Phlox is a self-hostable chat application with an agentic harness, document RAG, code execution, and MCP integration — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint (OpenAI, Ollama, vLLM, LiteLLM, LM Studio, local models).
- 💬
Streaming chat with conversation history, rename/delete, search & export, message edit/regenerate, markdown with highlighted/copyable code,Mermaid diagrams andLaTeX math. - 🤖
Agentic harness(inspired by PI Coder): the model uses tools in a loop — filesystem (
read_file
/write_file
/edit_file
/glob
/grep
),run_shell
,execute_python
/execute_node
,search_documents
,web_fetch
, plusplanning(update_todos
),sub-agents(spawn_subagent
),memory(save_memory
), andcheckpoints— each scoped to a per-conversation sandboxed workspace. - 🤝
Human-in-the-loop approvals— on sensitive tools, approve/deny, resume. - 🧰
Code execution with captured output andartifacts shown inline + aWorkspace Files panel to browse/download everything the agent created. - 🗂️
Workspace checkpoints— git-backed snapshots with one-click restore. - 📚
Documents / RAG— upload PDF/DOCX/TXT/MD/code;** hybrid (dense+sparse) searchover Qdrantwith reranking + citations; global or per-conversation scoping. Works offline via a fallback embedder. - 🧠
Cross-conversation memory— durable facts recalled across chats. - 🖼️
Multimodal— attach images to messages for vision models. - 🔌
MCP integration— connect Model Context Protocol servers; their tools join automatically. - 🔀
Any provider— named profiles for Bedrock / OpenAI-compatible endpoints, switchable live, with a connection tester. - 🏠
Runs fully local— point at Ollama**,** LM Studio**, or** vLLM**(any OpenAI-compatible server) for offline, self-hosted inference with no cloud API key; RAG embeddings can run locally too. - 🔐
Auth & multi-user— local accounts (or** Entra ID SSO**),user
/admin
roles, per-user data isolation, anadmin panel(users, MCP, tools, auth). Seedocs/AUTH.md. - 💵
Usage & cost accounting— per-message token/cost in the UI, plus an admin** chargebackview: usage by month × user × department × model**, CSV export for finance, and a durable ledger that keeps a departed user's costs billable after their account is deleted. Seedocs/OBSERVABILITY.md. - ⚙️
Live admin configuration— edit provider profiles (keys write-only), model pricing, resilience, generation defaults, and sandbox limits from an admin-onlyConfiguration panel, applied without a server restart.config.yml
remains the seed. - 📦
Container sandbox— run code in an isolated** Podman/Docker**container with resource limits + network isolation. Seedocs/SANDBOX.md. - 🎨
Theming— Phlox Dark (default) + Phlox Light/Light/Dark/Fred Hutch/Hutch Night/Sandstone, instant switching. Seedocs/THEMING.md. - 🛡️
Per-tool permissions—auto | ask | deny
, with an "Agent mode" toggle.
| Doc | What it covers |
|---|---|
start heredocs/ROADMAP.mddocs/AUTH.mdEntra ID SSO setupdocs/SANDBOX.mdPodman/Docker container code-execution sandboxdocs/OBSERVABILITY.mddocs/MCP.mddocs/THEMING.mddocs/ADDING_A_TOOL.md·docs/ADDING_A_PROVIDER.mdAGENTS.mdTwo processes: a FastAPI backend (LLM orchestration, agent harness, MCP, RAG, code exec, auth, SQLite persistence) and a React/Vite frontend. Full details in ** docs/ARCHITECTURE.md**.
backend/ FastAPI app (app/), config.yml, SQLite + Qdrant under data/
frontend/ React + Vite + Tailwind SPA
docs/ ARCHITECTURE, ROADMAP, AUTH, SANDBOX, MCP, THEMING, ADDING_A_*
scripts/ dev.ps1 / dev.sh
Prerequisites: Python 3.11+ with uv,
Node 18+, and a model provider (a local
Ollamais the easiest).
cd backend
uv sync
cp config.yml.example config.yml # edit: set your provider profile(s)
uv run uvicorn app.main:app --reload --port 8000
cd frontend
npm install
npm run dev # open http://localhost:5173
On Windows you can run both with ./scripts/dev.ps1
; on macOS/Linux ./scripts/dev.sh
.
Edit backend/config.yml
(full examples in config.yml.example
). Any
OpenAI-compatible server works with type: openai
— just point endpoint
at it. That covers the popular local runtimes, so Phlox can run entirely offline with no cloud API key:
default_profile: local-ollama
profiles:
local-ollama:
type: openai
label: "Ollama (local)"
endpoint: http://localhost:11434/v1
api_key: ollama # required by the client, ignored by Ollama
model: qwen3.6:35b
models: [qwen3.6:35b, glm-4.7-flash:latest]
supports_tools: true # set false for models without tool-calling
lmstudio:
type: openai
label: "LM-Studio (local)"
endpoint: http://localhost:1234/v1
api_key: none # required by the client, ignored by LM-Studio
model: qwen/qwen3.6-27b
models: [qwen/qwen3.6-27b]
supports_tools: true # set false for models without tool-calling
The same type: openai
shape also covers OpenAI, LiteLLM, and any other
OpenAI-compatible gateway — set the endpoint
and api_key
. For AWS Bedrock, use
type: bedrock
with a model
id and aws_region
(credentials resolve via the standard
AWS chain; for temporary STS creds also set aws_session_token
).
Define as many profiles as you like and switch between them live in Settings → Model
(there's a built-in connection tester). Embeddings for document RAG can also run locally —
e.g. Ollama's nomic-embed-text
— so the whole stack stays offline.
Edit config without a restart.config.yml
is the seed; an admin can edit provider profiles, model pricing, resilience, generation defaults, and sandbox limitsliveinSettings → (Admin) Configuration(overrides are stored in the DB and applied immediately). API keys there are write-only/masked. Bootstrap-sensitive settings (auth
,vector_store
, the sandbox runnertype, OTel) stay file-only and need a restart. See[docs/AUTH.md]§admin config.
Auth is on by default with a seeded admin: ** admin / admin**. Manage users, reset passwords, and view/configure SSO under
Settings → (Admin) Users / Authentication.
Change the default admin password and set a real before sharing access — see
auth.jwt_secret
docs/AUTH.md. To run single-user with no login, set
auth.enabled: false
.By default code runs in a local subprocess (fast, trusts the host). For isolation, set
sandbox.runner: container
to run each execution in an ephemeral Podman/Docker container with CPU/memory/PID limits and network isolation — see docs/SANDBOX.md.
cd frontend && npm run build # outputs frontend/dist
cd ../backend && uv run uvicorn app.main:app --port 8000
FastAPI serves the built SPA from frontend/dist
at /
.
The backend has a pytest suite (unit + FastAPI TestClient
API tests + scripted-provider
agent-loop/fallback tests); the frontend is verified by a production build. The same checks
run in GitHub Actions CI (.github/workflows/ci.yml
) on every push/PR.
cd backend
uv sync --extra dev # installs ruff + pytest
uv run ruff check app tests
uv run pytest # or: uv run pytest -k usage to run a subset
cd ../frontend && npm run build
The tests run against an in-memory/temp SQLite DB with auth.enabled
off (a synthetic dev
admin), so no provider credentials or network are needed — agent-loop tests use a built-in
scripted "test" provider. Coverage includes the chargeback ledger surviving user
deletion (tests/test_api.py::test_usage_ledger_survives_user_deletion
).
backend/evals/run_evals.py
exercises the agent against a real configured provider
(tool use, RAG, multi-step). It needs a working config.yml
profile and is not part of CI:
cd backend && uv run python -m evals.run_evals
Auth: change the seededadmin
/admin
and set a strongauth.jwt_secret
(envPHLOX_JWT_SECRET
) before any shared use. Data is isolated per user; admin features are role-gated.Sandbox: the local runner trusts the host (fine for single-user/local). For untrusted/multi-user execution usesandbox.runner: container
(docs/SANDBOX.md).- Mutating/execution tools default to the
permission policy; "Agent mode" auto-approves for a turn.ask
Sensitive data (PHI): Postgres, audit logging, secrets management, and data governance are tracked asTier 5 in theroadmapand are required before any deployment touching sensitive data.
Licensed under the Apache License, Version 2.0 — see LICENSE. Copyright © 2026 Robert McDermott <robert.c.mcdermott@gmail.com>.