{"slug": "show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source", "title": "Show HN: TetherDust – Self-hosted AI Analytics Engineer (open source)", "summary": "TetherDust, an open-source self-hosted AI analytics engineer, was released on Show HN, enabling AI agents to generate verifiable SQL and build interactive d3.js dashboards by bridging codebases and databases through containerized Model Context Protocol servers. The platform runs entirely within user infrastructure with read-only query enforcement, role-based access control, and immutable audit logging, supporting multiple databases and AI agent integrations including ChatGPT, Claude, and Ollama.", "body_md": "TetherDust bridges the gap between your codebase and databases using containerized Model Context Protocol (MCP) servers. By documenting database schemas alongside repository documentation, TetherDust enables any AI agent to generate verifiable SQL, build dynamic d3.js dashboards, and map schema-to-code dependencies. The platform runs entirely within your infrastructure, enforcing strict read-only query boundaries, role-based access control (RBAC), and immutable audit logging.\n\nTetherDust is designed to be a flexible platform for AI-driven data interaction, with features that include:\n\n- Generate well-structured, wiki-like codebase and database documentation from natural language prompts, with rich Markdown support.\n\n- Point TetherDust at a GitHub codebase (or codebase documentation) together with database documentation. The agent explores both and produces an interactive visual graph showing which code files read or write which tables and columns, versioned as the schema drifts.\n\n- Describe the dashboard you want; the AI agent writes the SQL and the d3.js code for every chart. 
Charts auto-refresh on a schedule and are cached for performance.\n\n- Edit the generated chart directly for custom behavior, or ask the agent to update it when requirements change.\n\n- Define queries and run them on a schedule, delivering results by email or download.\n\nUse **Chat** to access all of TetherDust's capabilities in one place.\n\n- Ask natural language questions about your data and get streamed answers grounded in your documentation. You can mention documentation sources by name to pull in specific context, or let the agent decide what to use.\n\n- TetherDust can write and execute SQL queries, either at your request or to confirm details before answering.\n\n- Reach reports, dashboards, and tethers by name from the chat.\n\n- Use predefined prompts.\n\nUse CLI tools, API calls, or Ollama to connect any agent that speaks MCP. Currently supported agent integrations:\n\n| Provider | Method |\n|---|---|\n| Codex CLI (OpenAI) | ChatGPT subscription auth token |\n| Codex CLI (OpenAI) | OpenAI API key |\n| Claude Code (Anthropic) | Claude Pro/Max OAuth token |\n| Claude Code (Anthropic) | Anthropic API key |\n| Direct API | Any agent accessible via HTTP API, configured with custom MCP servers |\n| Ollama | Local Ollama models with MCP support |\n\nConnect any database with a Python SQLAlchemy dialect and a read-only user. Currently supported databases: PostgreSQL, MySQL/MariaDB, SQL Server, SQLite, ClickHouse, Oracle, Snowflake, BigQuery.\n\nMany more agents and databases to come. The architecture is designed to be agent-agnostic, with a simple interface for adding new ones.\n\nAgent runtimes are containerized, so the only way for an agent to interact with TetherDust's features is through MCP servers, which expose tools and data sources as APIs. 
TetherDust includes a built-in MCP server that exposes the core features.\n\nEvery user's role decides which databases, MCP tools, documentation sources, dashboards, reports, and tethers they can see.\n\n``` mermaid\nflowchart LR\n    User([👤 User]) --> Role{{Role}}\n\n    Role -- allowed_databases --> DB[(Databases)]\n    Role -- allowed_tools --> Tools[MCP Tools]\n    Role -- allowed_doc_sources --> Docs[Doc Sources]\n    Role -- allowed_prompts --> Prompts[Prompts]\n    Role -- allowed_mcp_servers --> MCP[Custom MCP Servers]\n    Role -- can_view_dashboards --> Dash[Dashboards]\n    Role -- can_view_reports --> Reports[Reports]\n    Role -- can_view_tethers --> Tethers[Tethers]\n\n    DB --> Agent([🤖 Agent])\n    Tools --> Agent\n    Docs --> Agent\n    Prompts --> Agent\n    MCP --> Agent\n\n    Agent -. only sees the<br/>allowed subset .-> Scope[/Permitted scope/]\n```\n\nAgents only see the databases and tools their user role allows.\n\nExtend the agent with remote HTTP or local subprocess MCP servers (Notion, GitHub, internal APIs, anything that speaks MCP), granted per role.\n\nEvery agent query is parsed with SQLGlot and rejected unless it is read-only. Connections are read-only by default, and the real trust boundary is a read-only database user; always connect with one.\n\nActions and queries are logged in an immutable audit log. Every chat session, agent query, and generation run is recorded and reviewable by staff in the admin console.\n\nThe full stack ships as a Docker Compose project: a Django web app (portal + admin console), an MCP server that exposes database and generation tools, pluggable AI agent gateways (Codex CLI, Claude Code CLI, direct API/Ollama), PostgreSQL, Redis, and Celery workers for background tasks. 
Switching the active agent is a single toggle in the console: no restarts, no config changes.\n\n- [Docker](https://docs.docker.com/get-docker/) and Docker Compose v2\n- An AI agent credential (one of):\n  - **Codex**: a ChatGPT subscription auth token, or an OpenAI API key\n  - **Claude Code**: a Claude Pro/Max OAuth token (`claude setup-token`), or an Anthropic API key\n\nYou configure the agent credential later, from the admin console; it is **not** needed to boot the stack.\n\nAll credentials live in a gitignored `.env` file that Docker Compose loads automatically.\nStart from the template:\n\n```\ncp .env.example .env\n```\n\n`.env.example` ships with working local defaults for the database and admin login, but the\ntwo cryptographic keys are intentionally blank, and the stack will not start until you fill\nthem in. Edit `.env` and set the following.\n\n**a. Generate a credential-encryption key** (Fernet). This key encrypts all stored\ndatabase passwords and agent API keys/tokens, so generate your own and keep it secret:\n\n``` shell\npython -c \"from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())\"\n```\n\nSet `TETHERDUST_ENCRYPTION_KEY` to the generated value. It is shared from `.env` to every\nservice that needs it (`mcp`, `local-mcp`, `web`, `celery-worker`, `celery-beat`), so you\nonly set it once.\n\n**b. Generate a Django secret key:**\n\n``` shell\npython -c \"import secrets; print(secrets.token_urlsafe(64))\"\n```\n\nSet `DJANGO_SECRET_KEY` to the generated value.\n\n**c. Set the admin login.** The superuser is created on first boot from these values;\nchange them so the default `admin`/`admin` is never used:\n\n```\nDJANGO_SUPERUSER_USERNAME=admin\nDJANGO_SUPERUSER_PASSWORD=<a-strong-password>\nDJANGO_SUPERUSER_EMAIL=you@example.com\n```\n\n**d. Change the database password.** Set `DB_NAME`, `DB_USER`, and `DB_PASSWORD` to your\nchosen values. 
A single set of variables feeds the `db` service, both MCP connection\nstrings, and the web/celery services, so there is nothing to keep in sync by hand.\n\n**e. Generate the internal service secrets.** Two shared secrets authenticate\nTetherDust's internal service-to-service calls: `MCP_FILTER_SECRET` (web/celery →\nMCP filter registration) and `AGENT_GATEWAY_SECRET` (Django → the Codex/Claude\ngateways). Generate a value for each:\n\n``` shell\npython -c \"import secrets; print(secrets.token_urlsafe(32))\"\n```\n\nIf left blank the stack still starts, but those internal calls are unauthenticated, so set both before exposing TetherDust to a network.\n\n**f. (Local development only) Enable debug mode.** `.env.example` ships with\n`DJANGO_DEBUG=false`, which enables production hardening (secure cookies, HTTPS\nredirect, HSTS) and assumes TLS in front of the app, so logging in over plain\n`http://localhost` won't work. For local development set `DJANGO_DEBUG=true` (dev\nserver with auto-reload, hardening relaxed). Leave it `false` for any real deployment.\n\nNote: `.env` is listed in `.gitignore`, so your real secrets stay out of version control. Never commit it. See [Production notes].\n\n```\ndocker compose up --build\n```\n\nThis starts PostgreSQL, the MCP server, the agent gateways, Redis, Celery, and the Django web app. First boot runs database migrations, creates your superuser, and auto-discovers documentation sources.\n\nVisit **http://localhost:8000** and log in with the superuser credentials you set in step 1c.\n\nFrom the admin console:\n\n- **Agents**: add an agent configuration (Codex or Claude Code), paste your auth token/API key, and mark it active. Only one agent is active at a time.\n- **Databases**: add a connection to the database you want to query. 
Use a **read-only** database user (see below).\n- Open the chat and ask a question in natural language.\n\nTetherDust runs every agent query through three layers of read-only protection:\n\n- **SQL validation**: each query is parsed (via SQLGlot, per database dialect) and rejected unless it is a single `SELECT`/CTE/set-operation. Multi-statement input, data-modifying CTEs, `SELECT … INTO`, stored-procedure calls, and DDL/DML are all blocked.\n- **Read-only session**: connections marked **Read-only** (default ON) run in a read-only database session where the engine supports it (PostgreSQL, MySQL/MariaDB, SQLite, Oracle, ClickHouse). SQL Server, BigQuery, and Snowflake have no session-level read-only; there, rely on a read-only user/role (below).\n- **Read-only database user**: the real trust boundary. **Always connect with an account that only has read access.** The two layers above are defense-in-depth; a read-only credential is what actually guarantees the agent can't write.\n\n```\n-- PostgreSQL\nCREATE ROLE tetherdust_ro LOGIN PASSWORD '...';\nGRANT CONNECT ON DATABASE mydb TO tetherdust_ro;\nGRANT USAGE ON SCHEMA public TO tetherdust_ro;\nGRANT SELECT ON ALL TABLES IN SCHEMA public TO tetherdust_ro;\nALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO tetherdust_ro;\n\n-- MySQL / MariaDB\nCREATE USER 'tetherdust_ro'@'%' IDENTIFIED BY '...';\nGRANT SELECT ON mydb.* TO 'tetherdust_ro'@'%';\n```\n\nFor **BigQuery** grant `roles/bigquery.dataViewer` + `roles/bigquery.jobUser` (not\n`dataEditor`); for **Snowflake** grant a role with `USAGE`/`SELECT` only; for **SQL\nServer** add the login to the `db_datareader` role.\n\n- **Stored credentials are encrypted** with the Fernet key from step 1a. If `TETHERDUST_ENCRYPTION_KEY` is left blank, credentials are stored in plaintext, so always set it. 
In production (`DJANGO_DEBUG=false`) TetherDust refuses to save a credential without a key.\n- **Set every secret in `.env`** before any non-local deployment, as described in step 1.\n\nThe default Compose configuration is tuned for local development. Before exposing TetherDust to a network, work through this checklist:\n\n- **Rotate every secret**: encryption key, Django secret key, `MCP_FILTER_SECRET`, `AGENT_GATEWAY_SECRET`, admin password, database password (see [step 1](#1-set-your-secrets-before-the-first-launch)).\n- **Set `MCP_FILTER_SECRET` and `AGENT_GATEWAY_SECRET`**: without them the internal MCP filter registration and agent gateways accept unauthenticated calls.\n- `DJANGO_DEBUG=false`: disables debug pages and switches to the Daphne ASGI server. This **also auto-enables the transport/cookie hardening below.**\n- `DJANGO_ALLOWED_HOSTS`: set to your real host(s), comma-separated.\n- `DJANGO_CSRF_TRUSTED_ORIGINS`: set to your HTTPS origin(s), e.g. `https://tetherdust.example.com` (required for form posts behind a proxy).\n- **Terminate TLS** in front of the app (reverse proxy / load balancer) and forward `X-Forwarded-Proto`.\n- **Publish only the `web` service (port 8000).** The internal services, namely `mcp` (8001), `local-mcp` (8003), the agent gateways (`codex`/`codex-api`/`claude`/`claude-api`, 8002), `db`, and `redis`, have no user-facing auth and must stay on the private Compose network. The default `docker-compose.yml` only maps `8000`; if you add port mappings or run host networking, do **not** expose the others. Treat `MCP_FILTER_SECRET`/`AGENT_GATEWAY_SECRET` as defense-in-depth, not a substitute for network isolation. 
\n- **Keep secrets out of version control**: secrets already live in the gitignored `.env`; for production prefer a secrets manager and never commit `.env`.\n\nWhen `DJANGO_DEBUG=false`, `settings.py` automatically turns on\n`SECURE_SSL_REDIRECT`, `SESSION_COOKIE_SECURE`, `CSRF_COOKIE_SECURE`,\n`SECURE_CONTENT_TYPE_NOSNIFF`, HSTS (1 year), and `SECURE_PROXY_SSL_HEADER`. These\nassume TLS is terminated in front of the app. Optional overrides:\n\n| Variable | Default (when DEBUG off) | Purpose |\n|---|---|---|\n| `DJANGO_SECURE_SSL_REDIRECT` | `True` | Set `False` if your proxy already redirects HTTP→HTTPS. |\n| `DJANGO_SECURE_HSTS_SECONDS` | `31536000` | Set `0` to disable HSTS while validating a TLS rollout. |\n| `DJANGO_CSRF_TRUSTED_ORIGINS` | (empty) | Comma-separated HTTPS origins. |\n\nVerify your configuration with `python manage.py check --deploy`.\n\nTetherDust tracks a single **product version** in the repo-root `VERSION` file\n(independent of the `mcp_server` package version in `tetherdust/pyproject.toml`).\nStaff see it under **Console → Version**, along with per-release notes read from\nthe `changelog/` directory (one `changelog/<version>.md` file per release) and an\n**update-available** indicator.\n\nThe update check (a Celery task, every 6 hours) calls the GitHub API for the\n**latest published Release** of the upstream repo (`GITHUB_REPOSITORY` in\n`core/version.py`) and compares its tag against the running `VERSION` using\nsemantic versioning. A newer tag lights up the indicator. There is nothing to\nconfigure, since every install checks the same official repo.\n\nOnly published GitHub Releases are detected. A bare `git tag` with no Release attached is invisible to the check.\n\n- Bump `VERSION` and add `changelog/<version>.md` with the upgrade notes for admins (migrations, new env vars, manual steps) plus the changes. Commit. 
\n- Tag and push: `git tag v<version> && git push --tags`.\n- **Publish a GitHub Release** for that tag; this is the step that flips the update indicator for every running install.\n\nTetherDust is licensed under the **GNU Affero General Public License v3.0\n(AGPLv3)**; see [LICENSE](/mpospirit-apps/TetherDust/blob/main/LICENSE). You are free to use, modify, and self-host\nit; note that AGPL's network-copyleft requires you to make your modified source\navailable to users you provide the software to over a network.\n\nA separate commercial license is available for the managed/cloud offering and for use that doesn't fit AGPLv3; contact the maintainers.\n\nContributions are welcome. By submitting a contribution you agree to the\n[Contributor License Agreement](/mpospirit-apps/TetherDust/blob/main/CLA.md), signaled by signing off your commits:\n\n```\ngit commit -s\n```\n\nThis certifies the Developer Certificate of Origin and lets the project include your contribution in both the AGPLv3 codebase and the commercial offering.", "url": "https://wpnews.pro/news/show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source", "canonical_source": "https://github.com/mpospirit-apps/TetherDust", "published_at": "2026-06-12 14:43:27+00:00", "updated_at": "2026-06-12 14:51:17.972362+00:00", "lang": "en", "topics": ["ai-tools", "ai-agents", "ai-infrastructure", "natural-language-processing", "generative-ai"], "entities": ["TetherDust", "GitHub", "MCP", "d3.js", "SQL", "RBAC"], "alternates": {"html": "https://wpnews.pro/news/show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source", "markdown": "https://wpnews.pro/news/show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source.md", "text": "https://wpnews.pro/news/show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source.txt", "jsonld": "https://wpnews.pro/news/show-hn-tetherdust-self-hosted-ai-analytics-engineer-open-source.jsonld"}}