{"slug": "show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2", "title": "Show HN: SQL MCP Server – 61.37% on DataAgentBench with GLM-5.2", "summary": "DataBridge, an open-source MCP server, achieves 61.37% on UC Berkeley's DataAgentBench using GLM-5.2, enabling AI agents to query heterogeneous databases (PostgreSQL, MongoDB, SQLite, DuckDB) with deterministic safety and schema-aware cross-database joins.", "body_md": "One MCP server. Any database. Benchmark-proven.\n\nDataBridge is an open-source MCP server that gives AI agents (Claude, GPT, Gemini, and any MCP-compatible agent) reliable, safe, and intelligent access to heterogeneous databases. It sits between your agent and your data, handling connections, enforcing safety, learning schema, normalizing cross-database joins, and running post-query transforms, so the agent gets answers instead of raw data engineering problems.\n\nBenchmarked on [DataAgentBench (DAB)](https://ucbepic.github.io/DataAgentBench/), the UC Berkeley + Hasura benchmark for real-world data agents across 12 datasets and 4 database systems.\n\nEnterprise data lives across multiple systems at once: PostgreSQL for transactions, MongoDB for documents, DuckDB for analytics, SQLite for local state. Answering a single business question often requires querying all of them together.\n\nCurrent AI agents fail at this in four specific ways:\n\n**1. Silent wrong answers.** An agent joins PostgreSQL's integer `subscriber_id: 12345` with MongoDB's string `\"CUST-0012345\"`, gets zero rows, and confidently reports \"no results found.\" No error. No warning. A wrong answer delivered with certainty.\n\n**2. No safety layer.** Agents given database access can (and do) execute destructive operations. A misunderstood task becomes a `DELETE FROM orders` with no WHERE clause. Prompt-based safety instructions are insufficient; a deterministic enforcement layer is required.\n\n**3. 
Cold start every session.** Every new agent session rediscovers schema from scratch: re-reading table definitions, re-learning join patterns, re-discovering that `customer_id` in PostgreSQL maps to `_id` in MongoDB. This wastes tokens and time, and produces inconsistent results.\n\n**4. Raw row fetching.** Agents pull full tables into context when they should push aggregation down to the database. A `SELECT *` on a 500,000-row table is a context window disaster.\n\nDataAgentBench tests agents on 54 realistic queries across 12 real-world datasets spanning PostgreSQL, MongoDB, SQLite, and DuckDB:\n\n| System | DAB Pass@1 |\n|---|---|\n| DataBridge + GLM-5.2 | 61.37% |\n| MinusX + Claude Sonnet 4.6 + GPT-5.5-mini + Claude Haiku 4.5 | 65.2% |\n| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 63.1% |\n| Spacedock (Recce) + Claude Opus 4.8 | 67.2% |\n| Altimate Code + Claude Sonnet 4.6 | 68.2% |\n| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 71.7% |\n\nRunning a significantly lower-cost model, DataBridge comes within about 1.7 points of the nearest frontier-model system (63.1%).\n\nDataBridge exposes a single MCP interface that any agent calls with a natural language question or structured intent.\n\n```\nAgent: \"Which customers bought product X in Q1 but not Q2, and what was\n        their average order value?\"\n\nDataBridge:\n  → Identifies: orders in PostgreSQL, customer profiles in MongoDB\n  → Plans: two sub-queries + cross-DB join\n  → Normalizes: integer customer_id (PG) ↔ string \"CUST-XXXXX\" (Mongo)\n  → Safety check: read-only enforcement at parser level\n  → Executes: sub-queries, merges results\n  → Returns: clean structured JSON\n\nAgent receives: the answer, not the data engineering problem.\n```\n\nConnect any combination of databases by listing their URIs in a single environment variable: comma-separated, no config files required.\n\n**Supported databases:** PostgreSQL · MongoDB · SQLite · 
DuckDB\n\n```\nDATABRIDGE_DATABASE_URIS=postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db\n```\n\nPass it in your MCP client config, in a `.env` file, or directly in the shell. SQLite and DuckDB paths must be absolute (four slashes, e.g. `sqlite:////`).\n\nDeterministic safety. Not prompt-based instructions.\n\n- All queries are **read-only by default**, enforced at the SQL parser level\n- DML (INSERT, UPDATE, DELETE) and DDL (CREATE, DROP, ALTER) are blocked unconditionally\n- No prompt injection can override parser-level enforcement\n\nPersistent, versioned knowledge about your databases.\n\n- Schema scanner: introspects all connected databases and stores column types, row counts, and null rates\n- Schema cache: persists to local SQLite, so there is no re-scanning on every session\n- Diff detection: flags schema changes since the last scan\n\n**Cross-database join registry:**\n\nAuto-discovers join keys between databases using column name similarity (WordNet + rapidfuzz) and value sampling with a transform grammar. Covers common format differences like `12345` ↔ `\"CUST-0012345\"` without API calls. 
Ambiguous pairs go through a human confirmation flow.\n\n```\n{\n  \"join_id\": \"orders_customers\",\n  \"source\": { \"db\": \"prod_postgres\", \"table\": \"orders\", \"column\": \"customer_id\" },\n  \"target\": { \"db\": \"prod_mongodb\", \"collection\": \"users\", \"field\": \"_id\" },\n  \"transform\": \"CUST-{zero_pad(value, 7)}\",\n  \"confidence\": 0.97\n}\n```\n\nCross-database query planning and execution.\n\n**Sub-query spec format**: run queries across multiple databases in one call:\n\n```\n{\n  \"sub_queries\": [\n    {\"db\": \"sqlite\",  \"query\": \"SELECT Name, Version FROM packageinfo WHERE IsRelease=1\", \"key\": \"pkg\"},\n    {\"db\": \"duckdb\",  \"query\": \"SELECT Name, Version, ProjectName, Project_Information FROM project_packageversion JOIN project_info ...\", \"key\": \"ppv\"}\n  ],\n  \"join_on\": [[\"pkg.Name\", \"ppv.Name\"], [\"pkg.Version\", \"ppv.Version\"]],\n  \"transform\": [\n    {\"op\": \"extract_number\", \"column\": \"Project_Information\", \"metric\": \"stars\", \"output\": \"stars\"},\n    {\"op\": \"top_n_with_ties\", \"column\": \"stars\", \"n\": 5}\n  ]\n}\n```\n\n**Post-query transform pipeline**: agents declare *what* to compute, and DataBridge executes it:\n\n| Transform | What it does |\n|---|---|\n| `extract_number` | Pulls a numeric metric from prose text (`\"38,715 stars\"`, `\"94k\"`) |\n| `top_n_with_ties` | Returns the top N rows, including all tied items (`LIMIT N` silently truncates ties) |\n| `sort` | Sorts rows by a column, ascending or descending |\n| `cast_number` | Strips commas/spaces from a text column and casts it to an integer |\n| `compute_ema` | Computes an exponential moving average per group, sorted by a time column |\n| `parse_date` | Extracts the year or decade from prose text containing embedded dates |\n| `round_down` | Rounds a numeric column down to the nearest multiple of N (e.g. to the decade) |\n\nAgents never write `TRY_CAST(REPLACE(regexp_extract(...), ',', '') AS BIGINT)`. 
They call `{\"op\": \"extract_number\", \"metric\": \"stars\"}` and DataBridge handles it.\n\n**Math compute** fetches data and computes over it in one call:\n\n```\n# Population standard deviation without pulling rows into agent context\nmath_compute(\n    query=\"SELECT value AS v FROM measurements\", databases=[\"mydb\"],\n    expression=\"math.sqrt(sum((x - sum(v)/len(v))**2 for x in v) / len(v))\"\n)\n\n# EMA over time-series data\nmath_compute(\n    sub_queries=[{\"db\":\"patents\",\"query\":\"SELECT code, year, COUNT(*) AS cnt FROM t GROUP BY code, year\",\"key\":\"k\"}],\n    operation=\"ema\", group_col=\"code\", sort_col=\"year\", value_col=\"cnt\", alpha=0.3\n)\n\n# Chi-square test\nmath_compute(\n    sub_queries=[...],\n    operation=\"chi_square\", row_col=\"category\", col_col=\"flag\", count_col=\"cnt\"\n)\n```\n\nCatch silent failures before the agent acts on wrong answers.\n\n- Zero-row results on tables with known large row counts → flagged as suspicious\n- Query provenance: which databases were queried, which joins were applied\n- Failure classification: wrong join key / schema mismatch / empty vs. failed\n\nAppend-only log of every query: timestamp, session ID, query text, rows returned, execution time. Queryable by session or by the most recent N entries. 
Supports query replay for debugging.\n\n| Tool | Description |\n|---|---|\n| `db_query` | Execute SQL or a multi-DB spec across connected databases |\n| `db_schema` | Get the schema for a database, table, or column |\n| `db_joins` | List and manage cross-database join relationships |\n| `db_plan` | Get the execution plan for a query without running it |\n| `db_verify` | Check the plausibility of a result set |\n| `db_audit` | Query history for the current session |\n| `db_connections` | List active database connections and their health status |\n\n```\n┌──────────────────────────────────────────────────┐\n│  MCP CLIENT (Claude / GPT / any MCP agent)       │\n└────────────────────┬─────────────────────────────┘\n                     │ MCP tool calls\n┌────────────────────▼─────────────────────────────┐\n│  DATABRIDGE MCP SERVER                           │\n│                                                  │\n│  ┌─────────────────────────────────────────┐    │\n│  │  Query Intelligence                     │    │\n│  │  multi-DB planning · transforms · math  │    │\n│  └──────────────────┬──────────────────────┘    │\n│                     │                            │\n│  ┌──────────────────▼──────────────────────┐    │\n│  │  Safety Enforcement                     │    │\n│  │  read-only at parser level              │    │\n│  └──────────────────┬──────────────────────┘    │\n│                     │                            │\n│  ┌──────────────────▼──────────────────────┐    │\n│  │  Connection Layer                       │    │\n│  │  unified driver · pooling               │    │\n│  └──────────────────┬──────────────────────┘    │\n│                     │                            │\n│  ┌──────────────────▼──────────────────────┐    │\n│  │  Schema Memory & Verification           │    │\n│  │  schema cache · join registry · audit   │    │\n│  └─────────────────────────────────────────┘    │\n│                                                  
│\n└──────────────────────────────────────────────────┘\n         │              │              │\n   PostgreSQL       MongoDB        DuckDB / SQLite\n```\n\n- Python 3.11+\n- At least one running database (PostgreSQL, MongoDB, SQLite, or DuckDB)\n- An MCP-compatible agent (Claude Desktop, Cursor, Windsurf, or any MCP client)\n\n```\ngit clone https://github.com/gaviventures/databridge.git\ncd databridge\npip install -e .\n```\n\nEdit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) and add:\n\n```\n{\n  \"mcpServers\": {\n    \"databridge\": {\n      \"command\": \"databridge\",\n      \"args\": [\"serve\"],\n      \"env\": {\n        \"DATABRIDGE_DATABASE_URIS\": \"postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db\"\n      }\n    }\n  }\n}\n```\n\nRestart Claude Desktop. DataBridge scans your schema on first use and caches it for subsequent sessions.\n\nMultiple databases are comma-separated in `DATABRIDGE_DATABASE_URIS`. SQLite paths must be absolute (four slashes, e.g. `sqlite:////`).\n\nOnce connected, ask Claude a natural language question that spans your databases. DataBridge handles the rest:\n\n\"Which decade of publication has the highest average rating among detailed reviews?\"\n\n```\nClaude calls db_connections  → PostgreSQL + SQLite live\nClaude calls db_schema       → finds books_info (PostgreSQL), review (SQLite)\nClaude calls db_query        → samples rows, discovers publication dates are prose text\n                               in details field (\"May 8, 2012\") and join is purchase_id ↔ book_id\nClaude calls db_query        → extracts years via regex, joins tables, aggregates ratings by decade\nClaude answers               → \"The 1990s has the highest average rating at 4.32\"\n```\n\nNo connection strings in the prompt. No schema explanation needed. 
No JOIN syntax across database engines.\n\nWhen running the benchmark, DataBridge reads a `db_description.txt` (or `db_description_withhint.txt`) from each dataset directory and prepends it to the query context. This helps with non-obvious join relationships or column semantics that the model can't infer from the schema alone.\n\nThis is a planned feature for the hosted MCP server. [Join the waitlist →](https://gaviventures.com)\n\nOut of the box, DataBridge handles schema discovery, join detection, and query planning automatically. For production use on your specific data, however, accuracy improves significantly with a few targeted tuning steps:\n\n**1. Confirm or correct join relationships**\nAuto-discovery finds joins based on column name similarity and value sampling, but it can miss non-obvious relationships (e.g. `purchase_id` ↔ `book_id`) or propose false positives. Ask Claude to call `db_joins` to list all discovered candidates; it shows each join with its confidence score and transform. Tell Claude to confirm joins that are correct (`confirm=<join_id>`) or reject ones that aren't (`reject=<join_id>`). Confirmed joins are shown to the model as trusted facts in every subsequent query, so they never need to be rediscovered.\n\n**2. Add database hints**\nDocument non-obvious relationships, column semantics, and business logic in plain text. Examples: which ID fields map across databases, which free-text columns contain embedded dates, what enum values mean. The model uses this context on every query.\n\n**3. Normalize your data**\nInconsistent ID formats (`12345` vs `\"CUST-0012345\"`), missing foreign keys, nulls in join columns, and mixed date formats all reduce accuracy. The closer your schema is to clean relational data, the better the results.\n\n**4. 
Add ontology and lookup tables**\nQueries that require domain knowledge (category hierarchies, code-to-name mappings, status enumerations) benefit from explicit lookup tables the model can join against, rather than forcing it to infer meaning from raw codes.\n\n**5. Tune the query context**\nFor schemas with many tables, explicitly describing which tables are relevant for which query types reduces the model's search space and improves answer quality.\n\nNeed help setting this up for your databases? Write to [hello@gaviventures.com](mailto:hello@gaviventures.com) and we'll help you configure DataBridge for your specific schema.\n\nDataBridge is built to be measured. We run DataAgentBench on every release.\n\n| System | DAB Pass@1 |\n|---|---|\n| DataBridge + GLM-5.2 | 61.37% |\n| MinusX + Claude Sonnet 4.6 + GPT-5.5-mini + Claude Haiku 4.5 | 65.2% |\n| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 63.1% |\n| Spacedock (Recce) + Claude Opus 4.8 | 67.2% |\n| Altimate Code + Claude Sonnet 4.6 | 68.2% |\n| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 71.7% |\n\nReproducible eval scripts are in `/benchmark`. See [TESTING.md](https://github.com/gagarwal304/databridge/blob/main/TESTING.md) for full instructions.\n\n**1. Clone DataAgentBench**\n\n```\ngit clone https://github.com/ucbepic/DataAgentBench.git\n```\n\n**2. Set your API key**\n\n```\nexport TOGETHER_API_KEY=your_key   # or ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.\n```\n\n**3. Run a single dataset to verify setup**\n\n```\ndatabridge benchmark run \\\n  --dab-root /path/to/DataAgentBench \\\n  --provider together \\\n  --model zai-org/GLM-5.2 \\\n  --dataset bookreview\n```\n\n**4. Run all 12 official datasets (one run)**\n\n```\ndatabridge benchmark run \\\n  --dab-root /path/to/DataAgentBench \\\n  --provider together \\\n  --model zai-org/GLM-5.2 \\\n  --official \\\n  --run 0\n```\n\n**5. 
Run all 5 trials for leaderboard submission**\n\n```\nfor i in 0 1 2 3 4; do\n  databridge benchmark run \\\n    --dab-root /path/to/DataAgentBench \\\n    --provider together \\\n    --model zai-org/GLM-5.2 \\\n    --official \\\n    --run $i\ndone\n```\n\nResults are written incrementally to `benchmark/results/submission_{model}.json`, so a crash mid-run won't lose completed queries. Re-running a specific `--run` index overwrites only that run's entries.\n\nSupported providers: `anthropic` · `openai` · `together` · `groq` · `kimi` · `ollama`\n\n- PostgreSQL, MongoDB, SQLite, DuckDB connectors\n- Read-only safety enforcement (parser level)\n- Schema scanner and cache\n- Cross-database join registry (auto-discovery + human confirmation)\n- Multi-DB sub-query spec with join and transform pipeline\n- Post-query transforms (extract_number, top_n_with_ties, compute_ema, parse_date, ...)\n- Math compute (EMA, chi-square, arbitrary Python expressions)\n- Result plausibility verification\n- Audit log with query replay\n- DAB benchmark eval harness\n- 61.1% on DataAgentBench with GLM-5.1\n\nHosted MCP endpoint: no self-hosting required. Add databases via the UI, share them with your team, tune join discovery, and view eval logs. [Join the waitlist →](https://gaviventures.com)\n\n**Safety is deterministic, not instructional.** Read-only enforcement happens at the SQL parser level. No prompt can override it.\n\n**Silent failures are the real enemy.** A wrong answer delivered confidently is worse than an error. DataBridge catches zero-row results on populated tables, type mismatches, and plausibility failures before the agent acts.\n\n**Computation is cheap. Context is expensive.** Value sampling, join confidence scoring, post-query transforms, and math operations all happen inside tool calls. The agent sees a result, not the process that produced it.\n\n**Benchmark-first development.** Every feature is evaluated against DAB. 
If it doesn't move the score, it doesn't ship.\n\n**Open core.** The MCP server, connectors, safety enforcement, schema memory, and benchmark tooling are open source (Apache 2.0) forever.\n\nDataBridge welcomes contributors, especially:\n\n- Database connector implementations (BigQuery, Snowflake, Supabase, Neon)\n- DAB benchmark improvements\n- Safety layer hardening\n- Schema learning algorithms\n\nApache 2.0 — free to use, modify, and distribute. Commercial use permitted.\n\n*Built by Gavi Ventures*", "url": "https://wpnews.pro/news/show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2", "canonical_source": "https://github.com/gagarwal304/databridge", "published_at": "2026-06-24 07:37:24+00:00", "updated_at": "2026-06-24 08:14:13.615262+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-infrastructure", "ai-tools"], "entities": ["DataBridge", "GLM-5.2", "DataAgentBench", "UC Berkeley", "Hasura", "PostgreSQL", "MongoDB", "SQLite"], "alternates": {"html": "https://wpnews.pro/news/show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2", "markdown": "https://wpnews.pro/news/show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2.md", "text": "https://wpnews.pro/news/show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2.txt", "jsonld": "https://wpnews.pro/news/show-hn-sql-mcp-server-61-37-on-dataagentbench-with-glm-5-2.jsonld"}}