Show HN: SQL MCP Server – 61.37% on DataAgentBench with GLM-5.2

DataBridge, an open-source MCP server, achieves 61.37% on UC Berkeley's DataAgentBench using GLM-5.2, enabling AI agents to query heterogeneous databases (PostgreSQL, MongoDB, SQLite, DuckDB) with deterministic safety and schema-aware cross-database joins.

One MCP server. Any database. Benchmark-proven.

DataBridge is an open-source MCP server that gives AI agents (Claude, GPT, Gemini, and any MCP-compatible agent) reliable, safe, and intelligent access to heterogeneous databases. It sits between your agent and your data, handling connections, enforcing safety, learning schema, normalizing cross-database joins, and running post-query transforms, so the agent gets answers, not raw data engineering problems.

Benchmarked on [DataAgentBench (DAB)](https://ucbepic.github.io/DataAgentBench/), the UC Berkeley + Hasura benchmark for real-world data agents across 12 datasets and 4 database systems.

Enterprise data lives across multiple systems simultaneously: PostgreSQL for transactions, MongoDB for documents, DuckDB for analytics, SQLite for local state. Answering a single business question often requires querying all of them together. Current AI agents fail at this in four specific ways:

1. Silent wrong answers. An agent joins PostgreSQL's integer `subscriber_id: 12345` with MongoDB's string `"CUST-0012345"`, gets zero rows, and confidently reports "no results found." No error. No warning. Wrong answer delivered with certainty.
2. No safety layer. Agents given database access can, and do, execute destructive operations. A misunderstood task becomes a `DELETE FROM orders` with no `WHERE` clause. Prompt-based safety instructions are insufficient. A deterministic enforcement layer is required.
3. Cold start every session. Every new agent session re-discovers schema from scratch: re-reading table definitions, re-learning join patterns, re-discovering that `customer_id` in PostgreSQL maps to `_id` in MongoDB. This wastes tokens and time, and produces inconsistent results.
4. Raw row fetching. Agents pull full tables into context when they should push aggregation to the database. A `SELECT *` on a 500,000-row table is a context window disaster.
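The first failure mode is easy to reproduce. The sketch below uses plain Python lists as stand-ins for a PostgreSQL table and a MongoDB collection. The sample rows and the helpers `to_mongo_key`, `naive_join`, and `normalized_join` are illustrative and not part of DataBridge; the key format (`CUST-` plus a seven-digit zero-padded ID) is assumed from the example in the list above.

```python
# Stand-ins for rows from two systems; the data is made up for illustration.
pg_subscribers = [
    {"subscriber_id": 12345, "plan": "pro"},
    {"subscriber_id": 678, "plan": "free"},
]
mongo_customers = [
    {"_id": "CUST-0012345", "name": "Ada"},
    {"_id": "CUST-0000678", "name": "Grace"},
]


def to_mongo_key(subscriber_id: int) -> str:
    """Assumed key format: 'CUST-' followed by the ID zero-padded to 7 digits."""
    return f"CUST-{subscriber_id:07d}"


def naive_join(left, right):
    """Join on raw values: int 12345 never equals the string 'CUST-0012345'."""
    index = {doc["_id"]: doc for doc in right}
    return [{**row, **index[row["subscriber_id"]]}
            for row in left if row["subscriber_id"] in index]


def normalized_join(left, right):
    """Join after mapping the integer key into the string key format."""
    index = {doc["_id"]: doc for doc in right}
    return [{**row, **index[to_mongo_key(row["subscriber_id"])]}
            for row in left if to_mongo_key(row["subscriber_id"]) in index]


naive_rows = naive_join(pg_subscribers, mongo_customers)
joined_rows = normalized_join(pg_subscribers, mongo_customers)
print(len(naive_rows), len(joined_rows))  # 0 2
```

The naive join raises no error; it simply returns nothing, which is why a zero-row result reads like a legitimate answer unless something checks it.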
DataAgentBench tests agents on 54 realistic queries across 12 real-world datasets spanning PostgreSQL, MongoDB, SQLite, and DuckDB:

| System | DAB Pass@1 |
|---|---|
| DataBridge + GLM-5.2 | 61.37% |
| MinusX + Claude Sonnet 4.6 + GPT-5.5-mini + Claude Haiku 4.5 | 65.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 63.1% |
| Spacedock Recce + Claude Opus 4.8 | 67.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 71.7% |

With a significantly lower-cost model, DataBridge scores within roughly 2 to 10 points of systems built on frontier models.

DataBridge exposes a single MCP interface that any agent calls with a natural language question or structured intent.

```
Agent: "Which customers bought product X in Q1 but not Q2, and what was their average order value?"

DataBridge:
  → Identifies: orders in PostgreSQL, customer profiles in MongoDB
  → Plans: two sub-queries + cross-DB join
  → Normalizes: integer customer_id (PG) ↔ string "CUST-XXXXX" (Mongo)
  → Safety check: read-only enforcement at parser level
  → Executes: sub-queries, merges results
  → Returns: clean structured JSON

Agent receives: the answer, not the data engineering problem.
```

Connect any combination of databases by listing their URIs in a single environment variable: comma-separated, no config files required.

Supported databases: PostgreSQL · MongoDB · SQLite · DuckDB

```
DATABRIDGE_DATABASE_URIS=postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db
```

Pass it in your MCP client config, in a `.env` file, or directly in the shell. SQLite and DuckDB paths must be absolute (4 slashes: `sqlite:////`).

Deterministic safety. Not prompt-based instructions.

- All queries are read-only by default, enforced at the SQL parser level
- DML (INSERT, UPDATE, DELETE) and DDL (CREATE, DROP, ALTER) blocked unconditionally
- No prompt injection can override parser-level enforcement

Persistent, versioned knowledge about your databases.
- Schema scanner: introspects all connected databases, stores column types, row counts, null rates
- Schema cache: persists to local SQLite, so no re-scanning on every session
- Diff detection: flags schema changes since last scan

Cross-database join registry: auto-discovers join keys between databases using column name similarity (WordNet + rapidfuzz) and value sampling with a transform grammar. Covers common format differences like `12345` ↔ `"CUST-0012345"` without API calls. Human confirmation flow for ambiguous pairs.

```json
{
  "join_id": "orders_customers",
  "source": { "db": "prod_postgres", "table": "orders", "column": "customer_id" },
  "target": { "db": "prod_mongodb", "collection": "users", "field": "_id" },
  "transform": "CUST-{zero_pad(value, 7)}",
  "confidence": 0.97
}
```

Cross-database query planning and execution.

Sub-query spec format: run queries across multiple databases in one call:

```json
{
  "sub_queries": [
    {"db": "sqlite", "query": "SELECT Name, Version FROM packageinfo WHERE IsRelease=1", "key": "pkg"},
    {"db": "duckdb", "query": "SELECT Name, Version, ProjectName, Project_Information FROM project_packageversion JOIN project_info ...", "key": "ppv"}
  ],
  "join_on": [["pkg.Name", "ppv.Name"], ["pkg.Version", "ppv.Version"]],
  "transform": [
    {"op": "extract_number", "column": "Project_Information", "metric": "stars", "output": "stars"},
    {"op": "top_n_with_ties", "column": "stars", "n": 5}
  ]
}
```

Post-query transform pipeline: agents declare what to compute; DataBridge executes it:

| Transform | What it does |
|---|---|
| `extract_number` | Pulls a numeric metric from prose text (`"38,715 stars"`, `"94k"`) |
| `top_n_with_ties` | Returns top-N rows including all tied items; `LIMIT N` silently truncates ties |
| `sort` | Sorts rows by column, ascending or descending |
| `cast_number` | Strips commas/spaces from a text column and casts to integer |
| `compute_ema` | Exponential moving average per group, sorted by a time column |
| `parse_date` | Extracts year/decade from prose text containing embedded dates |
| `round_down` | Rounds a numeric column down to the nearest N (e.g. decade) |

Agents never write `TRY_CAST(REPLACE(regexp_extract(...), ',', '') AS BIGINT)`. They call `{"op": "extract_number", "metric": "stars"}` and DataBridge handles it.

Math compute: fetch data and compute in one call:

```python
# Standard deviation without pulling rows to agent context
math_compute(
    query="SELECT value AS v FROM measurements",
    databases=["mydb"],
    expression="math.sqrt(sum((x - sum(v)/len(v))**2 for x in v) / len(v))"
)

# EMA over time-series data
math_compute(
    sub_queries=[{"db": "patents", "query": "SELECT code, year, COUNT(*) AS cnt FROM t GROUP BY code, year", "key": "k"}],
    operation="ema", group_col="code", sort_col="year", value_col="cnt", alpha=0.3
)

# Chi-square test
math_compute(
    sub_queries=[...],
    operation="chi_square", row_col="category", col_col="flag", count_col="cnt"
)
```

Catch silent failures before the agent acts on wrong answers.

- Zero-row results on tables with known large row counts → flagged as suspicious
- Query provenance: which databases were queried, which joins were applied
- Failure classification: wrong join key / schema mismatch / empty vs failed

Append-only log of every query: timestamp, session ID, query text, rows returned, execution time. Queryable by session or recent N entries. Supports query replay for debugging.
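One way to picture the zero-row check and the empty-versus-failed distinction is the sketch below. The function name `classify_result`, the `min_rows` threshold, and the label strings are hypothetical, chosen for illustration; the text does not describe DataBridge's actual rules. The sketch assumes the schema cache can supply a row count for each table a query touched.

```python
from typing import Optional, Sequence


def classify_result(
    rows: Optional[Sequence[dict]],
    table_row_counts: dict[str, int],
    error: Optional[str] = None,
    min_rows: int = 1000,
) -> str:
    """Label a query outcome so an empty result is not mistaken for an answer.

    rows: result rows, or None if the query did not complete.
    table_row_counts: cached row counts for the tables the query touched.
    min_rows: hypothetical threshold for a "known large" table.
    """
    if error is not None or rows is None:
        return "failed"
    if len(rows) > 0:
        return "ok"
    # Zero rows: suspicious only if every queried table is known to be large,
    # which hints at a join-key or filter mismatch rather than missing data.
    if table_row_counts and all(n >= min_rows for n in table_row_counts.values()):
        return "suspicious_empty"
    return "empty"


counts = {"orders": 500_000, "users": 120_000}
print(classify_result([], counts))                      # suspicious_empty
print(classify_result([], {"drafts": 0}))               # empty
print(classify_result(None, counts, error="timeout"))   # failed
print(classify_result([{"n": 1}], counts))              # ok
```

Keeping "empty" and "failed" as separate labels lets the agent retry a failed query while treating a suspicious empty result as a prompt to re-examine the join key.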
| Tool | Description |
|---|---|
| `db_query` | Execute SQL or a multi-DB spec across connected databases |
| `db_schema` | Get schema for a database, table, or column |
| `db_joins` | List and manage cross-database join relationships |
| `db_plan` | Get the execution plan for a query without running it |
| `db_verify` | Check plausibility of a result set |
| `db_audit` | Query history for the current session |
| `db_connections` | List active database connections and health status |

```
┌──────────────────────────────────────────────────┐
│   MCP CLIENT (Claude / GPT / any MCP agent)      │
└─────────────────────┬────────────────────────────┘
                      │ MCP tool calls
┌─────────────────────▼────────────────────────────┐
│              DATABRIDGE MCP SERVER               │
│                                                  │
│  ┌─────────────────────────────────────────┐     │
│  │  Query Intelligence                     │     │
│  │  multi-DB planning · transforms · math  │     │
│  └──────────────────┬──────────────────────┘     │
│                     │                            │
│  ┌──────────────────▼──────────────────────┐     │
│  │  Safety Enforcement                     │     │
│  │  read-only at parser level              │     │
│  └──────────────────┬──────────────────────┘     │
│                     │                            │
│  ┌──────────────────▼──────────────────────┐     │
│  │  Connection Layer                       │     │
│  │  unified driver · pooling               │     │
│  └──────────────────┬──────────────────────┘     │
│                     │                            │
│  ┌──────────────────▼──────────────────────┐     │
│  │  Schema Memory & Verification           │     │
│  │  schema cache · join registry · audit   │     │
│  └─────────────────────────────────────────┘     │
│                                                  │
└──────────────────────────────────────────────────┘
         │                │                │
     PostgreSQL        MongoDB      DuckDB / SQLite
```

- Python 3.11+
- At least one running database (PostgreSQL, MongoDB, SQLite, or DuckDB)
- An MCP-compatible agent (Claude Desktop, Cursor, Windsurf, or any MCP client)

```bash
git clone https://github.com/gaviventures/databridge.git
cd databridge
pip install -e .
```
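Before wiring the server into a client, it can help to sanity-check the connection string described earlier. The sketch below is not part of DataBridge: `split_uris` and `check_file_uri` are hypothetical helpers, and the `duckdb://` scheme is an assumption made by analogy with `sqlite://`. It splits the comma-separated value and flags file-based URIs that lack the four-slash absolute form.

```python
FILE_SCHEMES = ("sqlite", "duckdb")  # the "duckdb" scheme is an assumption


def split_uris(value: str) -> list[str]:
    """Split a comma-separated URI list, dropping blanks and whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


def check_file_uri(uri: str) -> bool:
    """Return True unless a file-based URI uses a relative path.

    An absolute path yields four slashes: 'sqlite://' + '/' + '/abs/path'.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in FILE_SCHEMES:
        return True  # not a file-based URI; nothing to check here
    return rest.startswith("//")


value = "postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db"
uris = split_uris(value)
problems = [u for u in uris if not check_file_uri(u)]
print(uris)
print(problems)
```

A plain comma split would also break a URI whose password contains a comma, so this check is a convenience for the common case and not a full parser.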
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) and add:

```json
{
  "mcpServers": {
    "databridge": {
      "command": "databridge",
      "args": ["serve"],
      "env": {
        "DATABRIDGE_DATABASE_URIS": "postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db"
      }
    }
  }
}
```

Restart Claude Desktop. DataBridge scans your schema on first use and caches it for subsequent sessions.

Multiple databases are comma-separated in `DATABRIDGE_DATABASE_URIS`. SQLite paths must be absolute (4 slashes: `sqlite:////`).

Once connected, ask Claude a natural language question that spans your databases. DataBridge handles the rest:

```
"Which decade of publication has the highest average rating among detailed reviews?"

Claude calls db_connections → PostgreSQL + SQLite live
Claude calls db_schema → finds books_info (PostgreSQL), review (SQLite)
Claude calls db_query → samples rows, discovers publication dates are prose text in details field ("May 8, 2012") and join is purchase_id ↔ book_id
Claude calls db_query → extracts years via regex, joins tables, aggregates ratings by decade
Claude answers → "The 1990s has the highest average rating at 4.32"
```

No connection strings in the prompt. No schema explanation needed. No JOIN syntax across database engines.

When running the benchmark, DataBridge reads a `db_description.txt` or `db_description_withhint.txt` from each dataset directory and prepends it to the query context, which is useful for non-obvious join relationships or column semantics the model can't infer from schema alone. This is a planned feature for the hosted MCP server. [Join the waitlist →](https://gaviventures.com)

DataBridge out of the box handles schema discovery, join detection, and query planning automatically. But for production use on your specific data, accuracy improves significantly with a few targeted tuning steps:

1. Confirm or correct join relationships. Auto-discovery finds joins based on column name similarity and value sampling, but it can miss non-obvious relationships (e.g. `purchase_id` ↔ `book_id`) or propose false positives. Ask Claude to call `db_joins` to list all discovered candidates; it will show each join with its confidence score and transform. Tell Claude to confirm joins that are correct (`confirm=<join_id>`) or reject ones that aren't (`reject=<join_id>`). Confirmed joins are shown to the model in every subsequent query as trusted facts, eliminating the need to re-discover them.
2. Add database hints. Document non-obvious relationships, column semantics, and business logic in plain text. Examples: which ID fields map across databases, what free-text columns contain embedded dates, what enum values mean. The model uses this context on every query.
3. Normalize your data. Inconsistent ID formats (`12345` vs `"CUST-0012345"`), missing foreign keys, nulls in join columns, and mixed date formats all reduce accuracy. The closer your schema is to clean relational data, the better the results.
4. Add ontology and lookup tables. Queries that require domain knowledge (category hierarchies, code-to-name mappings, status enumerations) benefit from explicit lookup tables the model can join against rather than having to infer meaning from raw codes.
5. Tune the query context. For schemas with many tables, explicitly describing which tables are relevant for which query types reduces the model's search space and improves answer quality.

Need help setting this up for your databases? Write to [hello@gaviventures.com](mailto:hello@gaviventures.com) and we'll help you configure DataBridge for your specific schema.

DataBridge is built to be measured. We run against DataAgentBench on every release.
| System | DAB Pass@1 |
|---|---|
| DataBridge + GLM-5.2 | 61.37% |
| MinusX + Claude Sonnet 4.6 + GPT-5.5-mini + Claude Haiku 4.5 | 65.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 63.1% |
| Spacedock Recce + Claude Opus 4.8 | 67.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 71.7% |

Reproducible eval scripts are in `/benchmark`. See [TESTING.md](/gagarwal304/databridge/blob/main/TESTING.md) for full instructions.

```bash
# 1. Clone DataAgentBench
git clone https://github.com/ucbepic/DataAgentBench.git

# 2. Set your API key
export TOGETHER_API_KEY=your_key  # or ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.

# 3. Run a single dataset to verify setup
databridge benchmark run \
  --dab-root /path/to/DataAgentBench \
  --provider together \
  --model zai-org/GLM-5.2 \
  --dataset bookreview

# 4. Run all 12 official datasets (one run)
databridge benchmark run \
  --dab-root /path/to/DataAgentBench \
  --provider together \
  --model zai-org/GLM-5.2 \
  --official \
  --run 0

# 5. Run all 5 trials for leaderboard submission
for i in 0 1 2 3 4; do
  databridge benchmark run \
    --dab-root /path/to/DataAgentBench \
    --provider together \
    --model zai-org/GLM-5.2 \
    --official \
    --run $i
done
```

Results are written incrementally to `benchmark/results/submission_{model}.json`, so a crash mid-run won't lose completed queries. Re-running a specific `--run` index overwrites only that run's entries.

Supported providers: anthropic · openai · together · groq · kimi · ollama

- PostgreSQL, MongoDB, SQLite, DuckDB connectors
- Read-only safety enforcement (parser level)
- Schema scanner and cache
- Cross-database join registry (auto-discovery + human confirmation)
- Multi-DB sub-query spec with join and transform pipeline
- Post-query transforms (`extract_number`, `top_n_with_ties`, `compute_ema`, `parse_date`, ...)
- Math compute (EMA, chi-square, arbitrary Python expressions)
- Result plausibility verification
- Audit log with query replay
- DAB benchmark eval harness
- 61.37% on DataAgentBench with GLM-5.2

Hosted MCP endpoint: no self-hosting required. Add databases via UI, share with your team, tune join discovery, view eval logs. [Join the waitlist →](https://gaviventures.com)

Safety is deterministic, not instructional. Read-only enforcement happens at the SQL parser level. No prompt can override it.

Silent failures are the real enemy. A wrong answer delivered confidently is worse than an error. DataBridge catches zero-row results on populated tables, type mismatches, and plausibility failures before the agent acts.

Computation is cheap. Context is expensive. Value sampling, join confidence scoring, post-query transforms, and math operations all happen inside tool calls. The agent sees a result, not the process that produced it.

Benchmark-first development. Every feature is evaluated against DAB. If it doesn't move the score, it doesn't ship.

Open core. The MCP server, connectors, safety enforcement, schema memory, and benchmark tooling are open source (Apache 2.0) forever.

DataBridge welcomes contributors, especially:

- Database connector implementations (BigQuery, Snowflake, Supabase, Neon)
- DAB benchmark improvements
- Safety layer hardening
- Schema learning algorithms

Apache 2.0: free to use, modify, and distribute. Commercial use permitted.

Built by Gavi Ventures