Show HN: SQL MCP Server – 61.37% on DataAgentBench with GLM-5.2

DataBridge, an open-source MCP server, achieves 61.37% on UC Berkeley's DataAgentBench using GLM-5.2, enabling AI agents to query heterogeneous databases (PostgreSQL, MongoDB, SQLite, DuckDB) with deterministic safety and schema-aware cross-database joins.

One MCP server. Any database. Benchmark-proven.

DataBridge is an open-source MCP server that gives AI agents (Claude, GPT, Gemini, and any MCP-compatible agent) reliable, safe, and intelligent access to heterogeneous databases. It sits between your agent and your data: handling connections, enforcing safety, learning schema, normalizing cross-database joins, and running post-query transforms so the agent gets answers, not raw data engineering problems.

Benchmarked on [DataAgentBench (DAB)](https://ucbepic.github.io/DataAgentBench/), the UC Berkeley + Hasura benchmark for real-world data agents across 12 datasets and 4 database systems.

Enterprise data lives across multiple systems simultaneously: PostgreSQL for transactions, MongoDB for documents, DuckDB for analytics, SQLite for local state. Answering a single business question often requires querying all of them together. Current AI agents fail at this in four specific ways:

1. Silent wrong answers. An agent joins PostgreSQL's integer `subscriber_id: 12345` with MongoDB's string `"CUST-0012345"`, gets zero rows, and confidently reports "no results found." No error. No warning. Wrong answer delivered with certainty.
2. No safety layer. Agents given database access can, and do, execute destructive operations. A misunderstood task becomes a `DELETE FROM orders` with no `WHERE` clause. Prompt-based safety instructions are insufficient. A deterministic enforcement layer is required.
3. Cold start every session. Every new agent session re-discovers schema from scratch: re-reading table definitions, re-learning join patterns, re-discovering that `customer_id` in PostgreSQL maps to `_id` in MongoDB.
This wastes tokens and time, and produces inconsistent results.
4. Raw row fetching. Agents pull full tables into context when they should push aggregation to the database. A `SELECT *` on a 500,000-row table is a context window disaster.

DataAgentBench tests agents on 54 realistic queries across 12 real-world datasets spanning PostgreSQL, MongoDB, SQLite, and DuckDB:

| System | DAB Pass@1 |
|---|---|
| DataBridge + GLM-5.2 | 61.37% |
| MinusX + Claude Sonnet 4.6 + GPT-5.5-mini + Claude Haiku 4.5 | 65.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 63.1% |
| Spacedock Recce + Claude Opus 4.8 | 67.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + Claude Sonnet 4.6 | 68.2% |
| Altimate Code + GPT-5.5 + Claude Sonnet 4.6 | 71.7% |

DataBridge, running on a significantly lower-cost model, scores within about 2 to 10 points of the frontier-model systems listed above.

DataBridge exposes a single MCP interface that any agent calls with a natural language question or structured intent.

    Agent: "Which customers bought product X in Q1 but not Q2, and what was their average order value?"

    DataBridge:
      → Identifies: orders in PostgreSQL, customer profiles in MongoDB
      → Plans: two sub-queries + cross-DB join
      → Normalizes: integer customer_id (PG) ↔ string "CUST-XXXXX" (Mongo)
      → Safety check: read-only enforcement at parser level
      → Executes: sub-queries, merges results
      → Returns: clean structured JSON

    Agent receives: the answer, not the data engineering problem.

Connect any combination of databases by listing their URIs in a single environment variable: comma-separated, no config files required.

Supported databases: PostgreSQL · MongoDB · SQLite · DuckDB

    DATABRIDGE_DATABASE_URIS=postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db

Pass it in your MCP client config, in a `.env` file, or directly in the shell. SQLite and DuckDB paths must be absolute (4 slashes: `sqlite:////`).

Deterministic safety. Not prompt-based instructions.
- All queries are read-only by default, enforced at the SQL parser level
- DML (INSERT, UPDATE, DELETE) and DDL (CREATE, DROP, ALTER) blocked unconditionally
- No prompt injection can override parser-level enforcement

Persistent, versioned knowledge about your databases.

- Schema scanner: introspects all connected databases, stores column types, row counts, null rates
- Schema cache: persists to local SQLite, so there is no re-scanning on every session
- Diff detection: flags schema changes since last scan

Cross-database join registry: auto-discovers join keys between databases using column name similarity (WordNet + rapidfuzz) and value sampling with a transform grammar. Covers common format differences like `12345` ↔ `"CUST-0012345"` without API calls. Human confirmation flow for ambiguous pairs.

    {
      "join_id": "orders_customers",
      "source": { "db": "prod_postgres", "table": "orders", "column": "customer_id" },
      "target": { "db": "prod_mongodb", "collection": "users", "field": "_id" },
      "transform": "CUST-{zero_pad(value, 7)}",
      "confidence": 0.97
    }

Cross-database query planning and execution.
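To make the key normalization concrete, here is a minimal Python sketch of what applying a zero-pad transform before an in-memory join might look like. It is not DataBridge's implementation: `to_mongo_key` and `join_rows` are hypothetical helper names, the field names `customer_id` and `_id` are assumed from the registry entry above, and the sample rows are made up for illustration.

```python
def to_mongo_key(value: int, width: int = 7, prefix: str = "CUST-") -> str:
    """Hypothetical transform: zero-pad an integer ID and add a prefix."""
    return f"{prefix}{value:0{width}d}"


def join_rows(pg_rows, mongo_docs, pg_key="customer_id", mongo_key="_id"):
    """Hash-join two result sets after normalizing the PostgreSQL-side key."""
    # Index the MongoDB documents by their string key for O(1) lookups.
    index = {doc[mongo_key]: doc for doc in mongo_docs}
    joined = []
    for row in pg_rows:
        doc = index.get(to_mongo_key(row[pg_key]))
        if doc is not None:
            # Merge both sides into one flat record (inner-join semantics).
            joined.append({**row, **doc})
    return joined


# Illustrative data only: one matching customer, one without a profile.
pg_rows = [
    {"customer_id": 12345, "total": 99.5},
    {"customer_id": 7, "total": 10.0},
]
mongo_docs = [{"_id": "CUST-0012345", "name": "Ada"}]

# A naive join on raw values finds nothing, which is the silent failure mode.
mongo_ids = {doc["_id"] for doc in mongo_docs}
naive = [row for row in pg_rows if row["customer_id"] in mongo_ids]

# The normalized join recovers the matching row.
result = join_rows(pg_rows, mongo_docs)
```

The naive comparison returns an empty list without raising any error, while the normalized join returns the one matching customer. That contrast is why a transform stored alongside each join pair is useful.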
Sub-query spec format: run queries across multiple databases in one call:

    {
      "sub_queries": [
        {"db": "sqlite", "query": "SELECT Name, Version FROM packageinfo WHERE IsRelease=1", "key": "pkg"},
        {"db": "duckdb", "query": "SELECT Name, Version, ProjectName, Project_Information FROM project_packageversion JOIN project_info ...", "key": "ppv"}
      ],
      "join_on": [["pkg.Name", "ppv.Name"], ["pkg.Version", "ppv.Version"]],
      "transform": [
        {"op": "extract_number", "column": "Project_Information", "metric": "stars", "output": "stars"},
        {"op": "top_n_with_ties", "column": "stars", "n": 5}
      ]
    }

Post-query transform pipeline: agents declare what to compute; DataBridge executes it:

| Transform | What it does |
|---|---|
| `extract_number` | Pulls a numeric metric from prose text ("38,715 stars", "94k") |
| `top_n_with_ties` | Returns top-N rows including all tied items; `LIMIT N` silently truncates ties |
| `sort` | Sorts rows by column, ascending or descending |
| `cast_number` | Strips commas/spaces from a text column and casts to integer |
| `compute_ema` | Exponential moving average per group, sorted by a time column |
| `parse_date` | Extracts year/decade from prose text containing embedded dates |
| `round_down` | Rounds a numeric column down to the nearest N (e.g. decade) |

Agents never write `TRY_CAST(REPLACE(regexp_extract(...), ',', '') AS BIGINT)`. They call `{"op": "extract_number", "metric": "stars"}` and DataBridge handles it.

Math compute: fetch data and compute in one call:

    # Standard deviation without pulling rows to agent context
    math_compute(
        query="SELECT value AS v FROM measurements",
        databases=["mydb"],
        expression="math.sqrt(sum((x - sum(v)/len(v))**2 for x in v) / len(v))"
    )

    # EMA over time-series data
    math_compute(
        sub_queries=[{"db":"patents","query":"SELECT code, year, COUNT(*) AS cnt FROM t GROUP BY code, year","key":"k"}],
        operation="ema", group_col="code", sort_col="year", value_col="cnt", alpha=0.3
    )

    # Chi-square test
    math_compute(sub_queries=[...]
    , operation="chi_square", row_col="category", col_col="flag", count_col="cnt")

Catch silent failures before the agent acts on wrong answers.

- Zero-row results on tables with known large row counts → flagged as suspicious
- Query provenance: which databases were queried, which joins were applied
- Failure classification: wrong join key / schema mismatch / empty vs failed

Append-only log of every query: timestamp, session ID, query text, rows returned, execution time. Queryable by session or recent N entries. Supports query replay for debugging.

| Tool | Description |
|---|---|
| `db_query` | Execute SQL or a multi-DB spec across connected databases |
| `db_schema` | Get schema for a database, table, or column |
| `db_joins` | List and manage cross-database join relationships |
| `db_plan` | Get the execution plan for a query without running it |
| `db_verify` | Check plausibility of a result set |
| `db_audit` | Query history for the current session |
| `db_connections` | List active database connections and health status |

    ┌──────────────────────────────────────────────────┐
    │  MCP CLIENT (Claude / GPT / any MCP agent)       │
    └────────────────────┬─────────────────────────────┘
                         │ MCP tool calls
    ┌────────────────────▼─────────────────────────────┐
    │              DATABRIDGE MCP SERVER               │
    │                                                  │
    │  ┌─────────────────────────────────────────┐     │
    │  │  Query Intelligence                     │     │
    │  │  multi-DB planning · transforms · math  │     │
    │  └─────────────────┬───────────────────────┘     │
    │                    │                             │
    │  ┌─────────────────▼───────────────────────┐     │
    │  │  Safety Enforcement                     │     │
    │  │  read-only at parser level              │     │
    │  └─────────────────┬───────────────────────┘     │
    │                    │                             │
    │  ┌─────────────────▼───────────────────────┐     │
    │  │  Connection Layer                       │     │
    │  │  unified driver · pooling               │     │
    │  └─────────────────┬───────────────────────┘     │
    │                    │                             │
    │  ┌─────────────────▼───────────────────────┐     │
    │  │  Schema Memory & Verification           │     │
    │  │  schema cache · join registry · audit   │     │
    │  └─────────────────────────────────────────┘     │
    │                                                  │
    └──────────────────────────────────────────────────┘
            │                │                │
       PostgreSQL         MongoDB      DuckDB / SQLite

- Python 3.11+
- At least one running database (PostgreSQL, MongoDB, SQLite, or DuckDB)
- An MCP-compatible agent (Claude Desktop, Cursor, Windsurf, or any MCP client)

Install from source:

    git clone https://github.com/gaviventures/databridge.git
    cd databridge
    pip install -e .

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) and add:

    {
      "mcpServers": {
        "databridge": {
          "command": "databridge",
          "args": ["serve"],
          "env": {
            "DATABRIDGE_DATABASE_URIS": "postgresql://user:pass@localhost:5432/mydb,sqlite:////absolute/path/to/file.db"
          }
        }
      }
    }

Restart Claude Desktop. DataBridge scans your schema on first use and caches it for subsequent sessions. Multiple databases are comma-separated in `DATABRIDGE_DATABASE_URIS`. SQLite paths must be absolute (4 slashes: `sqlite:////`).

Once connected, ask Claude a natural language question that spans your databases. DataBridge handles the rest:

    "Which decade of publication has the highest average rating among detailed reviews?"

    Claude calls db_connections → PostgreSQL + SQLite live
    Claude calls db_schema → finds books_info (PostgreSQL), review (SQLite)
    Claude calls db_query → samples rows, discovers publication dates are prose text in details field ("May 8, 2012") and join is purchase_id ↔ book_id
    Claude calls db_query → extracts years via regex, joins tables, aggregates ratings by decade
    Claude answers → "The 1990s has the highest average rating at 4.32"

No connection strings in the prompt. No schema explanation needed. No JOIN syntax across database engines.

When running the benchmark, DataBridge reads a `db_description.txt` or `db_description_withhint.txt` from each dataset directory and prepends it to the query context, which is useful for non-obvious join relationships or column semantics the model can't infer from schema alone. This is a planned feature for the hosted MCP server.
Join the waitlist → https://gaviventures.com

Out of the box, DataBridge handles schema discovery, join detection, and query planning automatically. But for production use on your specific data, accuracy improves significantly with a few targeted tuning steps:

1. Confirm or correct join relationships
   Auto-discovery finds joins based on column name similarity and value sampling, but it can miss non-obvious relationships (e.g. `purchase_id` ↔ `book_id`) or propose false positives. Ask Claude to call `db_joins` to list all discovered candidates; it will show each join with its confidence score and transform. Tell Claude to confirm joins that are correct confirm=