Source: beaverbench.github.io

BEAVER: Enterprise benchmark for LLM Text-to-SQL from private data warehouses

Researchers at MIT and other institutions released BEAVER, a large-scale enterprise benchmark for evaluating LLM text-to-SQL capabilities, containing 9,128 queries from private data warehouses across 19 domains. The benchmark includes a public set of 7,978 queries and a held-out private test set, and is designed to assess real-world enterprise SQL generation. The team invites submissions for evaluation and asks anyone who uses the data, code, or paper to cite it.

1 min read · Published Jun 15, 2026

To submit a method for evaluation, please send an email to peterbc@mit.edu with your method name, a brief description of the method, and, optionally, a link to your paper or codebase. We will follow up with detailed instructions.

| Rank | Submission Date | Method | Model | Execution Accuracy |
|---|---|---|---|---|

If you find our data, code, or the paper helpful, please cite the paper:

@article{chen2024beaver,
  title={BEAVER: an enterprise benchmark for text-to-sql},
  author={Chen, Peter Baile and Yang, Devin and Li, Weiyue and Wenz, Fabian and Zhang, Yi and Tatbul, Nesime and Cafarella, Michael and Demiralp, {\c{C}}a{\u{g}}atay and Stonebraker, Michael},
  journal={arXiv preprint arXiv:2409.02038},
  year={2024}
}

BEAVER is a large-scale enterprise text-to-SQL dataset containing 9,128 queries spanning 812 tables across 19 diverse domains. Of these, 7,978 queries are publicly released, while the remaining 1,150 are held out as a private test set. Queries and databases were collected from private organizations.
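
The leaderboard reports execution accuracy. The text here does not spell out BEAVER's exact scoring procedure, so the following is only a minimal sketch of one common definition: a prediction counts as correct when running it returns the same rows as the gold SQL, ignoring row order. The `course` table, the example queries, and the helper names `execute` and `execution_accuracy` are hypothetical and are not taken from the benchmark.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run a query and return its rows as an order-insensitive multiset.

    Returns None if the query fails to execute.
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = execute(conn, gold_sql)
        pred = execute(conn, predicted_sql)
        # A prediction that errors out (None) never matches the gold result.
        if gold is not None and pred == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database standing in for an enterprise warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, dept TEXT, units INTEGER);
INSERT INTO course VALUES (1, 'EECS', 12), (2, 'EECS', 6), (3, 'MATH', 12);
""")

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT COUNT(*) FROM course WHERE dept = 'EECS'",
     "SELECT COUNT(id) FROM course WHERE dept = 'EECS'"),
    # Different predicates that happen to select the same rows: correct.
    ("SELECT dept FROM course WHERE units = 12",
     "SELECT dept FROM course WHERE units > 6"),
    # Different rows returned: incorrect.
    ("SELECT id FROM course WHERE units = 6",
     "SELECT id FROM course WHERE dept = 'MATH'"),
    # Predicted query references a missing column and fails: incorrect.
    ("SELECT nope FROM course", "SELECT id FROM course"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

Comparing results rather than SQL strings means semantically equivalent queries are not penalized for surface differences. An actual evaluation harness may treat row order, column order, or value normalization differently.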

To facilitate fine-grained evaluation and analysis, we provide

Representative BEAVER tasks with question, SQL, and subtask annotations.
