BEAVER: Enterprise benchmark for LLM Text-to-SQL from private data warehouses

Researchers at MIT and other institutions released BEAVER, a large-scale enterprise benchmark for evaluating LLM text-to-SQL capabilities. It contains 9,128 queries collected from private data warehouses across 19 domains; 7,978 of them are publicly released, and the rest are held out as a private test set. The benchmark is designed to assess real-world enterprise SQL generation. The team invites submissions for evaluation and asks anyone who uses the data, code, or paper to cite it.

To submit a method for evaluation, please send an email to peterbc@mit.edu with your method name, a brief description of the method, and, optionally, a link to your paper or codebase. We will follow up with detailed instructions.

| Rank | Submission Date | Method | Model | Execution Accuracy |
|---|---|---|---|---|

If you find our data, code, or the paper helpful, please cite the paper:

@article{chen2024beaver,
  title={BEAVER: an enterprise benchmark for text-to-sql},
  author={Chen, Peter Baile and Yang, Devin and Li, Weiyue and Wenz, Fabian and Zhang, Yi and Tatbul, Nesime and Cafarella, Michael and Demiralp, {\c{C}}a{\u{g}}atay and Stonebraker, Michael},
  journal={arXiv preprint arXiv:2409.02038},
  year={2024}
}

BEAVER is a large-scale enterprise text-to-SQL dataset containing 9,128 queries spanning 812 tables across 19 diverse domains. Of these, 7,978 queries are publicly released, while the remaining portion is held out as a private test set. Queries and databases were collected from private organizations. To facilitate fine-grained evaluation and analysis, we provide representative BEAVER tasks with question, SQL, and subtask annotations.
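The leaderboard reports Execution Accuracy. As a rough illustration only, and not BEAVER's official evaluation script, the sketch below uses a common definition of that metric: a prediction counts as correct when executing it returns the same multiset of rows as the gold SQL. The `employee` table, the example queries, and the helper names `execution_match` and `execution_accuracy` are all hypothetical, and SQLite stands in for whatever database engine the benchmark actually targets.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute is counted as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not matter.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


# Hypothetical toy database; not part of BEAVER.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'eng', 100.0), (2, 'eng', 120.0), (3, 'ops', 90.0);
    """
)

pairs = [
    # Same result, different SQL text: counted as correct.
    ("SELECT dept, COUNT(*) FROM employee GROUP BY dept",
     "SELECT dept, COUNT(id) FROM employee GROUP BY dept"),
    # Different result (> versus >=): counted as incorrect.
    ("SELECT id FROM employee WHERE salary > 100",
     "SELECT id FROM employee WHERE salary >= 100"),
    # Unknown column, so execution fails: counted as incorrect.
    ("SELECT nam FROM employee",
     "SELECT dept FROM employee"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.3f}")
```

Comparing rows as multisets ignores row order, which is a reasonable default when the gold query has no `ORDER BY`; an order-sensitive comparison would be needed for questions that ask for a ranking.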