Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Scores 80.04% on BIRD Single-Model Leaderboard

Google Research announced Gemini-SQL2, a text-to-SQL capability powered by Gemini 3.1 Pro, which achieved 80.04% execution accuracy on the BIRD Single-Model Leaderboard, surpassing its predecessor Gemini-SQL and competitors from AWS, Databricks, and OpenAI. The system translates natural language into executable SQL queries, targeting integration into Google Cloud data services like BigQuery Studio, though specific product availability remains unconfirmed.

The Google Research team announced Gemini-SQL2 on X, describing it as a breakthrough text-to-SQL capability powered by Gemini 3.1 Pro. Gemini-SQL2 posted 80.04% execution accuracy on the BIRD Text-to-SQL Leaderboard (Single Model). Google’s chart places it above the company’s own Gemini-SQL, the prior top entry. The metric measures whether generated SQL runs and returns correct results, not whether it looks valid.

Gemini-SQL2

Gemini-SQL2 is a text-to-SQL capability, not a standalone foundation model release. It translates natural language questions into what Google calls ‘execution-ready SQL queries.’ The capability is built on Gemini 3.1 Pro.

Per the announcement on X, “data subtlety & complex business contexts make generating accurate SQL from natural language notoriously hard.” The post also stated that “improved SQL understanding can elevate natural language skills across Google’s data services.” That points toward integration targets like BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already ship Gemini-based SQL generation. Google has not yet confirmed which products will receive Gemini-SQL2.

Benchmarks

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is an industry standard for this task. It contains 12,751 question-SQL pairs across 95 databases spanning 37 professional domains, totaling 33.4GB. Unlike older benchmarks such as Spider, its databases include dirty values and require external knowledge grounding.

BIRD measures execution accuracy (EX): the generated SQL must run and return results matching those of the gold query. Google stated this directly: “Per the BIRD benchmark, which measures execution-verified accuracy, GeminiSQL-2’s SQL doesn’t just look right, it also runs successfully.”

The Single Trained Model Track restricts the preprocessing, retrieval, and agentic frameworks that ensembles use to boost scores, so it measures the model’s core text-to-SQL ability.
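To make the execution accuracy (EX) definition above concrete, here is a minimal sketch of how such a metric could be computed with Python's built-in `sqlite3` module. It is a simplification written for illustration, not the official BIRD evaluation script; the toy `orders` table, the sample queries, and the helper names `run_query`, `execution_match`, and `execution_accuracy` are all assumptions.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as an order-insensitive multiset, or None if the SQL fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True only if the predicted SQL runs and returns the same rows as the gold SQL."""
    predicted = run_query(conn, predicted_sql)
    gold = run_query(conn, gold_sql)
    return predicted is not None and predicted == gold


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose execution results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL, status TEXT);
INSERT INTO orders VALUES
  (1, 'EU', 100.0, 'paid'),
  (2, 'EU', 50.0, 'refunded'),
  (3, 'US', 70.0, 'paid');
""")

gold = "SELECT region, SUM(amount) FROM orders WHERE status = 'paid' GROUP BY region"
pairs = [
    # Identical to the gold query: counts as correct.
    (gold, gold),
    # Written differently but returns the same rows: counts as correct.
    ("SELECT region, SUM(amount) FROM (SELECT * FROM orders WHERE status = 'paid') "
     "GROUP BY region", gold),
    # Runs, but forgets the status filter and returns wrong rows: incorrect.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region", gold),
    # Misspelled column, fails to execute: incorrect.
    ("SELECT region, SUM(amout) FROM orders GROUP BY region", gold),
]

score = execution_accuracy(conn, pairs)
print(f"EX = {score:.2%}")
```

The third pair is the case the metric exists to catch: the query looks valid and executes, yet it is scored as wrong because its rows differ from the gold result.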
Google Cloud’s prior record on the Single Trained Model Track, reported November 15, 2025, was 76.13. Google puts human performance at 92.96, leaving a 12.92-point gap from Gemini-SQL2’s 80.04.

How the Leaderboard Stacks Up

Google’s chart, shared in the X post, shows Gemini-SQL2 ahead of eight named competitors, along with several unlabeled points. Only the 80.04% figure is stated as text. The other values below are read from each point’s position on the chart and are approximate; dates reflect each point’s horizontal placement.

| System | Organization | BIRD Execution Accuracy (Single Model) | Chart Date |
|---|---|---|---|
| Gemini-SQL2 | Google | 80.04% (stated) | Jun 2026 |
| Gemini-SQL | Google | ~77.2% | Mar 2026 |
| Q-SQL | AWS | ~76.5% | Dec 2025 |
| Databricks RLVR 32B | Databricks | ~75.7% | Jul 2025 |
| SiriusAI-Text2SQL-32B-v2 | Tencent | ~75.0% | Dec 2025 |
| Arctic-Text2SQL-R1-32B | Snowflake | ~73.9% | Jun 2025 |
| GPT-5.5-xhigh | OpenAI | ~72.5% | Apr 2026 |
| SQLWeaver-32B | Alibaba | ~71.7% | May 2026 |
| Claude Opus 4.6 | Anthropic | ~70.1% | Feb 2026 |

Two patterns are visible. Google now holds the top two named positions with Gemini-SQL2 and Gemini-SQL. Several specialized 32B SQL models also sit above some general frontier models on this chart.

Use Cases with Examples

- Self-service analytics: A revenue manager asks for monthly recurring revenue by region, for accounts that churned within 90 days of upgrade. This needs joins, window logic, and date arithmetic. Execution-verified generation catches SQL that runs but returns wrong rows.
- Data engineering drafts: Developers can draft BigQuery transformations from English, then review them rather than write from scratch. Google’s November 2025 work identified schema understanding as the hard part. Higher BIRD scores reflect better handling of ambiguous columns and messy values.
- Embedded “ask your data” features: SaaS teams adding natural-language query interfaces still need human review at 80% accuracy, because roughly one in five queries can be wrong. The score sets expectations; it does not remove the need for review.
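As a rough illustration of the self-service analytics case, the sketch below shows the kind of SQL a text-to-SQL system would need to produce for the churn question, run against an in-memory SQLite database. The `accounts` and `events` tables, their columns, and the sample rows are hypothetical, and the query assumes one upgrade and one churn event per account, so it omits the window logic a real schema would likely need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, region TEXT, mrr REAL);
CREATE TABLE events (account_id INTEGER, event TEXT, event_date TEXT);
INSERT INTO accounts VALUES (1, 'EMEA', 500.0), (2, 'EMEA', 300.0), (3, 'APAC', 200.0);
INSERT INTO events VALUES
  (1, 'upgrade', '2026-01-10'), (1, 'churn', '2026-03-01'),  -- 50 days after upgrade
  (2, 'upgrade', '2026-01-10'), (2, 'churn', '2026-06-30'),  -- 171 days after upgrade
  (3, 'upgrade', '2026-02-01'), (3, 'churn', '2026-02-20');  -- 19 days after upgrade
""")

# Join each account to its upgrade and churn events, keep accounts that
# churned within 90 days of upgrading, and total their MRR per region.
sql = """
SELECT a.region, SUM(a.mrr) AS churned_mrr
FROM accounts AS a
JOIN events AS up ON up.account_id = a.account_id AND up.event = 'upgrade'
JOIN events AS ch ON ch.account_id = a.account_id AND ch.event = 'churn'
WHERE julianday(ch.event_date) - julianday(up.event_date) BETWEEN 0 AND 90
GROUP BY a.region
ORDER BY a.region
"""

rows = conn.execute(sql).fetchall()
print(rows)  # [('APAC', 200.0), ('EMEA', 500.0)]
```

Dropping the `BETWEEN 0 AND 90` filter would still produce a query that runs, but it would also count account 2, which is the "runs but returns wrong rows" failure the use case describes.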
[Interactive dashboard: Gemini-SQL2 Launch: Community Reception Dashboard. Verified public engagement on Google Research’s announcement posts (X / Twitter and LinkedIn main posts), first ~3 hours, Jun 12, 2026.]

Implementation Pattern

Google has not published a Gemini-SQL2 model string or API yet. The schema-grounded pattern below works with current Gemini models via the google-genai SDK. Swap the model string when Gemini-SQL2 ships.

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

schema = """
CREATE TABLE orders (
  order_id INTEGER,
  customer TEXT,
  region TEXT,
  amount REAL,
  status TEXT,
  created_at DATE
);
"""

question = "Total paid order amount by region in 2026, highest first."

prompt = f"""You are a text-to-SQL system.
Schema:{schema}
Question: {question}
Return only one executable SQLite query. No explanation."""

resp = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # the base model named in the announcement; swap when a Gemini-SQL2 ID ships
    contents=prompt,
)
print(resp.text)
```

Production systems should add execution verification. Run the returned SQL, catch errors, and retry with the error message appended. That loop mirrors what BIRD's execution accuracy metric rewards.

Key Takeaways

- Google reports Gemini-SQL2 at 80.04% execution accuracy on the BIRD single-model leaderboard.
- The capability is powered by Gemini 3.1 Pro and targets "execution-ready SQL," not just plausible SQL.
- On Google's chart, Gemini-SQL2 and Gemini-SQL hold the top two named positions; human performance is 92.96.
- No API, model card, technical report, or product integration details have been published yet.
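The execution-verification loop recommended under Implementation Pattern (run the returned SQL, catch errors, retry with the error message appended) could look roughly like the sketch below. It is one possible shape under stated assumptions, not Google's implementation: `generate_verified_sql` and its `generate_sql` callable are hypothetical names, and a stub stands in for the model call so the example runs without an API key. In practice `generate_sql` would wrap a call such as `client.models.generate_content(...)` and return its text.

```python
import sqlite3
from typing import Callable


def generate_verified_sql(
    question: str,
    schema: str,
    generate_sql: Callable[[str], str],  # hypothetical: prompt in, SQL text out
    conn: sqlite3.Connection,
    max_attempts: int = 3,
) -> tuple[str, list]:
    """Generate SQL, execute it, and retry with the error message appended on failure."""
    prompt = (
        f"Schema:{schema}\nQuestion: {question}\n"
        "Return only one executable SQLite query. No explanation."
    )
    last_error = None
    for _ in range(max_attempts):
        sql = generate_sql(prompt).strip()
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows
        except (sqlite3.Error, sqlite3.Warning) as exc:
            # Feed the failed query and its error back to the model.
            last_error = exc
            prompt += f"\nPrevious attempt:\n{sql}\nIt failed with: {exc}\nFix the query."
    raise RuntimeError(f"No executable SQL after {max_attempts} attempts: {last_error}")


# Toy setup, for illustration only.
schema = "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL, status TEXT)"
conn = sqlite3.connect(":memory:")
conn.execute(schema)
conn.execute("INSERT INTO orders VALUES (1, 'EU', 100.0, 'paid'), (2, 'US', 70.0, 'paid')")

# Stub standing in for a model: the first answer has a typo, the second is valid.
answers = iter([
    "SELECT region, SUM(amout) FROM orders GROUP BY region",
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
])

sql, rows = generate_verified_sql(
    "Total amount by region", schema, lambda prompt: next(answers), conn
)
print(sql)
print(rows)
```

This loop only guards against SQL that fails to execute. A query that runs but returns wrong rows passes straight through, which is why human review remains necessary at current accuracy levels.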