Source: cryptobriefing.com

OpenAI introduces GeneBench to evaluate AI on computational biology’s hardest problems

OpenAI released GeneBench, a benchmark of 103 multi-stage genomics problems. The best-performing model, GPT-5.5 Pro, achieves only a 33.2% pass rate, and roughly 60% of the problems stay below a 20% pass rate for current models. The benchmark highlights significant limits on AI's ability to carry out complex computational biology tasks, each of which would take a senior scientist 10 to 40 hours.

2 min read · Published Jun 30, 2026

A new benchmark reveals that even the best AI models struggle to solve the kinds of multi-stage genomics problems that can take a senior scientist up to a full working week to crack.

OpenAI has published GeneBench, a benchmark designed to stress-test AI models on the kinds of problems that make computational biologists earn their salaries. The benchmark, released as a bioRxiv preprint on April 23, 2026, is not asking models to explain what DNA is. It is asking them to do the actual work.

GeneBench's 103 problems are drawn from ten domains in genomics and quantitative biology, and each one spans multiple analytical stages. A senior scientist would need roughly 10 to 40 hours to work through a single one of them. Current AI models are, to put it politely, not there yet.

What the numbers actually say

GPT-5.5 Pro posted the highest pass rate among evaluated models, at 33.2%. That sounds modest until you see what the rest of the field managed.

The standard version of GPT-5.5 reached a 25.0% pass rate. Gemini 3.1 Pro landed at 11.2%. Roughly 60% of the benchmark’s problems remained below a 20% pass rate even for the best models tested.
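The article does not explain how GeneBench turns individual attempts into a pass rate. As a rough illustration only, the sketch below assumes that each problem is attempted several times and graded pass/fail, that a problem's pass rate is its share of passing attempts, that the headline figure is the unweighted mean across problems, and that the "below 20%" statistic counts problems under that threshold. The function names and toy data are hypothetical and are not GeneBench results.

```python
from statistics import mean

# Hypothetical helpers; this grading scheme is an assumption, not GeneBench's.
def per_problem_pass_rates(attempts: dict[str, list[bool]]) -> dict[str, float]:
    """Share of graded attempts that passed, computed separately for each problem."""
    return {pid: sum(runs) / len(runs) for pid, runs in attempts.items()}


def summarize(attempts: dict[str, list[bool]], threshold: float = 0.20) -> tuple[float, float]:
    """Return (mean pass rate across problems, share of problems below threshold)."""
    rates = per_problem_pass_rates(attempts)
    overall = mean(rates.values())  # unweighted: every problem counts equally
    share_below = sum(r < threshold for r in rates.values()) / len(rates)
    return overall, share_below


# Invented toy data: four problems, four pass/fail attempts each.
toy = {
    "p1": [True, True, True, False],     # 0.75
    "p2": [False, False, False, False],  # 0.00
    "p3": [True, False, False, False],   # 0.25
    "p4": [False, False, False, False],  # 0.00
}

overall, share_below = summarize(toy)
print(f"mean pass rate: {overall:.2f}, problems below 20%: {share_below:.0%}")
```

On the toy data this prints a mean pass rate of 0.25 with 50% of problems below the threshold. It shows how a modest headline number can coexist with many problems the models almost never solve.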

J. Li and collaborators published the benchmark alongside developments related to GPT-5.5, framing it as both an evaluation tool and a signal of where the field stands heading into a period of rapid model development.

A June 2026 update to a model called GPT-Rosalind achieved a 21.6% pass rate on GeneBench, compared with 20.4% for GPT-5.5 in that same comparison, while using 31% fewer tokens.

Why a benchmark like this matters

GeneBench grounds its problems in work that reflects what scientists actually do. Multi-stage genomics analysis involves long inference chains, domain-specific reasoning, and decisions that compound across many steps.
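A deliberately simple model shows why compounding matters. This model is an illustrative assumption and not how GeneBench grades problems: suppose each of n stages is done correctly with the same probability p, independently of the others, and the final answer is right only if every stage is right. The whole analysis then succeeds with probability p^n. The function name `chain_success` is hypothetical.

```python
# Toy model (an assumption, not GeneBench's grading scheme): n independent
# stages, each correct with probability p; the answer is right only if all are.
def chain_success(p: float, n: int) -> float:
    """Probability that all n independent stages succeed."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


for p in (0.99, 0.95, 0.90):
    print(f"p={p:.2f}: 5 stages -> {chain_success(p, 5):.3f}, "
          f"10 stages -> {chain_success(p, 10):.3f}")
```

Under this toy model, a 90% chance of getting each step right shrinks to about 35% (0.9^10 ≈ 0.349) across ten steps. Real stages are rarely independent, so this is an intuition pump rather than a prediction.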

The fact that 60% of problems sit below a 20% pass rate for current models tells researchers, investors, and companies where the ceiling currently sits.

What this means for the market

AI companies are increasingly positioning their models as tools for drug discovery, genomic analysis, and biomedical research. A 33.2% pass rate on problems requiring 10 to 40 hours of senior scientist effort is both an honest admission of current limits and a baseline that future models can be measured against.

The GPT-Rosalind efficiency result adds another dimension. If a model can match or slightly beat a comparable model's pass rate while consuming 31% fewer tokens, the unit economics of deploying AI in research workflows improve considerably.
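The efficiency claim can be made concrete with the GPT-Rosalind figures reported earlier (21.6% versus 20.4%, with 31% fewer tokens). This back-of-the-envelope sketch assumes that "31% fewer tokens" means 0.69 times the token count over the same set of attempts, and that cost scales linearly with tokens at the same per-token price. The article confirms neither assumption. Under them, tokens spent per passing answer work out to 0.69 × 20.4 / 21.6 ≈ 0.65 of GPT-5.5's. The function name is hypothetical.

```python
# Back-of-the-envelope arithmetic using only the figures reported in the article.
# Assumption: "31% fewer tokens" = 0.69x total tokens over the same attempts.
def relative_tokens_per_pass(token_ratio: float, pass_new: float, pass_old: float) -> float:
    """Tokens per passing answer for the new model, relative to the old model.

    tokens_per_pass = total_tokens / (attempts * pass_rate), so with the same
    number of attempts the ratio is token_ratio * pass_old / pass_new.
    """
    return token_ratio * (pass_old / pass_new)


ratio = relative_tokens_per_pass(token_ratio=1 - 0.31, pass_new=0.216, pass_old=0.204)
print(f"tokens per passing answer: {ratio:.3f}x baseline "
      f"(about {(1 - ratio) * 100:.0f}% fewer)")
```

That comes to roughly 35% fewer tokens per solved problem. The saving is slightly larger than the raw 31% because GPT-Rosalind also passes a little more often.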

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
