OpenAI introduces GeneBench to evaluate AI on computational biology’s hardest problems

Source: cryptobriefing.com · Published Jun 30, 2026

OpenAI released GeneBench, a benchmark of 103 multi-stage genomics problems. The best-performing model, GPT-5.5 Pro, achieves only a 33.2% pass rate, and about 60% of the problems stay below a 20% pass rate for current models. The results highlight significant limitations in AI's ability to handle complex computational biology tasks that would take a senior scientist 10 to 40 hours each.

A new benchmark reveals that even the best AI models struggle with the kinds of multi-stage genomics problems that can take a senior scientist up to a full working week to crack.

OpenAI has published GeneBench, a benchmark designed to stress-test AI models on the kinds of problems computational biologists earn their salaries solving. The benchmark, released as a bioRxiv preprint on April 23, 2026, is not asking models to explain what DNA is. It is asking them to do the actual work.

GeneBench’s 103 problems are drawn from ten domains in genomics and quantitative biology, and each one spans multiple analytical stages. A senior scientist would need roughly 10 to 40 hours to work through a single one of them. Current AI models are, to put it politely, not there yet.

What the numbers actually say

GPT-5.5 Pro posted the highest pass rate among evaluated models, at 33.2%. That sounds modest until you see what the rest of the field managed.

The standard version of GPT-5.5 reached a 25.0% pass rate. Gemini 3.1 Pro landed at 11.2%. Roughly 60% of the benchmark’s problems remained below a 20% pass rate even for the best models tested.

J. Li and collaborators published the benchmark alongside work related to GPT-5.5, framing it as both an evaluation tool and a signal of where the field stands heading into a period of rapid model development.

A June 2026 update to a model called GPT-Rosalind achieved a 21.6% pass rate on GeneBench, compared with 20.4% for GPT-5.5 in that comparison, while using 31% fewer tokens.

Why a benchmark like this matters

GeneBench grounds its problems in work that reflects what scientists actually do. Multi-stage genomics analysis involves long inference chains, domain-specific reasoning, and decisions whose errors compound across many steps.

The fact that 60% of problems sit below a 20% pass rate for current models tells researchers, investors, and companies where the ceiling currently sits.

What this means for the market

AI companies are increasingly positioning their models as tools for drug discovery, genomic analysis, and biomedical research. A 33.2% pass rate on problems requiring 10 to 40 hours of senior scientist effort is both an honest admission of current limits and a baseline that future models can be measured against.

The GPT-Rosalind efficiency result adds another dimension. If a model can match or slightly exceed another model's performance while consuming 31% fewer tokens, the unit economics of deploying AI in research workflows improve considerably.
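As a rough illustration of that unit-economics point, the back-of-the-envelope Python sketch below uses only the figures quoted above (21.6% vs. 20.4% pass rate, 31% fewer tokens) to estimate tokens spent per solved problem. It assumes, without support from the source, that GPT-5.5's total token usage can be normalized to 1.0 and that cost scales linearly with tokens; the function name `tokens_per_solve` is hypothetical.

```python
"""Back-of-the-envelope token cost per solved GeneBench problem.

Uses only the pass rates and token reduction quoted in the article.
Assumptions (not from the source): GPT-5.5's total token usage is
normalized to 1.0, and cost scales linearly with tokens.
"""


def tokens_per_solve(relative_tokens: float, pass_rate: float) -> float:
    """Hypothetical helper: normalized tokens spent per passed problem.

    pass_rate is a fraction (0.216 means 21.6%).
    """
    return relative_tokens / pass_rate


# GPT-5.5 baseline: token usage normalized to 1.0 (assumption).
gpt55 = tokens_per_solve(relative_tokens=1.00, pass_rate=0.204)
# GPT-Rosalind: 31% fewer tokens than the baseline.
rosalind = tokens_per_solve(relative_tokens=1.00 - 0.31, pass_rate=0.216)

ratio = rosalind / gpt55
print(f"GPT-Rosalind spends {ratio:.2f}x the tokens per solved problem")
print(f"i.e. roughly {1 - ratio:.0%} fewer tokens per solve")
```

Under these assumptions the two effects stack: fewer tokens per attempt and a slightly higher pass rate together cut tokens per solved problem by roughly a third. Real costs would also depend on per-token pricing, which the article does not report.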

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
