Source: cryptobriefing.com

OpenAI launches LifeSciBench to evaluate AI in life sciences

On June 17, OpenAI launched LifeSciBench, a benchmark with 750 expert-authored tasks and nearly 20,000 evaluation criteria to test AI models on real-world biological research workflows. The benchmark serves as the primary measure for OpenAI's GPT-Rosalind model. Its tasks were authored and reviewed by 173 PhD-level scientists and validated by an additional 453 expert reviewers, and 79% of them require multi-step reasoning.

Published Jun 17, 2026 · 2 min read

The new benchmark uses 750 expert-authored tasks and nearly 20,000 evaluation criteria to stress-test AI models on real-world biological research workflows

OpenAI just dropped what amounts to a final exam for AI models trying to do real science. LifeSciBench, published on June 17, is a benchmarking tool built to measure how well AI systems handle actual life sciences research: not the sanitized textbook version, but the messy, multi-step, figure-laden work that PhD scientists do every day.

The benchmark includes 750 tasks spanning seven distinct research workflows, from evidence handling and analysis to experimental design, scientific reasoning, and communication.

What makes LifeSciBench different

The 750 tasks were authored and reviewed by 173 PhD-level scientists with backgrounds in biotechnology and pharmaceuticals. An additional 453 expert reviewers helped validate them. Each task averaged six automated review cycles, and expert consensus required at least 90% agreement before a task made it into the final set.
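The 90% agreement bar can be pictured as a simple acceptance filter. The sketch below is illustrative only: the article does not say how OpenAI recorded or computed expert agreement, so the `passes_consensus` function and the per-reviewer list of approve/reject votes are assumptions made for the example.

```python
from typing import Sequence

# From the article: a task needed at least 90% expert agreement to be kept.
CONSENSUS_THRESHOLD_PCT = 90


def passes_consensus(votes: Sequence[bool],
                     threshold_pct: int = CONSENSUS_THRESHOLD_PCT) -> bool:
    """Return True if the share of approving votes meets the threshold.

    `votes` is a hypothetical per-reviewer approve/reject list; the article
    does not describe how agreement was actually recorded or computed.
    """
    if not votes:
        return False  # no reviews means no consensus
    approvals = sum(votes)
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    return approvals * 100 >= threshold_pct * len(votes)


if __name__ == "__main__":
    print(passes_consensus([True] * 9 + [False]))      # 9 of 10 approve
    print(passes_consensus([True] * 8 + [False] * 2))  # 8 of 10 approve
```

Under this reading, a task approved by nine of ten reviewers would clear the bar, while one approved by eight of ten would not.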

The tasks come loaded with 1,062 attached artifacts, including figures, PDFs, and datasets. That matters because real research doesn’t happen in clean text boxes. It happens in spreadsheets with missing columns, in blurry gel images, in 40-page supplementary files that nobody wants to read. LifeSciBench forces AI models to deal with all of it.

Of the 750 tasks, 79% require multi-step reasoning, with an average of four reasoning steps per task. The assessment rubric contains 19,020 individual criteria evaluating the correctness, justification, and usefulness of AI-generated responses.
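To put those totals in per-task terms, here is a quick back-of-the-envelope calculation. It uses only the figures reported in the article; the variable names are ours, and the multi-step count is an estimate because the 79% figure is itself rounded.

```python
# Figures as reported in the article.
TASKS = 750
CRITERIA = 19_020
ARTIFACTS = 1_062
MULTI_STEP_PCT = 79  # reported as a rounded percentage

criteria_per_task = CRITERIA / TASKS
artifacts_per_task = ARTIFACTS / TASKS
# The 79% figure is rounded, so this is only an estimate of the task count.
multi_step_estimate = TASKS * MULTI_STEP_PCT / 100

print(f"criteria per task:  {criteria_per_task:.2f}")
print(f"artifacts per task: {artifacts_per_task:.2f}")
print(f"multi-step tasks:   about {multi_step_estimate:.1f}")
```

That works out to roughly 25 rubric criteria and 1.4 attached artifacts per task, with somewhere near 590 of the 750 tasks requiring multi-step reasoning.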

The seven biological domains covered span the breadth of modern life sciences research, and the seven workflow categories (evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, translation, and scientific communication) map directly onto how scientists actually spend their time.

GPT-Rosalind and the competitive landscape

LifeSciBench serves as the primary measuring stick for GPT-Rosalind, OpenAI’s specialized life sciences model that was first introduced in April 2026.

According to OpenAI’s results, GPT-Rosalind leads other models on overall LifeSciBench scores. The competition it was measured against includes GPT-5.5, Grok 4.3, and Gemini 3.1 Pro.

LifeSciBench also joins a growing ecosystem of specialized scientific benchmarks. It complements MedChemBench for medicinal chemistry, GeneBench for genomics, and LabWorkBench for wet-lab troubleshooting, each of which evaluates token-efficient performance in its own domain.

What this means for crypto and AI investors

There’s no direct crypto angle here. LifeSciBench is a pure AI research infrastructure play, and none of the major crypto-focused outlets have drawn connections to blockchain or decentralized science (DeSci) protocols in their coverage.

Still, the sheer scale of expert involvement (173 contributors and 453 reviewers) highlights something decentralized science protocols have been trying to solve: how to coordinate large numbers of domain experts around a shared research goal. OpenAI did it through traditional means, hiring and contracting. Whether token-incentivized coordination could achieve similar quality at similar scale remains one of DeSci’s biggest open questions.

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
