OpenAI launches LifeSciBench to evaluate AI in life sciences


Source: cryptobriefing.com · Published: 2026-06-17T23:15Z · Topic: artificial-intelligence

OpenAI launched LifeSciBench on June 17, a benchmark with 750 expert-authored tasks and nearly 20,000 evaluation criteria to test AI models on real-world biological research workflows. The benchmark, which serves as the primary measure for OpenAI's GPT-Rosalind model, was validated by 173 PhD-level scientists and 453 expert reviewers, with 79% of tasks requiring multi-step reasoning.

The new benchmark uses 750 expert-authored tasks and nearly 20,000 evaluation criteria to stress-test AI models on real-world biological research workflows

OpenAI just dropped what amounts to a final exam for AI models trying to do real science. LifeSciBench, published on June 17, is a benchmarking tool built to measure how well AI systems handle actual life sciences research: not the sanitized textbook version, but the messy, multi-step, figure-laden work that PhD scientists do every day.

The benchmark includes 750 tasks spanning seven distinct research workflows, from evidence handling and analysis to experimental design, scientific reasoning, and communication.

What makes LifeSciBench different

The 750 tasks were authored and reviewed by 173 PhD-level scientists with backgrounds in biotechnology and pharmaceuticals. An additional 453 expert reviewers helped validate them. Each task averaged six automated review cycles, and expert consensus required at least 90% agreement before a task made it into the final set.

The tasks come loaded with 1,062 attached artifacts, including figures, PDFs, and datasets. That matters because real research doesn’t happen in clean text boxes. It happens in spreadsheets with missing columns, in blurry gel images, in 40-page supplementary files that nobody wants to read. LifeSciBench forces AI models to deal with all of it.

Of the tasks, 79% require multi-step reasoning, with an average of four reasoning steps per task. The assessment rubric contains 19,020 individual criteria evaluating the correctness, justification, and usefulness of AI-generated responses.
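For a sense of scale, the headline numbers can be turned into per-task averages. The short Python sketch below only does arithmetic on figures quoted in this article; the constant and function names are illustrative, and the multi-step task count is approximate because the 79% figure is presumably rounded.

```python
# Figures quoted in the article (not taken from any published dataset file).
TOTAL_TASKS = 750
TOTAL_CRITERIA = 19_020
TOTAL_ARTIFACTS = 1_062
MULTI_STEP_PERCENT = 79  # reported share of tasks needing multi-step reasoning


def derived_figures():
    """Return simple per-task averages implied by the reported totals."""
    return {
        # Average number of rubric criteria attached to each task.
        "criteria_per_task": TOTAL_CRITERIA / TOTAL_TASKS,
        # Average number of figures, PDFs, or datasets attached to each task.
        "artifacts_per_task": TOTAL_ARTIFACTS / TOTAL_TASKS,
        # Implied number of multi-step tasks (approximate; 79% is rounded).
        "approx_multi_step_tasks": MULTI_STEP_PERCENT * TOTAL_TASKS / 100,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

On these numbers, each task carries roughly 25 rubric criteria and about 1.4 attached artifacts, and a little under 600 of the 750 tasks require multi-step reasoning.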

The seven biological domains covered span the breadth of modern life sciences research. The seven workflow categories (evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, translation, and scientific communication) map directly onto how scientists actually spend their time.

GPT-Rosalind and the competitive landscape

LifeSciBench serves as the primary measuring stick for GPT-Rosalind, OpenAI’s specialized life sciences model that was first introduced in April 2026.

According to OpenAI’s results, GPT-Rosalind leads other models on overall LifeSciBench scores. The competition it was measured against includes GPT-5.5, Grok 4.3, and Gemini 3.1 Pro.

LifeSciBench also joins a growing ecosystem of specialized scientific benchmarks. It complements MedChemBench for medicinal chemistry, GeneBench for genomics, and LabWorkBench for wet-lab troubleshooting, each evaluating token-efficient performance in its respective domain.

What this means for crypto and AI investors

There’s no direct crypto angle here. LifeSciBench is a pure AI research infrastructure play, and none of the major crypto-focused outlets have drawn connections to blockchain or decentralized science (DeSci) protocols in their coverage.

Still, the sheer scale of expert involvement (173 contributors and 453 reviewers) highlights something decentralized science protocols have been trying to solve: how to coordinate large numbers of domain experts around a shared research goal. OpenAI did it through traditional means, hiring and contracting. Whether token-incentivized coordination could achieve similar quality at similar scale remains one of DeSci’s biggest open questions.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
