OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

Source: marktechpost.com · Published: 2026-06-18T02:28Z · Topic: artificial-intelligence

OpenAI released LifeSciBench, a benchmark of 750 expert-authored tasks evaluating AI models on real life-science research. The best model, GPT-Rosalind, scored 36.1%, indicating significant room for improvement in reasoning and operational tasks.

OpenAI's LifeSciBench evaluates whether frontier AI can handle real life-science research across 750 expert-authored tasks, seven workflows, and seven biological domains. Built by 173 PhD scientists with 19,020 rubric criteria, it grades reasoning and decisions, not just recall. The best model, GPT-Rosalind, passes 36.1%, leaving large headroom on artifacts, exact outputs, and operational calls.

The post OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric appeared first on MarkTechPost.

Read original on marktechpost.com → www.marktechpost.com/2026/06/17/openai-releases-…
