cd /news/artificial-intelligence/openai-announces-benchmarks-for-ai-l… · home topics artificial-intelligence article
[ARTICLE · art-35839] src=science.slashdot.org ↗ pub= topic=artificial-intelligence verified=true sentiment=· neutral

OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test

OpenAI released LifeSciBench, a 750-task benchmark to evaluate AI systems on realistic life science research tasks. Its top-performing GPT-Rosalind model achieved only a 36.1% pass rate, failing nearly two-thirds of the tasks, with performance dropping further when handling documents, figures, or datasets. The benchmark highlights AI's growing ability in scientific communication but underscores that models are far from replacing human expertise in research.

read1 min views1 publishedJun 20, 2026

This week OpenAI announced a 750-task test to to measure "whether AI systems can support realistic life science research tasks, not just answer biology questions." But while OpenAI's top-performing GPT-Rosalind model led the rankings, Slashdot reader BrianFagioli notes that "it achieved a pass rate of just 36.1 percent, failing nearly two-thirds of benchmark tasks." Nerds.xyz points out that means "the best-performing model failed nearly two-thirds of the benchmark's tasks." The benchmark also revealed a familiar weakness. AI systems generally perform better when everything is presented as text. Once they are forced to work with supporting documents, figures, or complex datasets, performance drops noticeably. GPT-Rosalind's pass rate fell from 45.1 percent on text-only tasks to 28.1 percent on tasks involving artifacts or URLs. To be fair, the benchmark is not intended to suggest AI is useless in research. Quite the opposite. OpenAI found that models are becoming increasingly capable of scientific communication, evidence synthesis, and translating research findings into practical explanations. Those are valuable skills, particularly for researchers drowning in information. But LifeSciBench serves as a useful reminder that today's AI systems are still far from autonomous scientists. They can help. They can assist. They can sometimes provide surprisingly useful insights. What they cannot reliably do, however, is replace the expertise, judgment, and skepticism that real scientific research requires.Read more of this story at Slashdot.

── more in #artificial-intelligence 4 stories · sorted by recency
── more on @openai 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/openai-announces-ben…] indexed:0 read:1min 2026-06-20 ·