{"slug": "openai-launches-lifescibench-to-evaluate-ai-in-life-sciences", "title": "OpenAI launches LifeSciBench to evaluate AI in life sciences", "summary": "OpenAI launched LifeSciBench on June 17, a benchmark with 750 expert-authored tasks and nearly 20,000 evaluation criteria to test AI models on real-world biological research workflows. The benchmark, which serves as the primary measure for OpenAI's GPT-Rosalind model, was validated by 173 PhD-level scientists and 453 expert reviewers, with 79% of tasks requiring multi-step reasoning.", "body_md": "# OpenAI launches LifeSciBench to evaluate AI in life sciences\n\nThe new benchmark uses 750 expert-authored tasks and nearly 20,000 evaluation criteria to stress-test AI models on real-world biological research workflows\n\nOpenAI just dropped what amounts to a final exam for AI models trying to do real science. LifeSciBench, published on June 17, is a benchmarking tool built to measure how well AI systems handle actual life sciences research, not the sanitized textbook version, but the messy, multi-step, figure-laden work that PhD scientists do every day.\n\nThe benchmark includes 750 tasks spanning seven distinct research workflows, from evidence handling and analysis to experimental design, scientific reasoning, and communication.\n\n## What makes LifeSciBench different\n\nThe 750 tasks were authored and reviewed by 173 PhD-level scientists with backgrounds in biotechnology and pharmaceuticals. An additional 453 expert reviewers helped validate them. Each task averaged six automated review cycles, and expert consensus required at least 90% agreement before a task made it into the final set.\n\nThe tasks come loaded with 1,062 attached artifacts, including figures, PDFs, and datasets. That matters because real research doesn’t happen in clean text boxes. It happens in spreadsheets with missing columns, in blurry gel images, in 40-page supplementary files that nobody wants to read. 
LifeSciBench forces AI models to deal with all of it.\n\n79% of the tasks require multi-step reasoning, with an average of four reasoning steps per task. The assessment rubric contains 19,020 individual criteria evaluating correctness, justification, and usefulness of AI-generated responses.\n\nThe seven biological domains covered span the breadth of modern life sciences research, and the seven workflow categories (evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, translation, and scientific communication) map directly onto how scientists actually spend their time.\n\n## GPT-Rosalind and the competitive landscape\n\nLifeSciBench serves as the primary measuring stick for GPT-Rosalind, OpenAI’s specialized life sciences model that was first introduced in April 2026.\n\nAccording to OpenAI’s results, GPT-Rosalind leads other models on overall LifeSciBench scores. The competition it was measured against includes GPT-5.5, Grok 4.3, and Gemini 3.1 Pro.\n\nLifeSciBench also joins a growing ecosystem of specialized scientific benchmarks. It complements MedChemBench for medicinal chemistry, GeneBench for genomics, and LabWorkBench for wet-lab troubleshooting, each evaluating token-efficient performance in its respective domain.\n\n## What this means for crypto and AI investors\n\nThere’s no direct crypto angle here. LifeSciBench is a pure AI research infrastructure play, and none of the major crypto-focused outlets have drawn connections to blockchain or decentralized science (DeSci) protocols in their coverage.\n\nThe sheer scale of expert involvement, 173 contributors and 453 reviewers, highlights something decentralized science protocols have been trying to solve: how to coordinate large numbers of domain experts around a shared research goal. OpenAI did it through traditional means: hiring and contracting. 
Whether token-incentivized coordination could achieve similar quality at similar scale remains one of DeSci’s biggest open questions.\n\n**Disclosure:** This article was edited by Editorial Team. For more information on how we create and review content, see our [Editorial Policy](https://cryptobriefing.com/editorial-policy/).", "url": "https://wpnews.pro/news/openai-launches-lifescibench-to-evaluate-ai-in-life-sciences", "canonical_source": "https://cryptobriefing.com/openai-lifescibench-ai-life-sciences/", "published_at": "2026-06-17 23:15:32+00:00", "updated_at": "2026-06-17 23:23:25.321093+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-research", "ai-products", "ai-tools", "large-language-models"], "entities": ["OpenAI", "LifeSciBench", "GPT-Rosalind", "GPT-5.5", "Grok 4.3", "Gemini 3.1 Pro", "MedChemBench", "GeneBench"], "alternates": {"html": "https://wpnews.pro/news/openai-launches-lifescibench-to-evaluate-ai-in-life-sciences", "markdown": "https://wpnews.pro/news/openai-launches-lifescibench-to-evaluate-ai-in-life-sciences.md", "text": "https://wpnews.pro/news/openai-launches-lifescibench-to-evaluate-ai-in-life-sciences.txt", "jsonld": "https://wpnews.pro/news/openai-launches-lifescibench-to-evaluate-ai-in-life-sciences.jsonld"}}