Can AI Draw Science? A Benchmark for Evaluating Scientific Figure Generation by Text-to-Image and Multimodal Models

wpnews.pro




Researchers introduced SciDraw-Bench, a benchmark for evaluating scientific figure generation by text-to-image and multimodal models, covering 32 tasks across eight figure types and ten disciplines. In a pilot study, a domain-specific system, SciDraw AI, substantially outperformed general-purpose models on all evaluation dimensions, with text fidelity remaining the hardest challenge.

Published Jun 30, 2026 · 1 min read

arXiv:2606.28406v1

Abstract: Text-to-image and multimodal generative models are increasingly used to produce scientific figures such as mechanism diagrams, experimental-design schematics, conceptual frameworks, and graphical abstracts. Yet existing image-generation benchmarks (e.g., GenEval, T2I-CompBench, DPG-Bench) evaluate natural images and measure compositionality, object counting, or photorealism. None of them measure what makes a generated scientific figure usable: correct and legible text labels, faithful depiction of entities and their relations, coherent diagrammatic structure, and adherence to disciplinary drawing conventions.

We introduce SciDraw-Bench, a benchmark of 32 structured scientific-figure generation tasks spanning eight figure types and ten disciplines, where each task pairs a natural-language prompt with a machine-checkable specification of required labels, relations, components, conventions, and negative constraints. We propose a four-dimensional evaluation protocol: Text Fidelity (OCR-based label recall and character error rate), Semantic Correctness (vision-language-model judging against the specification), Structural Quality, and Convention Adherence, together with a meta-evaluation protocol and a preliminary inter-judge reliability analysis (human-rating validation is ongoing).

We evaluate a domain-specific system, SciDraw AI, against representative general-purpose text-to-image models, and outline a code-to-figure baseline as a planned extension. In a pilot over all eight figure types, the domain-specific system substantially outperforms the general-purpose baselines on every dimension and figure type, with the largest gaps on semantic correctness and convention adherence; text fidelity remains the hardest dimension for all systems.
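To make the Text Fidelity dimension concrete, here is a minimal Python sketch of how OCR-based label recall and character error rate could be computed against a task specification. It is an illustration under stated assumptions, not the benchmark's actual implementation: the `FigureSpec` fields, the `text_fidelity` function, the 0.2 matching threshold, and the lowercase/whitespace normalization are all hypothetical, and the OCR step is assumed to have already produced a list of detected strings.

```python
from dataclasses import dataclass, field


@dataclass
class FigureSpec:
    """Hypothetical task specification; the field names are assumptions."""
    prompt: str
    required_labels: list[str]
    forbidden_labels: list[str] = field(default_factory=list)  # negative constraints


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; can exceed 1.0."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def text_fidelity(spec: FigureSpec, ocr_strings: list[str], max_cer: float = 0.2) -> dict:
    """Score OCR output against the spec.

    A required label counts as recalled when its best-matching OCR string
    has a CER at or below max_cer (an assumed threshold). A label with no
    OCR candidates at all is assigned a CER of 1.0.
    """
    best_cers = [
        min((char_error_rate(label, s) for s in ocr_strings), default=1.0)
        for label in spec.required_labels
    ]
    n = len(best_cers)
    detected = {normalize(s) for s in ocr_strings}
    return {
        "label_recall": sum(c <= max_cer for c in best_cers) / n if n else 1.0,
        "mean_cer": sum(best_cers) / n if n else 0.0,
        # Forbidden labels that appear verbatim (after normalization) in the OCR output.
        "violations": [l for l in spec.forbidden_labels if normalize(l) in detected],
    }
```

For example, with required labels "Mitochondrion" and "ATP synthase" and OCR output containing "mitochondrion" and the misspelled "ATP synthese", both labels would count as recalled under the assumed threshold while the mean CER stays above zero. Reporting recall and CER side by side in this way separates labels that are missing from labels that are present but garbled.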

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.28406
