
ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

Researchers have introduced ThousandWorlds, an ML-ready benchmark dataset for emulating exoplanet climates. It contains approximately 1800 simulations from five global climate models. By enabling faster climate emulation, the benchmark aims to accelerate the interpretation of atmospheric signatures in the search for life beyond Earth.

Published Jun 18, 2026

arXiv:2606.18338v1

Abstract: The search for life beyond Earth will depend on detecting faint signatures in the atmospheres of potentially habitable exoplanets. Interpreting those signatures requires understanding the host planet's climate: the same molecule may signal life on one planet and abiotic chemistry on another. Global climate models (GCMs) provide this understanding, but individual runs can require up to millions of core-hours and substantial domain expert time. Machine-learning emulators could remove this bottleneck, but progress has been limited by the absence of a curated, multi-model exoclimate dataset. We introduce ThousandWorlds, an ML-ready benchmark for exoclimate emulation and for the broader regime of low-data, multi-simulator, parameter-to-field regression. The dataset contains approximately 1800 simulations from five GCMs, mapping eight planet parameters to 3D atmospheric fields including temperature, humidity, winds, clouds, and radiation. Three nested subsets define progressively harder challenges: single-simulator regression, multi-simulator regression with complete observations, and multi-simulator regression with structured missingness. We propose two evaluation protocols: one for ranking methods, and one that measures performance relative to the disagreement between GCMs themselves. We evaluate seven baselines spanning simple methods, deep learning, and Gaussian processes. GP-based methods perform best, suggesting that ThousandWorlds exposes a regime where off-the-shelf deep learning does not yet succeed.

Data: https://doi.org/10.57967/hf/8695
Code: https://github.com/edstevenson/ThousandWorlds
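The second evaluation protocol, which scores an emulator relative to the disagreement between GCMs, can be illustrated with a small sketch. The abstract does not give the metric's definition, so everything here is an assumption made for illustration: the function names, the use of RMSE, the choice of mean pairwise RMSE as the measure of inter-model spread, and the toy temperature values. It is not the ThousandWorlds code or its API.

```python
from itertools import combinations
from math import sqrt


def rmse(a, b):
    """Root-mean-square error between two flattened fields of equal size."""
    if len(a) != len(b):
        raise ValueError("fields must have the same number of grid points")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def inter_model_spread(fields_by_model):
    """Mean pairwise RMSE between simulators run on the same planet parameters.

    Assumed definition of 'disagreement between GCMs'; the benchmark may use
    a different one.
    """
    pairs = list(combinations(fields_by_model.values(), 2))
    if not pairs:
        raise ValueError("need at least two simulators")
    return sum(rmse(a, b) for a, b in pairs) / len(pairs)


def relative_error(prediction, target_model, fields_by_model):
    """Emulator RMSE against one simulator, divided by the inter-model spread.

    Values below 1 mean the emulator is closer to its target simulator than
    the simulators are to each other, on average.
    """
    error = rmse(prediction, fields_by_model[target_model])
    return error / inter_model_spread(fields_by_model)


# Toy surface temperatures (K) at three grid points; hypothetical values.
fields = {
    "gcm_a": [280.0, 290.0, 300.0],
    "gcm_b": [282.0, 292.0, 302.0],
    "gcm_c": [278.0, 288.0, 298.0],
}
emulated = [281.0, 291.0, 301.0]

score = relative_error(emulated, "gcm_a", fields)
print(f"relative error: {score:.3f}")
```

Normalising by the spread makes scores comparable across fields with different units and magnitudes, which is one plausible reason to report performance this way.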
