GeoDisaster: Benchmarking Orchestrated Agents for Operational Disaster Geo-Intelligence

Researchers introduced GeoDisaster, a benchmark for operational geospatial disaster reasoning with 2,921 instances across 43 question types, and proposed an orchestrated multi-agent framework with 18 disaster-oriented tools. The framework is aligned with Role-Contract Expectation Alignment (RCEA), which improves tool use, evidence grounding, state consistency, and decision generation. The benchmark proves challenging for existing remote-sensing vision-language models and agentic systems.

arXiv:2606.17246v1 Announce Type: new

Abstract: Remote-sensing vision-language models (RS-VLMs) have advanced Earth-observation (EO) analysis toward visual interpretation and instruction-following, yet fall short of operational geo-intelligence, which demands tool-grounded spatial reasoning and structured, evidence-backed decisions. We introduce GeoDisaster, an operational geospatial disaster reasoning benchmark with 2,921 verified instances across 43 question types and five task families: deforestation monitoring, multi-hazard analysis, building-damage assessment, flood-safe routing, and Sentinel-1 SAR flood monitoring. Instances integrate heterogeneous EO/GIS evidence (optical and SAR imagery, raster masks, vector geometries, road networks, and exposure layers), spanning hazard detection, damage assessment, exposure estimation, and diagnostic report generation. Ground-truth answers are grounded in executable geospatial workflows and deterministic consistency checks, removing the need for language-model annotation. We further propose an orchestrated multi-agent framework with 18 disaster-oriented tools, where role-specialized agents coordinate through explicit execution contracts, aligned via Role-Contract Expectation Alignment (RCEA): failure-aware supervised fine-tuning combined with contract-grounded reinforcement learning over dense step-level signals. Experiments show that GeoDisaster challenges existing RS-VLMs and agentic systems, while RCEA improves tool use, evidence grounding, state consistency, and decision generation.
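To make "ground truth grounded in executable workflows and deterministic consistency checks" concrete for the flood-safe routing family, here is a minimal sketch under stated assumptions. It is not the benchmark's actual pipeline. The toy road graph, the set of flooded edges, and the helper names `edge`, `flood_safe_route`, and `check_route` are all hypothetical. The reference answer comes from running a shortest-path workflow (Dijkstra's algorithm with flooded segments removed). A candidate answer is then accepted or rejected by a purely deterministic check, with no language model involved.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy road network: node -> {neighbor: travel cost}, undirected.
Graph = Dict[str, Dict[str, float]]


def edge(u: str, v: str) -> FrozenSet[str]:
    """Undirected edge key, so (u, v) and (v, u) refer to the same road segment."""
    return frozenset((u, v))


def flood_safe_route(
    graph: Graph, flooded: Set[FrozenSet[str]], source: str, target: str
) -> Optional[Tuple[float, List[str]]]:
    """Reference workflow: Dijkstra's shortest path that never uses a flooded edge."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            # Walk predecessor links back to the source, then reverse.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if edge(u, v) in flooded:
                continue  # impassable road segment
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None  # target unreachable without crossing flood water


def check_route(
    graph: Graph, flooded: Set[FrozenSet[str]], source: str, target: str,
    cost: float, path: List[str],
) -> bool:
    """Deterministic consistency check on a candidate answer: correct endpoints,
    only real and unflooded edges, and a reported cost equal to the edge-cost sum."""
    if not path or path[0] != source or path[-1] != target:
        return False
    total = 0.0
    for u, v in zip(path, path[1:]):
        if v not in graph.get(u, {}) or edge(u, v) in flooded:
            return False
        total += graph[u][v]
    return abs(total - cost) < 1e-9


roads: Graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0, "D": 5.0},
    "C": {"A": 4.0, "B": 1.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
flooded = {edge("B", "C")}

print(flood_safe_route(roads, flooded, "A", "D"))  # (5.0, ['A', 'C', 'D'])
print(check_route(roads, flooded, "A", "D", 3.0, ["A", "B", "C", "D"]))  # False
```

With segment B–C flooded, the dry-weather optimum A→B→C→D (cost 3.0) is rejected by the check, and the workflow returns the detour A→C→D (cost 5.0). Because both the reference workflow and the check are deterministic, the same instance always yields the same verdict.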