{"slug": "navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model", "title": "NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation", "summary": "On April 16, 2026, NAVI-Orbital achieved what is, to its authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard a Low Earth Orbit spacecraft. The system uses Gemma 3 to classify scenes, generate text descriptions, and respond to operator queries in natural language. In ground benchmarking it reached 88.16% accuracy on the 7,960-image curated AID benchmark, and in orbit it processed live, uncorrected YAM-9 imagery with hardware-accelerated GPU inference and no fine-tuning for the flight instrument. The authors present this as a shift from conventional acquire-then-downlink approaches to in-orbit semantic compression of Earth observations.", "body_md": "arXiv:2606.18271v1 Announce Type: new\nAbstract: As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. 
Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.", "url": "https://wpnews.pro/news/navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model", "canonical_source": "https://arxiv.org/abs/2606.18271", "published_at": "2026-06-18 04:00:00+00:00", "updated_at": "2026-06-18 04:22:31.529138+00:00", "lang": "en", "topics": ["computer-vision", "natural-language-processing", "ai-agents", "large-language-models", "ai-infrastructure"], "entities": ["NAVI-Orbital", "Gemma 3", "LangGraph", "AID", "YAM-9"], "alternates": {"html": "https://wpnews.pro/news/navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model", "markdown": "https://wpnews.pro/news/navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model.md", "text": "https://wpnews.pro/news/navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model.txt", "jsonld": "https://wpnews.pro/news/navi-orbital-first-in-orbit-demonstration-of-a-zero-shot-vision-language-model.jsonld"}}