{"slug": "hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks", "title": "Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation Tasks", "summary": "A new study from arXiv reveals that multimodal large language models (LLMs) frequently produce hallucinated outputs in agricultural imaging tasks, generating biologically inconsistent or agronomically implausible results. In image interpretation, models like Gemma and LLAVA achieved only 63 to 75 percent zero-shot accuracy, while text-to-image models such as GPT-5 and Gemini 2.5 Flash produced up to 91 percent biologically inconsistent scenes under relaxed prompts. The findings highlight critical reliability weaknesses in LLM-based agricultural platforms, potentially leading to misinformed agronomic insights.", "body_md": "arXiv:2605.27595v1 Announce Type: new\nAbstract: Large Language Models (LLMs) are being rapidly adopted in agricultural imaging applications, ranging from crop interpretation to synthetic field image generation. However, these models frequently exhibit hallucinations: outputs that appear confident yet deviate from biological or environmental reality, potentially leading to misinformed agronomic insights. This study investigates such hallucinations in two complementary directions: image-to-text, where LLMs interpret crop or field imagery to describe conditions such as biotic and abiotic stresses, and text-to-image, where models generate synthetic agricultural scenes based on descriptive prompts. We examine errors involving biological inconsistency, contextual inaccuracy, and agronomic implausibility, evaluating the outputs under domain-informed criteria across multiple imaging modalities. Our analysis identifies recurring hallucination patterns within both interpretive and generative tasks. 
In image interpretation, LLMs (e.g., Gemma, LLAVA, Qwen, and MiniCPM) achieved modest zero-shot accuracy (63 to 75 percent), whereas few-shot prompting improved performance to as much as 86.8 percent; even so, the models still exhibited false detections and missed infections, indicating residual hallucination effects. In text-to-image tasks, advanced models such as GPT-5 and Gemini 2.5 Flash generated up to 91 percent biologically inconsistent scenes under relaxed prompt constraints, revealing fundamental weaknesses in current LLMs. This systematic assessment of visual reasoning and generation offers critical insights toward enhancing the reliability and trustworthiness of LLM-based agricultural imaging platforms.", "url": "https://wpnews.pro/news/hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks", "canonical_source": "https://arxiv.org/abs/2605.27595", "published_at": "2026-05-28 04:00:00+00:00", "updated_at": "2026-05-28 04:27:43.018750+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "computer-vision", "ai-safety", "ai-research"], "entities": ["Gemma", "LLAVA", "Qwen", "MiniCPM", "GPT-5", "Gemini 2.5 Flash"], "alternates": {"html": "https://wpnews.pro/news/hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks", "markdown": "https://wpnews.pro/news/hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks.md", "text": "https://wpnews.pro/news/hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks.txt", "jsonld": "https://wpnews.pro/news/hallucination-behavior-in-multimodal-llms-across-agricultural-image-and-tasks.jsonld"}}