Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

Researchers have introduced Micro-Macro Retrieval (M2R), a retrieve-while-generate framework designed to reduce hallucination in large language models during long-form text generation. It targets the observation that factual accuracy is higher when key information sits close to the model's output. M2R retrieves in two tiers: at the macro level it pulls coarse-grained evidence from external sources, and at the micro level it extracts essential results from a key information repository built during reasoning and reuses them while generating the answer. According to the abstract, experiments across different benchmarks demonstrate the effectiveness of M2R, especially in lengthy-context settings.

Published May 29, 2026 · 1 min read

arXiv:2605.28828v1

Abstract: Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation where redundant retrieved contexts and lengthy reasoning chains amplify factual errors. Recent studies highlight a critical phenomenon: the closer key information appears to the model outputs, the higher the factual accuracy. However, existing retrieval-augmented language models (RALMs) lack effective mechanisms to ensure this proximity: external evidence is injected into reasoning via multi-turn retrieval, but this cannot ensure key information stays close to the outputs. We propose Micro-Macro Retrieval (M2R), a novel retrieve-while-generate framework to fill this gap. At the macro level, M2R retrieves coarse-grained evidence from external sources; at the micro level, it extracts essential results from a key information repository built during reasoning and reuses them while generating answers. This design directly addresses the key-information-to-output proximity bottleneck, effectively reducing hallucination in long-form tasks. M2R is trained with a curriculum learning-based reinforcement learning strategy using customized rule-based rewards, enabling stable acquisition of retrieval and grounding skills. Extensive experiments across different benchmarks demonstrate the effectiveness of M2R, especially in lengthy-context settings.
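The two retrieval levels described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain keyword overlap in place of a learned retriever, a sentence filter in place of the model's reasoning step, and no language model at all. Every name here (`macro_retrieve`, `KeyInfoRepository`, `micro_retrieve`, `build_grounded_prompts`) is hypothetical. The only idea it tries to show is the proximity one: key facts collected during reasoning are placed immediately before each answer segment.

```python
from dataclasses import dataclass, field

# Assumption: a tiny stopword list so that overlap reflects content words only.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "on",
             "the", "was", "when", "where"}


def tokens(text: str) -> set[str]:
    """Lowercased content words with surrounding punctuation stripped."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def overlap(a: str, b: str) -> int:
    """Number of content words shared by two strings."""
    return len(tokens(a) & tokens(b))


def macro_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Macro level (toy): fetch coarse-grained passages from an external corpus."""
    ranked = sorted(corpus, key=lambda p: overlap(query, p), reverse=True)
    return ranked[:k]


@dataclass
class KeyInfoRepository:
    """Micro level (toy): short key facts collected while reasoning."""
    facts: list[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)

    def micro_retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return up to k stored facts that share content words with the query."""
        scored = [(overlap(query, f), f) for f in self.facts]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [fact for _, fact in scored[:k]]


def extract_key_facts(passage: str, query: str) -> list[str]:
    """Stand-in for a reasoning step: keep sentences related to the query."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return [s for s in sentences if overlap(s, query) > 0]


def build_grounded_prompts(question: str, segments: list[str],
                           corpus: list[str]) -> list[str]:
    """Place the most relevant key fact directly before each answer segment."""
    repo = KeyInfoRepository()
    # Reasoning phase: macro retrieval feeds the key information repository.
    for passage in macro_retrieve(question, corpus):
        for fact in extract_key_facts(passage, question):
            repo.add(fact)
    # Generation phase: micro retrieval keeps key facts next to the output.
    prompts = []
    for segment in segments:
        nearby = repo.micro_retrieve(segment)
        prompts.append(f"Key information: {'; '.join(nearby)}\n"
                       f"Answer segment: {segment}")
    return prompts


corpus = [
    "The Eiffel Tower is in Paris. The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
    "Paris is the capital of France.",
]
question = "Where is the Eiffel Tower and when was it completed?"
segments = ["Is the Eiffel Tower in Paris?",
            "When was the Eiffel Tower completed?"]

for prompt in build_grounded_prompts(question, segments, corpus):
    print(prompt)
    print()
```

In this sketch each answer segment gets its own short grounding line instead of the full retrieved passages, which is one simple way to keep key information adjacent to the text being generated. The curriculum-based reinforcement learning and rule-based rewards mentioned in the abstract are not modeled here.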

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.28828
