{"slug": "discourse-role-labels-as-presentation-time-variables-for-context-use-in-language", "title": "Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models", "summary": "Researchers introduced a paired fixed-content probe over 500 MMLU-Pro items to test how discourse-role labels such as Instruction:, Reference:, and Example: affect language model adoption of misleading information. Across GPT-5.5, DeepSeek V4 Pro, Llama-3-8B-Instruct, and Qwen2.5-7B-Instruct, Misleading Adoption Rate shifted by 56-84 percentage points, with binding labels like Instruction: and Reference: driving high adoption while Example: consistently suppressed it. The findings show that context-utilization and reader-side RAG benchmarks should report and control wrapper labels, as presentation choices can significantly change measured reliance on supplied context.", "body_md": "arXiv:2606.04109v1 Announce Type: new\nAbstract: Context-augmented language model systems often wrap supplied content with labels such as Reference:, Evidence:, Instruction:, Note:, or Example:, but the effect of these labels on reader-model behavior remains underexplored. We introduce a paired fixed-content probe over 500 MMLU-Pro items: each item receives the same misleading answer-bearing assertion under different discourse-role labels, and adoption is measured by whether the model outputs the injected wrong option. Across GPT-5.5, DeepSeek V4 Pro, Llama-3-8B-Instruct, and Qwen2.5-7B-Instruct, Misleading Adoption Rate shifts by 56-84 percentage points. Binding or source-like labels such as Instruction: and Reference: produce high adoption, whereas Example: consistently suppresses it. Paired tests, bootstrap intervals, final-instruction ablations, and Qwen final-step log-probability probes support a label-conditioned candidate preference. Boundary probes show where the effect weakens or persists: arithmetic tasks reduce adoption, passage-shaped external context preserves smaller label gaps, short-answer evaluation rules out option-letter copying, and nested-label conflicts suggest that illustrative framing can delimit adoption scope. A 200-case single-author manual audit confirms that the short-answer contrasts are stable under conservative adjudication. The resulting claim is bounded but practical: context-utilization and reader-side RAG benchmarks should report and control wrapper labels, because presentation choices can change measured reliance on supplied context.", "url": "https://wpnews.pro/news/discourse-role-labels-as-presentation-time-variables-for-context-use-in-language", "canonical_source": "https://arxiv.org/abs/2606.04109", "published_at": "2026-06-04 04:00:00+00:00", "updated_at": "2026-06-04 04:21:01.149629+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research"], "entities": ["GPT-5.5", "DeepSeek V4 Pro", "Llama-3-8B-Instruct", "Qwen2.5-7B-Instruct", "MMLU-Pro"], "alternates": {"html": "https://wpnews.pro/news/discourse-role-labels-as-presentation-time-variables-for-context-use-in-language", "markdown": "https://wpnews.pro/news/discourse-role-labels-as-presentation-time-variables-for-context-use-in-language.md", "text": "https://wpnews.pro/news/discourse-role-labels-as-presentation-time-variables-for-context-use-in-language.txt", "jsonld": "https://wpnews.pro/news/discourse-role-labels-as-presentation-time-variables-for-context-use-in-language.jsonld"}}