PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

Researchers introduced PragReST, a self-supervised framework that uses counterfactual reasoning to improve large language models' pragmatic language understanding. Across four benchmarks, PragReST improves over its backbone models; on the accuracy-based benchmarks, the gain reaches 5.50% absolute, without human-labeled training data or distillation from a stronger teacher. According to the authors' error analysis, the method primarily reduces errors caused by failing to contrast observed utterances with plausible alternatives.

arXiv:2606.18624v1

Abstract: Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning. Despite strong performance on math and logical reasoning, large language models (LLMs) still struggle with pragmatic inferences, often choosing literal interpretations. To improve LLM pragmatic reasoning, we introduce PragReST, a self-supervised framework that constructs pragmatic QA data, generates counterfactual reasoning traces, and trains models to internalize them through supervised fine-tuning and reinforcement learning, without human-labeled training data or distillation from a stronger teacher. Across four pragmatic benchmarks (PragMega, Ludwig, MetoQA, and AltPrag), PragReST improves over backbone models, task-specific pragmatic tuning baselines, and non-counterfactual variants of the same pipeline. On accuracy-based benchmarks, PragReST improves over the instruct backbone by 5.37% and 5.50% absolute for Qwen3-8B and Qwen3-14B, respectively. Our error analysis and ablations underscore the importance of counterfactual reasoning: PragReST primarily reduces errors caused by failures to contrast observed utterances with plausible alternatives, and removing counterfactual reasoning substantially reduces performance. Moreover, our training preserves out-of-domain performance on general-knowledge and mathematical reasoning benchmarks.
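
The idea of contrasting an observed utterance with plausible alternatives can be illustrated with a small sketch. This is not the PragReST pipeline, whose prompts and data format are not given in the abstract. The `PragmaticItem` class, the `build_counterfactual_prompt` function, the prompt wording, and the example dialogue are all hypothetical, chosen only to show what a counterfactual contrast might look like as model input.

```python
from dataclasses import dataclass


@dataclass
class PragmaticItem:
    """Hypothetical container for one pragmatic QA example."""
    context: str
    utterance: str           # what the speaker actually said
    alternatives: list[str]  # plausible things the speaker could have said instead


def build_counterfactual_prompt(item: PragmaticItem) -> str:
    """Build a prompt asking a model to contrast the observed utterance
    with each alternative before stating the intended meaning.
    The wording is an assumption, not taken from the paper."""
    lines = [
        f"Context: {item.context}",
        f'Observed utterance: "{item.utterance}"',
        "Consider what the speaker could have said instead:",
    ]
    for i, alt in enumerate(item.alternatives, start=1):
        lines.append(f'  {i}. "{alt}"')
    lines.append(
        "Explain why the speaker chose the observed utterance over each "
        "alternative, then state the intended meaning."
    )
    return "\n".join(lines)


# Illustrative example (invented for this sketch).
item = PragmaticItem(
    context="Ann asks Bob whether he liked the dinner she cooked.",
    utterance="The dessert was great.",
    alternatives=["Yes, I loved all of it.", "No, I did not like it."],
)
prompt = build_counterfactual_prompt(item)
print(prompt)
```

A literal reading of the example takes Bob's reply as plain praise. Setting it against the alternatives he did not choose is what exposes the implied meaning, which is the kind of contrast the abstract describes.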