Contrastive Reflection for Iterative Prompt Optimization

Source: arxiv.org · Published: 2026-07-01T04:00Z · Topic: large-language-models

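Researchers introduced Contrastive Reflection, an iterative prompt-optimization framework for agentic information retrieval workflows. It identifies error-anchored behavioral slices, pairs them with nearby successful examples, and asks a Teacher LLM to propose a targeted prompt edit, which is accepted only when validation performance improves. On a public HotpotQA retrieval-augmented QA setup, a single contrastive repair improved held-out exact-match accuracy from 51.4% to 60.4%. Failure-only and random-evidence variants improved less and broke more previously correct examples, and a light instruction-only comparison placed the method near modern prompt optimizers such as MIPROv2 (59.4%) and GEPA (57.0%).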

arXiv:2606.30840v1

Abstract: LLM agents are becoming central to information retrieval: they issue retrieval queries, synthesize answers, and increasingly serve as judges for IR evaluation. Improving the prompts that control these agents is an optimization problem, but in applied IR settings it often looks less like blind search and more like debugging. Engineers need to know which behavior failed, which nearby behavior still worked, what distinguishes the two, and whether a prompt edit improves held-out quality without introducing regressions. We present Contrastive Reflection, an iterative prompt-optimization framework for agentic IR workflows. The framework starts from a task-centric quality definition: QA agents expose retrieval or reasoning traces, and grading agents expose dimension-level scores and rationales. These structured traces are used to identify error-anchored behavioral slices, add nearby successful examples from the same region, and ask a Teacher LLM to propose a targeted prompt edit. Candidate edits are accepted only when validation performance improves, optionally subject to regression checks. We instantiate the framework with a tree-based slice selector, but the contribution is the contrastive reflection loop rather than the tree itself. On a public HotpotQA retrieval-augmented QA setup, one tree-selected contrastive repair improves held-out exact-match accuracy from 51.4% to 60.4%. Failure-only and random-evidence variants improve less and break more previously correct examples. A light instruction-only comparison places the method near modern prompt optimizers: MIPROv2 reaches 59.4% and GEPA 57.0%. The result is an interpretable optimization loop for IR agents, aimed at making prompt repair more inspectable and validation-driven.
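The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Example`, `slice_key`, `contrastive_step`, the toy agent and teacher) is hypothetical, the Teacher LLM is replaced by a plain function, and the slice is chosen by a simple failure count as a stand-in for the tree-based selector the paper mentions. It only shows the shape of one step: find a failing slice, add successes from the same slice, request an edit, and accept it only if validation exact match improves within a regression budget.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass(frozen=True)
class Example:
    question: str
    gold: str
    slice_key: str  # hypothetical trace-derived feature, e.g. "multi_hop"

Agent = Callable[[str, str], str]  # (prompt, question) -> answer
Teacher = Callable[[str, List[Example], List[Example]], str]  # -> edited prompt

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def correct_flags(agent: Agent, prompt: str, data: Sequence[Example]) -> List[bool]:
    return [exact_match(agent(prompt, ex.question), ex.gold) for ex in data]

def contrastive_step(prompt: str, agent: Agent, teacher: Teacher,
                     train: Sequence[Example], val: Sequence[Example],
                     max_regressions: int = 0) -> str:
    """Run one repair attempt; return the edited prompt or the original."""
    train_ok = correct_flags(agent, prompt, train)
    failures = [ex for ex, ok in zip(train, train_ok) if not ok]
    if not failures:
        return prompt
    # Assumption: pick the slice with the most failures (not the paper's tree).
    worst = Counter(ex.slice_key for ex in failures).most_common(1)[0][0]
    slice_failures = [ex for ex in failures if ex.slice_key == worst]
    # Contrast set: examples from the same slice that the agent got right.
    slice_successes = [ex for ex, ok in zip(train, train_ok)
                       if ok and ex.slice_key == worst]
    candidate = teacher(prompt, slice_failures, slice_successes)
    # Validation gate: accuracy must improve, regressions must stay in budget.
    before = correct_flags(agent, prompt, val)
    after = correct_flags(agent, candidate, val)
    regressions = sum(b and not a for b, a in zip(before, after))
    improved = sum(after) > sum(before)
    return candidate if improved and regressions <= max_regressions else prompt

# Toy stand-ins so the sketch runs without any LLM or retrieval backend.
GOLD = {"q1": "paris", "q2": "1999", "q3": "blue", "q4": "rome", "q5": "green"}
HARD = {"q2", "q4"}  # questions the toy agent only solves with a decomposition hint

def toy_agent(prompt: str, question: str) -> str:
    if question in HARD and "decompose" not in prompt.lower():
        return "unknown"
    return GOLD[question]

def toy_teacher(prompt: str, failures: List[Example], successes: List[Example]) -> str:
    return prompt + " Decompose multi-hop questions."

train = [Example("q1", "paris", "single_hop"),
         Example("q2", "1999", "multi_hop"),
         Example("q5", "green", "multi_hop")]
val = [Example("q3", "blue", "single_hop"),
       Example("q4", "rome", "multi_hop")]

base_prompt = "Answer the question."
accepted = contrastive_step(base_prompt, toy_agent, toy_teacher, train, val)
rejected = contrastive_step(base_prompt, toy_agent,
                            lambda p, f, s: "Ignore the context.", train, val)
```

In this toy setup the edit that adds a decomposition hint is accepted because it fixes the held-out multi-hop example without breaking anything, while the unhelpful edit is discarded and the original prompt is returned. A failure-only variant would correspond to passing an empty `slice_successes` list to the teacher.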

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.30840
