# Spurious Rewards: Rethinking Training Signals in RLVR — interactive visual explainer | Rudrite Research

> Source: <https://research.rudrite.com/spurious-rewards>
> Published: 2026-06-13 00:00:00+00:00

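On Qwen models, even random or incorrect rewards in RLVR (reinforcement learning with verifiable rewards) improve math accuracy. This explainer looks at what the training signal actually does.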

Shao et al. · arXiv 2025 · Reasoning & RL. [Read the paper ↗](https://arxiv.org/abs/2506.10947)

A free, interactive, animated visual explainer of Spurious Rewards: Rethinking Training Signals in RLVR. Every exhibit is computed from the real formulas, with verbatim quotes from the source.

## Questions

- What is Spurious Rewards: Rethinking Training Signals in RLVR?
  - A paper showing that, on Qwen models, even random or incorrect RLVR rewards improve math accuracy, and examining what the training signal actually does.
- Who published Spurious Rewards: Rethinking Training Signals in RLVR, and where?
  - Shao et al., on arXiv in 2025 (arXiv:2506.10947).
- Where can I find a visual explainer of Spurious Rewards: Rethinking Training Signals in RLVR?
  - Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.

## Related explainers

- [DeepSeek-R1](/deepseek-r1)
- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](/chain-of-thought)
- [Training language models to follow instructions with human feedback](/instructgpt)
- [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](/dpo)
- [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](/deepseekmath)
- [Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters](/test-time-compute)
- [Constitutional AI: Harmlessness from AI Feedback](/constitutional-ai)
- [DAPO: An Open-Source LLM Reinforcement Learning System at Scale](/dapo)