grep -l @aime /news/*.json | wc -l → 11

AIME

mentions: 11 · type: Organization

// recent coverage 11 mentions

04:00

2026-07-13

arxiv.org

artificial-intelligence

KV-PRM: Efficient Process Reward Modeling via KV-Cache Transfer for Multi-Agent Test-Time Scaling

Researchers introduced KV-PRM, a process reward model that reduces scoring cost from O(L²) to O(L) by leveraging KV cache from LLM generation, achieving up to 5,000x reduction in FLOPs and 37x reducti…

20:25

2026-07-10

machinebrief.com

artificial-intelligence

Exploring New AI Pathways: TREK's Innovative Approach

Researchers introduced TREK (Teacher-Routed Exploration via Forward KL), a new AI training method that enhances learning through unconventional exploration strategies. TREK significantly improved perf…

04:00

2026-07-10

arxiv.org

large-language-models

When LLMs Agree, Are They Right? Auditing Self-Consistency and Cross-Model Agreement as Confidence Signals

A large-scale study of 53 LLM runners generating 265,000 samples found that self-consistency and cross-model agreement are weak and regime-dependent proxies for correctness, not reliable confidence si…

04:04

2026-07-08

tobyord.com

large-language-models

How well does RL scale?

Toby Ord analyzes the scaling properties of reinforcement learning in AI, distinguishing between RL-scaling (training compute) and inference-scaling (deployment compute). He finds that in OpenAI's o1 …

09:26

2026-07-01

machinebrief.com

large-language-models

Revolutionizing Language Models: The Role of Relative Surprisal Index

Researchers introduced the Relative Surprisal Index (RSI), an information-theoretic metric that balances token probability and entropy to improve reinforcement learning for large language models. The …

06:38

2026-06-19

pub.towardsai.net

artificial-intelligence

I Ran the 3B Model That Beat Gemini 3 Pro at Olympiad Math — It Shouldn't Work

A 3-billion-parameter model from Weibo scored 94.3 on the AIME 2026 math competition, outperforming Google's Gemini 3 Pro, which scored 91.7. The model is MIT-licensed, and its success challenges assumptio…

04:00

2026-06-19

arxiv.org

large-language-models

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

Researchers introduced GRACE, a theoretical framework that determines the optimal verification granularity for test-time scaling in large language models based on problem difficulty, verifier accuracy…

02:44

2026-06-05

arxiv.org

machine-learning

OPRD: On-Policy Representation Distillation

Researchers have introduced On-Policy Representation Distillation (OPRD), a method that aligns student and teacher model representations across selected layers during training, bypassing the language …

03:24

2026-06-04

latent.space

generative-ai

[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen

Reve and Ideogram both launched new image-generation models on June 2, 2026, with each company emphasizing advances in layout control through improved labeling and code. Ideogram 4.0 is now ranked as …

13:05

2026-05-31

firethering.com

large-language-models

MiniCPM5-1B Shows Why the Small-Model Race Isn’t Over

OpenBMB released MiniCPM5-1B, a one-billion-parameter model that scores 40.42 on the AIME 2025 mathematics exam, outperforming larger competitors including LFM2.5-1.2B and Qwen3-0.6B. The model offers…

04:00

2026-05-29

arxiv.org

large-language-models

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Researchers have developed Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement learning on PhysicsWallah's internal question banks. The model o…

// co-occurs with top 8 entities

GSM8K (2) · Aryabhata 2 (1) · GPT-OSS-20B (1) · PhysicsWallah (1) · JEE (1) · NEET (1) · HMMT (1) · MMLU-Pro (1)