MATH

mentions 8, type Organization

// recent coverage 8 mentions

04:00

2026-07-21

arxiv.org

artificial-intelligence

Though Language Models Err While They Strive: Conformal Prediction for Self-Correcting Scientific Generation

A new conformal prediction framework called Scientific Feasibility Control (SFC) achieves 50.1% accuracy on the PhyX physics reasoning benchmark, outperforming DeepSeek-R1 (49.8%) and GPT-4 (45.8%), w…

04:00

2026-07-21

machinebrief.com

artificial-intelligence

Oracle Gap and Signal Fidelity: A Fixed-Pool Diagnostic for Test-Time Collaboration

A new arXiv paper (2607.17531v1) introduces a fixed-pool diagnostic framework for test-time collaboration in LLMs, decomposing net gains into recoverable mass, verification-signal coverage, conditiona…

04:00

2026-07-21

machinebrief.com

large-language-models

PPL-Factory: Task-Aware and Budget-Aware Data Selection from Language Modeling to Reasoning

Researchers propose PPL-Factory, a task-aware and budget-aware data selection framework for large language model fine-tuning that uses perplexity-based scores to select informative training samples. E…

04:00

2026-07-13

arxiv.org

artificial-intelligence

KV-PRM: Efficient Process Reward Modeling via KV-Cache Transfer for Multi-Agent Test-Time Scaling

Researchers introduce KV-PRM, a process reward model that reduces scoring cost from O(L²) to O(L) by reading the KV cache from LLM generation instead of re-encoding trajectory text, achieving up to 5,…

05:53

2026-07-10

github.com

artificial-intelligence

TinyToT – Tree of Thoughts Inference Server

TinyToT, a lightweight inference server compatible with Ollama, achieves 97% accuracy on a 35-question benchmark spanning graduate-level science, medicine, law, finance, and software engineering witho…

16:07

2026-06-17

danlevy.net

large-language-models

LLM benchmarks are answering someone else's question

LLM benchmarks like MMLU and HumanEval are irrelevant for most businesses building AI products, as they measure generic performance rather than specific system tasks. Teams should instead build custom…

06:15

2026-06-05

sapient.inc

large-language-models

Sapient HRM-Text – a 1B PoC text gen model based on the HRM architecture

Sapient Inc. open-sourced HRM-Text in May 2026, a 1.15 billion parameter text generation model based on the HRM architecture. Trained on roughly 40 billion tokens—up to 1,000 times less data than comp…

04:00

2026-06-03

arxiv.org

large-language-models

Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference

Researchers have developed Fast-dLLM++, a training-free extension to diffusion large language models that accelerates inference by selecting parallel token commit sets based on the full sorted confide…

// co-occurs with top 8 entities

GSM8K 4, MMLU 2, HumanEval 2, LLeMU 1, GPT-5 1, Sapient 1, HRM-Text 1, DROP 1

// topics top 6 topics

large language models 8, artificial intelligence 6, ai research 5, ai tools 2, ai safety 2, machine learning 2