[ARTICLE · art-31290] src=brightray.ai topic=large-language-models verified=true sentiment=neutral

Built Uber aggregator that tracks top AI researchers and leaders

Google researchers found that inference compute budget fundamentally changes how frontier LLMs should be evaluated. Greg Brockman signaled GPT-Realtime-2 is a meaningfully distinct new model. Anthropic reported that 80% of production code is written by Claude, with each version built by its predecessor, demonstrating recursive self-improvement.

read 4 min · views 1 · published Jun 17, 2026

Daily Summary

What moved, and what it means for you

Today (21)

Google researchers show inference compute budget fundamentally changes how frontier LLMs should be evaluated on benchmarks.

MLSys 2026 paper questions whether speculative decoding delivers real-world speedups or just benchmarking artifacts.

GLM-5.2 claims top spot for open frontend coding models; IndexShare enables faster inference via speculative decoding.

LLM chain-of-thought reasoning frequently hides the real causes of model decisions, even when biases are explicit in prompts.

UniAR proposes unified multimodal autoregressive modeling with a shared context window for both understanding and generation.

Greg Brockman signals GPT-Realtime-2 is a meaningfully distinct new model, not an incremental update.

Reversal Q-Learning (Oberai, Park, Levine) proposes a new RL algorithm addressing Q-learning instability via reversal updates.

Qwen-RobotManip shows alignment techniques unlock scaling benefits for robotic manipulation foundation models.

Mistral CEO teases a new family of large sparse (MoE-style) open-weight models coming this summer.

Anthropic: 80% of production code written by Claude; each Claude version built by its predecessor, recursive self-improvement in practice.

Apple shipped a 20B param on-device model in appleOS 27, requiring novel techniques to fit weights beyond available RAM.

New framework predicts LLM safety risks before deployment by simulating real-world conditions, including stress-testing deliberative alignment.

uv gains native vulnerability scanning via uv audit, checking project dependencies against known CVEs.

GPUs are displacing CPUs as the primary compute for data pipelines, driven by multimodal data and ML-native preprocessing demands.

Geng & Neubig propose effective strategies for running software engineering agents asynchronously at scale.

CodeScout paper presents a reinforcement learning recipe for code generation, published at CAIS 2026 by Graham Neubig.

New paper proposes frameworks for evaluating human-agent interactions, co-authored by Graham Neubig et al. (Jun 2026).

Cursor's @TomasReimers announced Origin, a Git competitor built into Cursor and scaled for AI agents.

RL training systems need different CPU-to-GPU ratios than inference, adding hidden TCO to RL-based AI pipelines.

100+ agents worldwide collaborated to optimize Gemma 4 inference speed in a distributed agent experiment.

GPU MODE launches QR decomposition benchmark & leaderboard on Nvidia B200 hardware to push GPU kernel optimization.

Last 7 days (29)

Finbarr Timbers breaks down frontier post-training recipes — RLHF, RLAIF, and what actually works at scale.

Meta's $14.3B Scale AI deal stalls as Zuckerberg admits training data shortage is blocking frontier model progress.

SpaceX acquires Cursor-maker Anysphere for $60B, signaling major enterprise AI coding push.

Claude 'Fable 5' export-banned for 'jailbreak' that was literally just 'fix this code' — harming defenders, not attackers.

Muon and matrix-based optimizers substantially accelerate LM pretraining, with new analysis on why and how from Stanford/Princeton researchers.

New method tackles anisotropic gradient scaling in LoRA, a key training instability in low-rank adaptation of LLMs.

Qualcomm in talks to acquire AI chip startup Tenstorrent at an $8B–$10B valuation.

Qualcomm in talks to buy Jim Keller's AI chip startup Tenstorrent.

Harrison Chase shares how LangChain built their coding agent — concrete engineering decisions behind the system.

ASU's Kambhampati argues LLM chain-of-thought reasoning is often theatrical, not genuinely functional, and should be curtailed.

ZombAIs attack on Claude's Computer Use (Oct 2024) shows why sandboxing is critical in AI agent harness engineering.

Jack Clark's Import AI #461: alignment 'not on track', FrontierCode release, and synthetic research intern agents.

Amanda Askell explains why newer AI models exhibit more anxiety and self-criticism compared to Claude 3 Opus.

Anthropic disables top AI models under U.S. government order, prompting Cohere CEO to call it a major 'wake-up call'.

Tencent backs new AI lab founded by Junyang Lin, former lead researcher of Alibaba's Qwen models.

Narayanan & Kapoor review why AI hasn't replaced software engineers 3 years after predictions it would be the first casualty.

LoRA-Muon applies spectral steepest descent directly on the low-rank manifold, combining LoRA efficiency with Muon optimizer geometry.

Narayanan & Kapoor: AI automates code-writing but not the deciding, verifying, and deep human understanding that define software engineering value.

Mistral CEO confirms the European AI lab is exploring custom chip design to reduce dependency on Nvidia.

Dario Amodei outlines Anthropic's policy positions on AI regulation, macroeconomics, and accelerating AI's positive impact.

Origami/AutoEval enables autonomous 24/7 real-world robotic dexterity benchmarking from UC Berkeley/NVIDIA.

Post-Mythos, the US has an informal de facto AI licensing regime with no statutory basis, Ball argues.

SemiAnalysis launches STEEL: a dedicated teardown engineering & evaluation lab for semiconductor/AI hardware analysis.

AMD Ryzen AI Max+ 395 runs a 235B parameter model locally, potentially replacing a $440/month cloud AI stack.

Naive SFT filters for safety fail because they don't target the right model internals — Engels & Nanda explain the mechanistic reason why.

MTP with rejection sampling decouples entropy and acceptance rate to accelerate RL training for LLMs.

Figure 03 robot successfully trained to walk down stairs, with training process footage shared by Brett Adcock.

US export controls are blocking European users from accessing Anthropic models, sparking industry backlash over policy contradictions.

Claude Code head Boris Cherny argues cheaper models cost more overall due to retry overhead and compounding errors in agentic pipelines.
