{"slug": "built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders", "title": "Built Uber aggregator that tracks top AI researchers and leaders", "summary": "Google researchers found that the inference compute budget fundamentally changes how frontier LLMs should be evaluated. Greg Brockman signaled that GPT-Realtime-2 is a meaningfully distinct new model rather than an incremental update. Anthropic reported that 80% of production code is written by Claude and that each Claude version is built by its predecessor, described as recursive self-improvement in practice.", "body_md": "Daily Summary\n\nWhat moved, and what it means for you\n\nToday (21)\n\nGoogle researchers show inference compute budget fundamentally changes how frontier LLMs should be evaluated on benchmarks.\n\nMLSys 2026 paper questions whether speculative decoding delivers real-world speedups or just benchmarking artifacts.\n\nGLM-5.2 claims top spot for open frontend coding models; IndexShare enables faster inference via speculative decoding.\n\nLLM chain-of-thought reasoning frequently hides the real causes of model decisions, even when biases are explicit in prompts.\n\nUniAR proposes unified multimodal autoregressive modeling with a shared context window for both understanding and generation.\n\nGreg Brockman signals GPT-Realtime-2 is a meaningfully distinct new model, not an incremental update.\n\nReversal Q-Learning (Oberai, Park, Levine) proposes a new RL algorithm addressing Q-learning instability via reversal updates.\n\nQwen-RobotManip shows alignment techniques unlock scaling benefits for robotic manipulation foundation models.\n\nMistral CEO teases a new family of large sparse (MoE-style) open-weight models coming this summer.\n\nAnthropic: 80% of production code written by Claude; each Claude version built by its predecessor (recursive self-improvement in practice).\n\nApple shipped a 20B-parameter on-device model in appleOS 27, requiring novel techniques to fit weights that exceed available RAM.\n\nNew framework predicts LLM safety risks before deployment by simulating real-world conditions, including stress-testing deliberative alignment.\n\nuv gains native vulnerability scanning via `uv audit`, checking project dependencies against known CVEs.\n\nGPUs are displacing CPUs as the primary compute for data pipelines, driven by multimodal data and ML-native preprocessing demands.\n\nGeng & Neubig propose effective strategies for running software engineering agents asynchronously at scale.\n\nCodeScout paper by Graham Neubig, published at CAIS 2026, presents a reinforcement learning recipe for code generation.\n\nNew paper by Graham Neubig et al. (Jun 2026) proposes frameworks for evaluating human-agent interactions.\n\nCursor's @TomasReimers announced Origin, a Git competitor built into Cursor and scaled for AI agents.\n\nRL training systems require different CPU/GPU ratios than inference, adding hidden total cost of ownership (TCO) to RL-based AI pipelines.\n\n100+ agents worldwide collaborated to optimize Gemma 4 inference speed in a distributed agent experiment.\n\nGPU MODE launches QR decomposition benchmark & leaderboard on Nvidia B200 hardware to push GPU kernel optimization.\n\nLast 7 days (29)\n\nFinbarr Timbers breaks down frontier post-training recipes: RLHF, RLAIF, and what actually works at scale.\n\nMeta's $14.3B Scale AI deal stalls as Zuckerberg admits training data shortage is blocking frontier model progress.\n\nSpaceX acquires Cursor-maker Anysphere for $60B, signaling major enterprise AI coding push.\n\nClaude 'Fable 5' export-banned over a 'jailbreak' that was literally just 'fix this code'; the ban harms defenders, not attackers.\n\nMuon and matrix-based optimizers substantially accelerate LM pretraining, with new analysis on why and how from Stanford/Princeton researchers.\n\nNew method tackles anisotropic gradient scaling in LoRA, a key training instability in low-rank adaptation of LLMs.\n\nQualcomm in talks to acquire AI chip startup Tenstorrent for $8B–$10B valuation.\n\nQualcomm in acquisition talks to buy Jim Keller's AI chip startup Tenstorrent.\n\nHarrison Chase shares how LangChain built its coding agent: the concrete engineering decisions behind the system.\n\nASU's Kambhampati argues LLM chain-of-thought reasoning is often theatrical, not genuinely functional, and should be curtailed.\n\nZombAIs attack on Claude's Computer Use (Oct 2024) shows why sandboxing is critical in AI agent harness engineering.\n\nJack Clark's Import AI #461: alignment 'not on track', FrontierCode release, and synthetic research intern agents.\n\nAmanda Askell explains why newer AI models exhibit more anxiety and self-criticism than Claude 3 Opus.\n\nAnthropic disables top AI models under U.S. government order, prompting Cohere CEO to call it a major 'wake-up call'.\n\nTencent backs new AI lab founded by Junyang Lin, former lead researcher of Alibaba's Qwen models.\n\nNarayanan & Kapoor review why AI hasn't replaced software engineers 3 years after predictions that software engineering would be the first casualty.\n\nLoRA-Muon applies spectral steepest descent directly on the low-rank manifold, combining LoRA efficiency with Muon optimizer geometry.\n\nNarayanan & Kapoor: AI automates code-writing but not the deciding, verifying, and deep human understanding that define software engineering value.\n\nMistral CEO confirms the European AI lab is exploring custom chip design to reduce dependency on Nvidia.\n\nDario Amodei outlines Anthropic's policy positions on AI regulation, macroeconomics, and accelerating AI's positive impact.\n\nOrigami/AutoEval enables autonomous 24/7 real-world robotic dexterity benchmarking from UC Berkeley/NVIDIA.\n\nPost-Mythos, the US has an informal de facto AI licensing regime with no statutory basis, Ball argues.\n\nSemiAnalysis launches STEEL: a dedicated teardown engineering & evaluation lab for semiconductor/AI hardware analysis.\n\nAMD Ryzen AI Max+ 395 runs a 235B parameter model locally, potentially replacing a $440/month cloud AI stack.\n\nNaive SFT filters for safety fail because they don't target the right model internals; Engels & Nanda explain the mechanistic reason why.\n\nMTP (multi-token prediction) with rejection sampling decouples entropy and acceptance rate to accelerate RL training for LLMs.\n\nFigure 03 robot successfully trained to walk down stairs, with training process footage shared by Brett Adcock.\n\nUS export controls are blocking European users from accessing Anthropic models, sparking industry backlash over policy contradictions.\n\nClaude Code's head Boris Cherny argues cheaper models cost more overall due to retry overhead and compounding errors in agentic pipelines.", "url": "https://wpnews.pro/news/built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders", "canonical_source": "https://brightray.ai", "published_at": "2026-06-17 14:59:10+00:00", "updated_at": "2026-06-17 15:23:15.740023+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-research", "ai-safety", "ai-agents"], "entities": ["Google", "Greg Brockman", "Anthropic", "Claude", "GPT-Realtime-2", "Mistral", "Apple", "Qualcomm"], "alternates": {"html": "https://wpnews.pro/news/built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders", "markdown": "https://wpnews.pro/news/built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders.md", "text": "https://wpnews.pro/news/built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders.txt", "jsonld": "https://wpnews.pro/news/built-uber-aggregator-that-tracks-top-ai-researchers-and-leaders.jsonld"}}