[AINews] Everything is Conductor


An ultra-quiet day lets us highlight a smaller trend.

If you’re interested in how AI is improving healthcare, tune in to our first pod on the topic, out today; and if you want to meet other top engineers in the field, apply to speak!

There’s an ongoing joke in evolutionary biology that “Everything is Crab”: the crab form factor has independently evolved at least 7 times on Earth:

The proximate cause of today’s op-ed is GitHub announcing the new GitHub App. As Oren Melamed says, “If you are code first you might wanna stay on good ol’ VS Code, but if you are agent first and GitHub first you are in for a treat!”

Hmm. That looks familiar…

This is of course very nice for Conductor, which pioneered this form factor and now has a vocal fan in Garry Tan, the AI-pilled CEO of Y Combinator:

Now for the two billion-dollar questions:

1. If you pioneered a form factor, how do you monetize it while others copy it?
2. What’s next after this one?

For those interested in alternate histories, here’s what happened with the Kanban board form factor that briefly trended last year:

And here is Maggie Appleton breaking down the design thinking [behind GitHub Ace](https://www.youtube.com/watch?v=ClWD8OEYgp8&t=372s):

AI News for 5/13/2026-5/14/2026. We checked 12 subreddits, 544 Twitters, and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Coding Agent Tooling: Codex Mobile, GitHub’s New App, VS Code Multi-Agent UX, and Hermes/Codex Interop

OpenAI pushed Codex further into day-to-day workflows: the biggest product launch in this set was **Codex in the ChatGPT mobile app**, letting users start tasks, review outputs, approve commands, and steer execution remotely while Codex continues running on a laptop, Mac mini, or devbox. OpenAI also noted that Remote SSH is now generally available for managed remote environments, and later added hooks plus programmatic access tokens for Business/Enterprise automation around the Codex loop (OpenAI, OpenAI follow-up, @OpenAIDevs on mobile workflow, @OpenAIDevs on Remote SSH, @OpenAIDevs on hooks/tokens). Separately, OpenAI published a technical writeup on the Windows sandbox for Codex, focused on the tradeoff between utility and constrained machine access for coding agents (OpenAI Devs, @gdb).

The broader IDE/app ecosystem is converging on “agent-first” UX: GitHub announced a technical preview of the **GitHub Copilot App**, described as a desktop environment for parallel workstreams, repo/PR lifecycle management, and model flexibility (GitHub, @adrianmg, @OrenMe). VS Code shipped a new Agents window for multi-agent, multi-project workflows, browser/mobile support via vscode.dev/agents, BYOK improvements, and token-efficiency features like compressed terminal output (VS Code, remote/browser support, BYOK updates, terminal compression). On the open side, Nous/Hermes Agent added Codex runtime integration, effectively routing OpenAI-backed turns through the Codex CLI/app-server and reusing ChatGPT subscription-backed execution in Hermes sessions (Nous Research, @Teknium, @HermesAgentTips). Kimi also shipped Kimi Web Bridge, a browser extension exposing human-like web interaction to Kimi Code CLI, Claude Code, Cursor, Codex, Hermes, and others (Moonshot AI).
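VS Code’s actual terminal-compression scheme isn’t described in these posts. As a rough, hypothetical illustration of why compressing terminal output saves agent tokens, here is a minimal sketch under our own assumptions: collapse runs of identical lines into one line with a repeat count, then keep only the head and tail of very long output. The function name and the `head`/`tail` parameters are ours, not VS Code’s.

```python
from itertools import groupby


def compress_terminal_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Hypothetical token-saving pass over terminal output (not VS Code's algorithm)."""
    # Collapse consecutive identical lines into one line plus a repeat count.
    collapsed = []
    for line, group in groupby(text.splitlines()):
        n = sum(1 for _ in group)
        collapsed.append(line if n == 1 else f"{line}  [repeated {n}x]")

    # Short enough already: return the collapsed lines unchanged.
    if len(collapsed) <= head + tail:
        return "\n".join(collapsed)

    # Keep the first `head` and last `tail` lines; note how many were dropped.
    omitted = len(collapsed) - head - tail
    kept_tail = collapsed[len(collapsed) - tail:]  # safe even when tail == 0
    return "\n".join(collapsed[:head] + [f"... [{omitted} lines omitted] ..."] + kept_tail)


# Usage: 500 repeated warnings, 100 passing tests, one failure.
log = "\n".join(
    ["warning: deprecated API"] * 500
    + [f"test {i} ok" for i in range(100)]
    + ["FAILED test_x"]
)
print(compress_terminal_output(log, head=3, tail=2))
# warning: deprecated API  [repeated 500x]
# test 0 ok
# test 1 ok
# ... [97 lines omitted] ...
# test 99 ok
# FAILED test_x
```

Keeping the tail matters for coding agents because the final lines (failures, exit status) are usually what the next step acts on.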

Agent Infrastructure and Self-Improvement Loops: LangSmith Engine, SmithDB, Sandboxes, and Continual Learning

LangChain’s launch stack was the most substantive agent-infra release cluster: **SmithDB is a database purpose-built for agent trace data**, while **LangSmith Engine** consumes traces, clusters failures, identifies likely code issues, and proposes fixes/evals, turning observability into an improvement loop rather than passive inspection (@hwchase17, @caspar_br on Engine, @bentannyhill). Community commentary emphasized SmithDB’s architectural shift toward object storage and a custom storage/query path for this workload shape (@caspar_br on SmithDB, @ngates_, Chinese summary). LangChain also announced LangChain Labs, an applied research effort around continual learning for agents, with the thesis that production traces should become training signal, evals, and targeted capability improvements over long horizons (LangChain, @jakebroekhuizen, @willccbb, Prime Intellect partnership).

Execution isolation for agents continues to mature: W&B/CoreWeave launched **CoreWeave Sandboxes** for isolated execution in RL, tool use, and eval workloads, explicitly testing destructive commands like `rm -rf /` at scale (Weights & Biases). In a similar spirit, open-source/local dev tooling surfaced around agent debugging: @benhylak highlighted a free local agent debugging stack with traces exposed to Codex/Claude Code for automated eval authoring.
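The threads don’t say how LangSmith Engine clusters failures. As a minimal, hypothetical sketch of the general “traces → failure clusters → eval candidates” idea, assume each trace is a dict with made-up `id`, `status`, and `error` fields. Failures are grouped by a normalized error signature (hex ids, quoted literals, and numbers replaced by placeholders) so that recurring failure modes surface as the largest clusters.

```python
import re
from collections import defaultdict


def error_signature(message: str) -> str:
    """Normalize an error message so superficially different failures group together."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)  # addresses / hex ids
    sig = re.sub(r"'[^']*'", "<STR>", sig)               # single-quoted literals
    sig = re.sub(r'"[^"]*"', "<STR>", sig)               # double-quoted literals
    sig = re.sub(r"\d+", "<N>", sig)                     # counts, line numbers, timeouts
    return sig.strip()


def cluster_failures(traces: list[dict]) -> list[tuple[str, list[str]]]:
    """Group failed traces by signature, largest cluster first (hypothetical trace schema)."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for trace in traces:
        if trace.get("status") == "error":
            clusters[error_signature(trace.get("error", ""))].append(trace["id"])
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)


# Usage with made-up traces.
sample_traces = [
    {"id": "t1", "status": "error", "error": "KeyError: 'user_42'"},
    {"id": "t2", "status": "error", "error": "KeyError: 'user_7'"},
    {"id": "t3", "status": "ok"},
    {"id": "t4", "status": "error", "error": "Timeout after 30s calling tool search"},
]
for signature, ids in cluster_failures(sample_traces):
    print(len(ids), signature, ids)
# 2 KeyError: <STR> ['t1', 't2']
# 1 Timeout after <N>s calling tool search ['t4']
```

A production system would likely cluster on richer features (tool-call sequences, embeddings of the failing step); string signatures are just the simplest version of the loop.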

Anthropic Claude Code Restrictions and the Developer Backlash

The sharpest ecosystem reaction was to Anthropic restricting/reshaping Claude Code usage, especially for third-party wrappers and high-volume programmatic workflows. Theo’s thread became the focal point: he argued users of T3 Code were effectively hit with dramatic rate-limit reductions despite integrating through the officially supported path, and he subsequently cancelled his subscription while encouraging others to post cancellation screenshots for open-source donations (@theo initial thread, subscription cancellation, donation thread, T3 Code clarification). Other prominent builders echoed the complaint that Anthropic had effectively cut off open-source devs/apps and destabilized harnesses built around `claude -p` (@theo, @andersonbcdefg).

There was also a more strategic counterargument: some users argued Anthropic does not owe developers heavily subsidized flat-fee tokens for third-party apps, and that the ecosystem will likely shift toward more explicit API economics and smarter routing between expensive and cheap models (Sentdex, @tadasayy). Still, the visible churn signal was nontrivial, including users estimating meaningful ARR loss from reply-thread cancellations alone (@thegenioo, Uncle Bob Martin, Theo later). For agent engineers, the practical takeaway is straightforward: subscription-backed harnesses are not stable platform primitives; provider/model abstraction and BYOK paths look increasingly mandatory.
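What “provider/model abstraction” plus “smarter routing between expensive and cheap models” can look like, as a minimal sketch: every provider name, price, and the routing heuristic below are hypothetical, and `complete` is a stub standing in for a real SDK or BYOK call. Each backend sits behind one interface; short prompts try the cheapest model first, long prompts (our stand-in for “hard” tasks) try the priciest first, and a failing backend (say, a rate-limited subscription path) falls through to the next one.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a backend that cannot serve a request (rate limit, auth, outage)."""


@dataclass
class Provider:
    name: str                        # hypothetical label
    cost_per_mtok: float             # illustrative USD per million tokens
    complete: Callable[[str], str]   # stub for a real SDK / BYOK call


def route(prompt: str, providers: list[Provider], hard_threshold_chars: int = 4000) -> tuple[str, str]:
    """Try providers in cost order with fallback; returns (provider name, output)."""
    # Assumed heuristic: long prompts count as "hard" and go to the priciest model first.
    hard = len(prompt) > hard_threshold_chars
    ordered = sorted(providers, key=lambda p: p.cost_per_mtok, reverse=hard)
    errors = []
    for provider in ordered:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")  # record and fall through
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Usage with stubbed backends.
def rate_limited(prompt: str) -> str:
    raise ProviderError("429 rate limited")


providers = [
    Provider("subscription-harness", 0.0, rate_limited),
    Provider("cheap-api", 0.5, lambda p: f"cheap:{p[:10]}"),
    Provider("frontier-api", 15.0, lambda p: f"frontier:{p[:10]}"),
]
print(route("fix the bug", providers))  # ('cheap-api', 'cheap:fix the bu')
print(route("x" * 5000, providers))     # ('frontier-api', 'frontier:xxxxxxxxxx')
```

The point of the design is that losing one provider (or one pricing plan) changes a list entry, not the harness.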

Robotics and Embodied AI: Figure’s 24/7 Sorting Stream and the Broader Automation Signal

Figure’s livestream dominated robotics discussion. The company first showed **8 hours of fully autonomous, unsupervised work**, then extended to a **24/7 livestream**, eventually reporting **24+ hours of continuous autonomous operation without failure**, around **human-parity throughput** on small package sorting, and operation by Helix-02 running entirely onboard with automatic resets for OOD cases, explicitly claiming no teleoperation (Figure CEO Brett Adcock, 24h update, detailed technical clarifications, Day 2 livestream). The repeated “Bob, Frank, and Gary” updates were fluffier, but the core signal was sustained autonomous operation at production-like uptime.

Interpretation split between skepticism about Figure specifically and broader conviction about robotics acceleration. Some commenters argued that critics were underestimating what these demonstrations imply for near-term labor substitution, while others noted skepticism was directed more at Figure than at robotics as a category (@cloneofsimo, @iScienceLuvr, @kimmonismus). Either way, this was one of the clearest “continuous uptime” demos in the batch.

Research, Benchmarks, and Open Models: Diffusion LMs, Time-Series FMs, Mechanistic Interpretability, and RL/Search

A few technically significant model/research releases stood out: **Zyphra’s ZAYA1-8B-Diffusion-Preview** claims a 4.6–7.7x decoding speedup versus autoregressive generation with limited quality loss, making the usual case that diffusion LMs enable cheaper rollouts and richer generation modes (Zyphra). **Datadog’s Toto 2.0** released 5 open-weight time-series forecasting models from 4M to 2.5B params under Apache 2.0, claiming #1 on **BOOM, GIFT-Eval, and TIME** and, more importantly, evidence that scaling laws may finally hold cleanly for TSFMs (Datadog, @atalwalkar, @ClementDelangue). Goodfire’s interpretability post argued that Llama uses a geometric “shape-rotating calculator” / Fourier-feature-like mechanism for arithmetic, with steering-based evidence rather than pure post-hoc description (GoodfireAI, follow-up).
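Why would rotation-style Fourier features suit arithmetic? A toy illustration, explicitly not Goodfire’s analysis of Llama: encode an integer n as one point per circle, (cos 2πn/T, sin 2πn/T), for a few hand-picked periods T. Adding m then means rotating each point by 2πm/T, which the angle-addition identities compute with a few multiplies, and the sum is read back by finding the candidate whose encoding matches best. The periods and function names are our assumptions; with these periods the toy computes addition modulo 100.

```python
import math

PERIODS = (2, 5, 10, 100)  # illustrative periods chosen by hand for this toy


def encode(n: int) -> list[tuple[float, float]]:
    """Represent n as one point on the unit circle per period T."""
    return [(math.cos(2 * math.pi * n / T), math.sin(2 * math.pi * n / T)) for T in PERIODS]


def rotate_add(a_feats, b_feats):
    """Add two encoded numbers by composing rotations on each circle."""
    out = []
    for (ca, sa), (cb, sb) in zip(a_feats, b_feats):
        # cos(x + y) = cos x cos y - sin x sin y ; sin(x + y) = sin x cos y + cos x sin y
        out.append((ca * cb - sa * sb, sa * cb + ca * sb))
    return out


def decode(feats, max_n: int = 100) -> int:
    """Return the n in [0, max_n) whose encoding has the largest dot product with feats."""
    def score(n: int) -> float:
        return sum(c * cn + s * sn for (c, s), (cn, sn) in zip(feats, encode(n)))
    return max(range(max_n), key=score)


print(decode(rotate_add(encode(23), encode(45))))  # 68
print(decode(rotate_add(encode(70), encode(50))))  # 20 (120 mod 100)
```

The low-period circles carry coarse information (parity, last digit), while the period-100 circle pins down the exact residue, which is roughly the intuition behind multi-frequency representations of numbers.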

On RL/search and optimizer-style progress, several threads were notable: a survey framing LLM RL as **rollout engineering** across Generate / Filter / Control / Replay rather than just PPO-vs-GRPO (The Turing Post); **Pedagogical RL**, using privileged information to actively find useful rollouts (Souradip Chakraborty, @lateinteraction); and **Prime Intellect’s autonomous optimizer search** on the nanoGPT speedrun benchmark, where Opus 4.7 reached 2930 steps and GPT-5.5 reached 2950, beating the 2990-step human baseline after ~10k runs / ~14k H200 hours (Prime Intellect, @eliebakouch). Also noteworthy: **Kimi K2.6** was reported as the #1 open-weight model on Finance Agent Benchmark V2 (Moonshot AI), and **Ring-2.6-1T** got day-0 vLLM support as an open release (vLLM).
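To put the speedrun numbers in proportion, a quick back-of-envelope calculation using only the figures quoted above (fewer steps is better, per the “beating the baseline” framing; the run and GPU-hour counts are approximate, so the per-run average is rough):

```python
HUMAN_BASELINE_STEPS = 2990
AGENT_RESULTS = {"Opus 4.7": 2930, "GPT-5.5": 2950}


def relative_step_reduction(baseline: int, steps: int) -> float:
    """Fraction of the baseline's steps saved (positive means fewer steps than baseline)."""
    return (baseline - steps) / baseline


for name, steps in AGENT_RESULTS.items():
    saved = HUMAN_BASELINE_STEPS - steps
    print(f"{name}: {saved} fewer steps ({relative_step_reduction(HUMAN_BASELINE_STEPS, steps):.1%} fewer)")

runs, h200_hours = 10_000, 14_000  # "~10k runs / ~14k H200 hours"
print(f"~{h200_hours / runs:.1f} H200-hours per run (rough average)")
# Opus 4.7: 60 fewer steps (2.0% fewer)
# GPT-5.5: 40 fewer steps (1.3% fewer)
# ~1.4 H200-hours per run (rough average)
```

Small relative gains like these are typical for a heavily tuned speedrun, which is why the result reads as a search-at-scale story more than a single breakthrough.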

Top Tweets (by engagement)

OpenAI’s Codex mobile launch was the clearest product winner by engagement and practical relevance: remote control/review of running coding-agent sessions from ChatGPT mobile (OpenAI).

Theo’s Claude Code backlash threads captured the strongest developer sentiment shift around platform risk and subscription-backed agent workflows (@theo, @theo donations thread).

Figure’s autonomous humanoid sorting livestream remained one of the most discussed embodied-AI demos, especially once it crossed the 24-hour mark with detailed claims about onboard policy execution and no teleop (Brett Adcock).

GitHub’s Copilot App and LangChain’s Engine/SmithDB/Labs were the most important non-OpenAI tooling launches for agent engineers this cycle (GitHub, LangChain, @hwchase17).

Prime Intellect’s autonomous optimizer-search result is worth watching as a concrete example of coding agents being looped into open-ended ML optimization, not just app dev (Prime Intellect).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen 3.6 Local Inference Speedups and Quantization

Multi-Token Prediction (MTP) for Qwen on LLaMA.cpp + TurboQuant (Activity: 514): A patched llama.cpp fork adds Multi-Token Prediction (MTP) support for Qwen plus TurboQuant, reporting 21 tok/s → 34 tok/s on a MacBook Pro M5 Max 64GB, with a claimed 90% MTP acceptance rate; note the raw speedup is ~62%, not 40%. Code is published at AtomicBot-ai/atomic-llama-cpp-turboquant, with GGUF MTP quantizations for Qwen 3.6 27B/35B in the AtomicChat/qwen-36-udt-mtp HF collection. Commenters questioned the TurboQuant framing, arguing it is often slower than `f16`, `q8`, or `q4`; one noted a TurboQuant PR to llama.cpp was rejected because existing Q4 KV-quant rotation support already covered most benefits, with gains mainly at Q3 where quality degradation becomes a concern. Others asked for quality/eval data, since higher speculative/MTP acceptance and tokens/s do not alone establish output parity.

Several commenters argued that TurboQuant is not generally faster in llama.cpp, with one noting it can be slower than `f16`, `q8`, or `q4`. A prior TurboQuant PR to llama.cpp was reportedly rejected because llama.cpp already implements rotations for `Q4` KV-cache quantization, where standard `Q4` was faster and showed little gain; TurboQuant may only help around `Q3`, but with notable quality degradation.

Users distinguished between speed, quality, and context tradeoffs: MTP without TurboQuant was suggested for speed, while standard `Q4_1` or `Q4_0` quantization was recommended for longer context/quality retention. One commenter questioned whether TurboQuant had any Mac-specific advantage, implying the benefit is hardware- or workload-dependent rather than broadly useful.

A commenter recommended using dflash instead of built-in MTP, claiming it is 30–40% faster. They also mentioned that a pull request for this already existed, suggesting the implementation work may duplicate prior llama.cpp integration efforts.
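The “~62%, not 40%” note is plain arithmetic: 34/21 ≈ 1.62x throughput. One possible reading of a “40%” figure (our guess, not stated in the post) is per-token latency, which falls by 1 − 21/34 ≈ 38%. For context on the 90% acceptance claim, the standard speculative-decoding estimate, assuming each of k drafted tokens is accepted independently with probability α, gives (1 − α^(k+1)) / (1 − α) tokens per verification step. The fork’s draft length isn’t stated, so k = 1 below is an assumption; under it the idealized ceiling is 1.9x, and a measured 1.62x sitting below that is what drafting and verification overhead would lead you to expect.

```python
def throughput_gain(before_tps: float, after_tps: float) -> float:
    """Fractional throughput increase: after/before - 1."""
    return after_tps / before_tps - 1


def per_token_time_reduction(before_tps: float, after_tps: float) -> float:
    """Fractional drop in time per token: 1 - before/after."""
    return 1 - before_tps / after_tps


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Idealized tokens emitted per verification step with k drafted tokens,
    each accepted independently with probability alpha (ignores all overhead)."""
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


print(f"throughput gain: {throughput_gain(21, 34):.0%}")                # 62%
print(f"per-token time saved: {per_token_time_reduction(21, 34):.0%}")  # 38%
print(f"ideal tokens/step at 90%, k=1: {expected_tokens_per_step(0.9, 1):.2f}")  # 1.90
```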


Source: latent.space (original article)
