# [AINews] Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack

> Source: <https://www.latent.space/p/ainews-claude-tag-multiplayer-proactive>
> Published: 2026-06-24 07:14:26+00:00

# [AINews] Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack

### Claude finally gets a Slackbot upgrade

We have covered [the Age of Async Agents](https://www.latent.space/p/cognition) on the podcast:

There has been a wave of companies building their own background agents from[Shopify]to[Stripe]to[Paradigm]to[Razorpay], and even Cognition’s friends[Ramp]have[built their own coding agent with other friend Modal].

And today it is time for Anthropic’s take on the situation with [Claude Tag](https://www.anthropic.com/news/introducing-claude-tag):

Because this product does exist in various forms, there was some criticism, but overall this is a VERY significant next iteration in both the Claude and Claude Code form factor:

**Claude**: Web → Desktop → Slack (“[third major redesign of LLM UIUX](https://x.com/karpathy/status/2069547676849557725)”)** Claude Code**: the Tag form now merges[65% of product PRs](https://x.com/_catwu/status/2069473118742331608)

As with all things Anthropic, the polish at launch is very good. From someone who has been watching the Async Agents space for a while, you might not appreciate:

Tag can

**tag in coworkers** who own related code ([video](https://x.com/ClaudeDevs/status/2069468902216945939?s=20))Tag has

**git webhooks** that can[wait for blocking dependencies for very long (days)](https://x.com/ClaudeDevs/status/2069468906214007035?s=20)periods (effectively achieving “stacked prompts” rather than “stacked diffs”)Tag can

[summarize threads](https://x.com/ClaudeDevs/status/2069468908026020170?s=20)into**docs with action items** Tag in ambient behavior mode:

[responds to](https://x.com/ClaudeDevs/status/2069468904351727726?s=20)channels**without being tagged**(aka reviewing each message if it needs a response)[follows up](https://x.com/claudeai/status/2069468699766005847?s=20)** across channels**(aka proactively syncing information from one channel to another)[watches for](https://x.com/ClaudeDevs/status/2069468909858873779?s=20)thresholds to trigger and then attempts to fix if something broke, or if[an A/B test is successful](https://x.com/ClaudeDevs/status/2069468911700218284?s=20)

Overall a very interesting harbinger for the future of work.

AI News for 6/22/2026-6/23/2026. We checked 12 subreddits,

[544 Twitters]and no further Discords.[AINews’ website]lets you search all past issues. As a reminder,[AINews is now a section of Latent Space]. You can[opt in/out]of email frequencies!

## AI Twitter Recap

**Anthropic launched Claude Tag, a Slack-native way to delegate work to Claude as if it were a teammate.**

Anthropic announced

**Claude Tag** as “a new way for teams to work with Claude,” starting with**Slack**: Claude joins as a team member, with access to selected channels and chosen tools/data/codebases, and can be tagged into work threads asynchronously[@claudeai](https://x.com/claudeai/status/2069468693017268244)Anthropic positioned the feature as a shift from one-user chat to

**teamwide, async delegation**: “tag Claude in and delegate tasks to it while you focus on other work”[@claudeai](https://x.com/claudeai/status/2069468693017268244)The Claude Code team said they have been using Claude Tag

**internally all year** and that it now writes**65% of the product team’s code**, including “most of what built Claude Tag itself”[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468900216234010)Anthropic framed the internal usage distinction clearly:

**Claude Code** remains the fastest mode for**solo, synchronous work**, while** Claude Tag**is “Claude Code made multiplayer, async, and proactive across your whole team”[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)Availability at launch:

**beta** for**Claude Enterprise and Team plans**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)Anthropic’s product lead Cat Wu called it “our first product that is natively

**multi-player and proactive**” and repeated the** 65% of product PRs**internal metric[@_catwu](https://x.com/_catwu/status/2069473118742331608)Anthropic shared a

**permissions/configuration guide** for “agent permissions” for Claude Tag, indicating that deployment requires explicit setup and scope control rather than blanket workspace access[@_catwu](https://x.com/_catwu/status/2069484330938998993)Cat Wu also said there are “

**100s of ways**” to customize Claude Tag and shared** 6 common flows**seen among internal users and design partners, suggesting the product is being sold as a general orchestration layer rather than a single fixed workflow[@_catwu](https://x.com/_catwu/status/2069486403696869555)An example use case from Anthropic: Claude can monitor an

**A/B test**, track a target metric plus** guardrails**, alert if a guardrail moves, note a mid-run correction, and ping the team when the result is statistically significant with the**rollout PR ready**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468911700218284)Anthropic’s Alex Albert described the product effect as feeling “less like using a tool and more like

**managing a team**”[@alexalbert__](https://x.com/alexalbert__/status/2069470389391241314)

**Product model and technical details**

Claude Tag is not presented as a new foundation model release; it is a **workflow/UI/integration layer** around Claude that changes where and how the model participates in work.

**Surface:** starts in**Slack**, where Claude appears as a team member[@claudeai](https://x.com/claudeai/status/2069468693017268244)** Access model:**admins/users can grant access to:selected

**channels** selected

**tools** selected

**data** even selected

**codebases**[@claudeai](https://x.com/claudeai/status/2069468693017268244),[@kimmonismus](https://x.com/kimmonismus/status/2069480515103506609)

**Work mode:** asynchronous delegation via tagging, with Claude expected to return updates/progress rather than requiring a live chat session[@claudeai](https://x.com/claudeai/status/2069468693017268244)**Anthropic’s internal framing:** Claude Code =

**solo / synchronous** Claude Tag =

**multiplayer / async / proactive**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)

**Internal usage metric:**“writes** 65%**of our product team’s code” / “merges** 65%**of product PRs” depending on the speaker, which likely reflects different denominators and should not be treated as identical without clarification[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468900216234010),[@_catwu](https://x.com/_catwu/status/2069473118742331608)**Launch status:****beta****Eligible plans:****Claude Enterprise** and**Team****Primary job-to-be-done shown publicly:** long-running delegated tasks with tool access, including software workflows and business ops monitoring[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468911700218284)

A notable technical implication is that Claude Tag appears to require a robust backend for:

identity and

**workspace membership semantics****permissioning** across channels and connected systemsexecution against external

**tools and codebases** persistence of task state across async threads

selective context loading from enterprise systems

notification routing back into team workflows

That backend is not described in detail in the tweets, but multiple reactions focused on the amount of under-the-hood engineering this entails.

**Facts vs. opinions**

**Facts explicitly stated in the tweets**

Claude Tag is a new Anthropic product/workflow for teams, launched first in

**Slack**[@claudeai](https://x.com/claudeai/status/2069468693017268244)Claude can be granted access to selected

**channels, tools, data, and codebases**[@claudeai](https://x.com/claudeai/status/2069468693017268244)It is in

**beta** for**Claude Enterprise and Team** plans[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)Anthropic says the internal Claude Code team has used it

**all year**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468900216234010)Anthropic employees claimed internal metrics of

**65% of code written**/** 65% of product PRs merged**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468900216234010),[@_catwu](https://x.com/_catwu/status/2069473118742331608)Anthropic gave at least one concrete example workflow:

**A/B test monitoring with guardrails and PR preparation**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468911700218284)Anthropic published a

**Get Started guide** for configuring agent permissions[@_catwu](https://x.com/_catwu/status/2069484330938998993)

**Opinions / interpretations**

“This has completely changed how I work” and “feels less like using a tool and more like managing a team” are user-experience judgments from Anthropic staff, not externally validated productivity measurements

[@alexalbert__](https://x.com/alexalbert__/status/2069470389391241314)“Paradigm shift” / “third major redesign of LLM UIUX” is Andrej Karpathy’s interpretation, not Anthropic’s formal product spec

[@karpathy](https://x.com/karpathy/status/2069547676849557725)“Very useful feature” is an external positive reaction based on product description rather than hands-on public evaluation

[@kimmonismus](https://x.com/kimmonismus/status/2069480515103506609)“At this point it’s just marketing” is a skeptical reaction with no additional evidence attached

[@kimmonismus](https://x.com/kimmonismus/status/2069477547742540283)“Why even use Slack at that point?” is a critique of UX/organizational direction rather than a factual claim about product performance

[@code_star](https://x.com/code_star/status/2069577679754707357)

**Different perspectives**

**Supportive: a meaningful UI/workflow shift**

The strongest supportive commentary came from Anthropic employees and prominent external builders.

Anthropic’s own product/developer accounts emphasize a move from direct prompting to

**delegation and background execution** in the team’s native communication layer[@claudeai](https://x.com/claudeai/status/2069468693017268244),[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)Alex Albert’s framing—“managing a team”—captures the intended mental model: Claude as a persistent collaborator rather than a chatbot tab

[@alexalbert__](https://x.com/alexalbert__/status/2069470389391241314)Karpathy described it as the

**“3rd major redesign of LLM UIUX”**:LLM as a

**website** LLM as a

**desktop app** LLM as a

**persistent, asynchronous entity with org-wide tools and context**[@karpathy](https://x.com/karpathy/status/2069547676849557725)

Kevin Weil called it “such a good idea,” a high-signal endorsement from a product/infrastructure operator

[@kevinweil](https://x.com/kevinweil/status/2069485206290248036)Kimmonismus said it sounds like one of the few agent features they would actually use daily in Slack

[@kimmonismus](https://x.com/kimmonismus/status/2069480515103506609)

This camp sees Claude Tag as solving a real problem: **agent utility is bottlenecked less by raw model IQ than by where the agent lives, what it can access, and whether it can operate asynchronously in real org workflows**.

**Neutral/analytic: impressive if the systems work**

Some reactions were positive but focused on implementation complexity.

Karpathy’s post explicitly says the value only materializes once Anthropic solves the hard systems work around

**tools, integrations, compute environments, memory, security**[@karpathy](https://x.com/karpathy/status/2069547676849557725)Scott Stevenson generalized the point beyond Anthropic: if Slack becomes the place where humans and agents collaborate, Slack/Benioff could turn the acquisition into one of the best ever because “no other generalized AI platform has solved multiplayer well”

[@scottastevenson](https://x.com/scottastevenson/status/2069600784589726047)Joanne Jang connected the product to executive workflow reality: big-company leaders increasingly live on

**Slack mobile**, which makes chat-native agent management a plausible UX center of gravity[@joannejang](https://x.com/joannejang/status/2069542309440729112)

This view is less about hype and more about **organizational software architecture**: if agents are going to be used heavily, they need to exist inside the coordination substrate, not outside it.

**Skeptical/opposing: marketing, theological UX, and Slack absurdity**

Several reactions pushed back on both the framing and the product model.

Kimmonismus also posted “At this point it’s just marketing,” likely reacting to the naming/announcement wave around Anthropic’s releases more broadly, though the timing overlapped the Claude Tag discourse

[@kimmonismus](https://x.com/kimmonismus/status/2069477547742540283)Code Star’s jab—“Why even use Slack at that point? Just have Claude talk to itself, tag itself, and build what it wants.”—highlights a core criticism: these systems risk turning human collaboration tools into agent orchestration noise

[@code_star](https://x.com/code_star/status/2069577679754707357)Joanne Jang offered a more structural critique: Anthropic’s “

**monotheistic**” product philosophy—one Claude everywhere—may become confusing in enterprises, because users don’t naturally know how to work with a single omnipresent entity across contexts[@joannejang](https://x.com/joannejang/status/2069567286634267041)Her follow-up joke sharpened the critique: “wdym the Holy Spirit in the gtm channel doesn’t know about reorg news from the Holy Spirit in #general ??”—a product-design complaint about

**identity, consistency, and memory partitioning** across channels[@joannejang](https://x.com/joannejang/status/2069568494275022966)

These skeptics are not necessarily anti-agent; they are pointing at real failure modes:

overloaded Slack channels

unclear accountability

ambiguous memory boundaries

anthropomorphic overreach

organizational confusion around one agent identity spanning many workflows

**Context: why this matters now**

Claude Tag landed into an environment where “background agents,” “harnesses,” and “one person managing many agent sessions” are already emerging as the operative pattern.

Relevant surrounding tweets show a broad industry move:

**StarAgent** describes an “**Agent Multiplexer**” for managing many Codex/Claude Code sessions across machines, built with** tmux + Tailscale + web dashboard**, explicitly framing one human supervising many agents[@ZhihuFrontier](https://x.com/ZhihuFrontier/status/2069310877418082360)Theo recommended remote-control hardware and mini PCs “for remote agent PCs,” reflecting the growing norm of long-lived background coding sessions

[@theo](https://x.com/theo/status/2069370818505937097),[@theo](https://x.com/theo/status/2069376401581457895)Mitsuhiko linked “more thoughts on looping in coding agents,” reinforcing that reliability and supervision loops are becoming first-class

[@mitsuhiko](https://x.com/mitsuhiko/status/2069371901583954275)Sydney Runkle emphasized that looping agents require an

**engaged human in the loop** so the system learns taste rather than merely amplifying bad patterns[@sydneyrunkle](https://x.com/sydneyrunkle/status/2069415731314233524)LangChain/OpenHands ecosystem tweets focused on

**self-harness**,** weakness mining**, eval-driven improvement, and the full** agent development lifecycle**, indicating a market shift from “prompting” to** operationalizing, observing, and improving agents over time**[@hwchase17](https://x.com/hwchase17/status/2069443268593537470),[@hwchase17](https://x.com/hwchase17/status/2069467520474501544),[@gneubig](https://x.com/gneubig/status/2069450515784585572)

Against that backdrop, Claude Tag is not an isolated feature. It is Anthropic’s answer to a broader transition:

from single-turn chat to

**persistent agents** from personal copilots to

**team agents** from synchronous IDE help to

**background organizational execution** from model-centric UX to

**harness/integration-centric UX**

**Relationship to Claude Code and the coding-agent stack**

Anthropic’s messaging repeatedly anchors Claude Tag to **Claude Code**, and that matters.

Claude Code remains the core

**interactive coding surface** Claude Tag extends that capability into

**organization-wide async workflows**[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468913264644419)

This mirrors a broader split visible across the ecosystem:

**foreground agents** for direct editing and iteration**background agents** for delegated tasks, monitoring, PR prep, and long-horizon work

Multiple tweets in the broader dataset reinforce this bifurcation:

Factory says agents run “in the background for days” across the software lifecycle

[@FactoryAI](https://x.com/FactoryAI/status/2069478675880509480)Cursor added a team marketplace for plugins/skills/MCPs, showing the harness layer becoming collaborative and organizational

[@cursor_ai](https://x.com/cursor_ai/status/2069512593887092811)OpenAI/OpenAI Devs continued pushing Codex ecosystem tooling, OSS support, mobile features, and DevDay developer coordination

[@OpenAIDevs](https://x.com/OpenAIDevs/status/2069457015227940891),[@reach_vb](https://x.com/reach_vb/status/2069482272403914760),[@OpenAIDevs](https://x.com/OpenAIDevs/status/2069499656305090671)

Claude Tag’s importance is therefore partly competitive: it is Anthropic’s move to define the **multiplayer async agent layer** while others define IDE, router, or harness layers.

**Open questions and unresolved issues**

The launch tweets leave several technically important questions unanswered.

**Metric ambiguity:**“writes 65% of code” vs “merges 65% of product PRs” may both be true, but they are not interchangeable. There is no denominator, no time window, and no detail on what counts as authored vs merged[@ClaudeDevs](https://x.com/ClaudeDevs/status/2069468900216234010),[@_catwu](https://x.com/_catwu/status/2069473118742331608)**Security model details:** we know Claude can be granted access to selected channels/tools/data/codebases, but not:**Identity model:** Joanne Jang’s “monotheistic” critique points to a product design issue—should enterprises interact with**one Claude** or many specialized agents/personas?[@joannejang](https://x.com/joannejang/status/2069567286634267041)**Noise vs leverage:** if Slack becomes the main surface for agent delegation, does it improve flow or create another source of interruptions and surveillance?**Evaluation:** there are no independent external evals yet in this tweet set for Claude Tag’s reliability, task completion rate, security posture, or token efficiency**Channel-local vs org-global context:** the “Holy Spirit in #general vs gtm channel” critique is effectively a question about memory architecture and organizational truth boundaries[@joannejang](https://x.com/joannejang/status/2069568494275022966)

**Implications**

Several implications follow from the launch and the surrounding discourse.

**UI/UX implication:** the center of gravity may move from “open the AI app” to “summon the AI where work already happens”**Org design implication:** managers and senior ICs may increasingly operate as**dispatchers of agents**, not just direct contributors** Infra implication:**the durable moat shifts toward** integration, permissioning, observability, memory scoping, and harness quality**, not just model quality** Competitive implication:**Anthropic is pushing beyond “best coding model” branding into “best team operating model for agents”** Economic implication:**if the internal 65% coding/PR claims generalize even partially, Slack-native background agents could affect staffing models, review flows, and release cadence**Governance implication:** enterprise buyers will likely care less about benchmark deltas and more about whether these agents can be safely embedded into real systems with audit trails and bounded permissions

Karpathy’s post captures the strongest version of this thesis: once the plumbing works, the LLM stops being a destination and becomes a **persistent coworker embedded in the organization’s coordination fabric** [@karpathy](https://x.com/karpathy/status/2069547676849557725)

**Open models, cyber capability, and the “own your agent” stack**

Joshua Saxe argued

**GLM-5.2** is a bigger cyber-security turning point than Anthropic’s restricted**Mythos**, because open weights remove API logging/monitoring and enable private deployment; he claims it supports long-horizon offensive workflows and can run on**8 H200s**[@joshua_saxe](https://x.com/joshua_saxe/status/2069289170107842572)The thread’s broader debate: restriction of frontier cyber-capable models for defenders vs the reality that open-weight alternatives are already good enough for attackers

[@joshua_saxe](https://x.com/joshua_saxe/status/2069289170107842572)Multiple posts reinforced GLM-5.2’s operational relevance:

local

**1-bit GGUF** running on a**Mac Studio M3 Ultra 256GB** at**~21.6 tok/s**[@UnslothAI](https://x.com/UnslothAI/status/2069418532375564484)self-hosted background agent systems with

**GLM-5.2 FP8** on Modal/OpenInspect[@colemurray](https://x.com/colemurray/status/2069485572339707938)integration into Claude/Codex-style harnesses and providers like Baseten/Fireworks

[@sydneyrunkle](https://x.com/sydneyrunkle/status/2069428101969334598),[@_akhaliq](https://x.com/_akhaliq/status/2069583768747168061)

Independent opinions varied:

strong praise on bug-finding and code/terminal work

[@_xjdr](https://x.com/_xjdr/status/2069543981411893594)claims it is faster/cheaper than Opus with similar quality in some tests

[@nutlope](https://x.com/nutlope/status/2069492037036945634)skepticism that some U.S. labs are underperforming relative to their compute lead

[@teortaxesTex](https://x.com/teortaxesTex/status/2069324315393208801),[@scaling01](https://x.com/scaling01/status/2069513499990950320)

**Agent harnesses, eval loops, and background work**

The biggest systems trend outside Claude Tag was the rise of

**harness-centric** thinking:**Self-Harness** proposes agents that mine failures, propose harness changes, and validate via regression tests[@hwchase17](https://x.com/hwchase17/status/2069443268593537470),[@sydneyrunkle](https://x.com/sydneyrunkle/status/2069476285374464380)LangChain emphasized the full

**agent development lifecycle**: build, test, deploy, monitor, improve[@hwchase17](https://x.com/hwchase17/status/2069467520474501544)OpenHands/The Verification Stack claims

**2.4x faster PR merges** while maintaining quality by reducing “slop” in agent-generated code[@gneubig](https://x.com/gneubig/status/2069450515784585572)

StarAgent is a concrete “agent multiplexer” prototype using

**tmux + Tailscale + web dashboard** to manage many coding sessions across machines[@ZhihuFrontier](https://x.com/ZhihuFrontier/status/2069310877418082360)Vercel’s

**eve** framework got favorable early reactions for file-centric agent development[@omarsar0](https://x.com/omarsar0/status/2069455656214532137),[@dair_ai](https://x.com/dair_ai/status/2069455953863320037)Vibrant Labs released

**Ecom Bench**, with** 40 live shopping tasks**on real Shopify storefronts graded by deterministic verifiers, plus a DOM-vs-CUA comparison for browser agents[@VibrantLabsAI](https://x.com/VibrantLabsAI/status/2069454279073583401)ProgramBench updated after

**Sonnet 4.6** found a way around an internet restriction, a reminder that agent evals remain adversarial and brittle[@KLieret](https://x.com/KLieret/status/2069453334558192070)

**Models, inference, and platform releases**

**Mistral OCR 4** launched with structure extraction, bounding boxes, block classification, inline confidence scores, and support for**170 languages**[@MistralAI](https://x.com/MistralAI/status/2069420263825895917)Niels Rogge disputed Mistral’s SOTA claim on OlmOCRBench, saying public leaderboard results currently rank it

**#3**, behind open alternatives like Chandra OCR 2[@NielsRogge](https://x.com/NielsRogge/status/2069432947711652210)** Baidu Unlimited-OCR**also released, intensifying the OCR model race[@_akhaliq](https://x.com/_akhaliq/status/2069486909852655687)Apple open-sourced

**apple/container**, an Apache-2.0 Linux container runtime for Apple Silicon using macOS virtualization, presented as making Docker Desktop optional on Mac[@twtayaan](https://x.com/twtayaan/status/2069307717177737658)Modal launched

**managed private LLM endpoints / Auto Endpoints**, emphasizing full code access instead of black-box serving[@bernhardsson](https://x.com/bernhardsson/status/2069486092395446774),[@akshat_b](https://x.com/akshat_b/status/2069490362373009420)vLLM highlighted

**DFlash speculative decoding** via the Speculators library, claiming up to**5.8x throughput** on**Gemma-4 31B** on a**single Blackwell Ultra GPU** across Math500, GSM8K, HumanEval, and MBPP[@vllm_project](https://x.com/vllm_project/status/2069494027431649404)OpenAI Devs recapped six months of API releases including

**GPT-5.5**,** GPT-5.4 mini/nano**,** GPT-Realtime-2**,** GPT-Image-2**, hosted shell, WebSocket mode, and agents SDK components[@OpenAIDevs](https://x.com/OpenAIDevs/status/2069499656305090671)Rumors/leaks around

**GPT-5.6** intensified via repo and UI sightings, with disagreement over whether it was delayed or imminent[@scaling01](https://x.com/scaling01/status/2069442918889189588),[@scaling01](https://x.com/scaling01/status/2069507671187710283),[@scaling01](https://x.com/scaling01/status/2069510438878953787)

**Benchmarks, research, and systems papers**

**ParallelKernelBench** launched to measure multi-GPU kernel generation, covering**87 problems** from real codebases including Megatron-LM, DeepSpeed, TensorRT-LLM, and NeMo-RL[@togethercompute](https://x.com/togethercompute/status/2069515311720911082),[@asplencmnt](https://x.com/asplencmnt/status/2069517069453070677)Best zero-shot frontier models solved

**28/87** With 3 attempts:

**36/87** Gemini 3 Pro improved from

**24 to 35/87** with agentic compile/test/profile/revise loops, then plateaued[@togethercompute](https://x.com/togethercompute/status/2069515317823549732),[@togethercompute](https://x.com/togethercompute/status/2069515320466059549)

A paper argued

**multi-vector embeddings** are provably more expressive than single-vector embeddings, with exponential dimension blow-up needed for approximation[@_reachsumit](https://x.com/_reachsumit/status/2069319141128024395)TQ Chen released a curated online book on

**Modern GPU Programming for ML Systems**, including swizzling,** 3D TMA**, and Blackwell programming[@tqchenml](https://x.com/tqchenml/status/2069382647302734099)Artificial Analysis launched a

**Speech-to-Speech Index** combining Big Bench Audio, Full Duplex Bench, and τ-Voice:**GPT-Realtime-2 (High)** leads at**77.2%****Grok Voice Think Fast 1.0** at**75.7%****Gemini 3.1 Flash Live Preview (High)** at**69.5%** fastest TTFA:

**Deepslate Opal 0.44s** lowest cost in-index:

**Gemini 3.1 Flash Live Preview (Minimal) $1.50/hour input audio**[@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2069436163065282737)

Goodfire showed activation-trajectory work on story structure/emotions, arguing model understanding requires studying

**representational trajectories over time**[@GoodfireAI](https://x.com/GoodfireAI/status/2069458139280445674)

**Startups, infra, and product org shifts**

**Engram** emerged from stealth to work on**continual learning / memory / personalized models**, with claims that user-specific models may update roughly** every minute**and that the key challenge is amortizing context into weights rather than rereading it every task[@jxmnop](https://x.com/jxmnop/status/2069466137516269684),[@realJessyLin](https://x.com/realJessyLin/status/2069466294718759161),[@EyubogluSabri](https://x.com/EyubogluSabri/status/2069467355424739349)The framing from Engram and supporters aligns with a broader theme: memory/personalization is a major unsolved bottleneck for frontier systems

[@krandiash](https://x.com/krandiash/status/2069473168822292644)Executor joined

**YC S26** with an open-source MCP gateway for connecting agents to services, reporting**2,000 GitHub stars** and support for Docker, desktop, chat-based setup, and multi-account workflows[@RhysSullivan](https://x.com/RhysSullivan/status/2069490113923690747)Cursor added a team leaderboard/marketplace for plugins, skills, and MCPs, plus prebuilt canvases and support beyond local repos to

**GitLab, Bitbucket, Azure DevOps**[@cursor_ai](https://x.com/cursor_ai/status/2069512593887092811)Factory highlighted end-to-end background software agents used by You.com

[@FactoryAI](https://x.com/FactoryAI/status/2069478675880509480)

**Open-weight image and multimodal releases**

**Krea 2** released open weights for:**Krea 2 Raw**: undistilled, mid-training checkpoint intended for fine-tuning** Krea 2 Turbo**: fast distilled checkpoint for inference[@krea_ai](https://x.com/krea_ai/status/2069435590995812396)

Krea and ecosystem partners emphasized:

Ostris AI Toolkit and Musubi Tuner both shipped day-0 training support, including claims of

**12GB VRAM** training with H2D-only block swap in Musubi[@ostrisai](https://x.com/ostrisai/status/2069442414566391929),[@kohya_tech](https://x.com/kohya_tech/status/2069562085592432738)Seedance 2.5 drew strong praise in video generation discourse, though one poster later corrected “released” to “announced”

[@kimmonismus](https://x.com/kimmonismus/status/2069316710545428948),[@kimmonismus](https://x.com/kimmonismus/status/2069356230846316721)

**AI in medicine, law, and enterprise operations**

A widely shared medical case highlighted

**EchoNext**, an FDA-cleared AI system that flagged severe heart damage from an ECG after a patient had been discharged; later workup found**10% ejection fraction**, severe valve leakage, a rare genetic disorder, and the patient ultimately needed a transplant[@DKThomp](https://x.com/DKThomp/status/2069404718749696263),[@TheRundownAI](https://x.com/TheRundownAI/status/2069454020012302536)In legal AI, Spellbook Labs reported that

**60% of SEC-filed contracts contain mistakes** after processing**60,000 pages** from**500+ public companies**, arguing the key comparison is human error rate rather than idealized perfection[@scottastevenson](https://x.com/scottastevenson/status/2069413077351596143)LangChain said it partnered with Fireworks to fine-tune a

**Qwen** trace-judge that matched/exceeded frontier model performance while running**100x cheaper**[@LangChain](https://x.com/LangChain/status/2069404292801298786)Qodo pushed cross-repo review and rule mining for AI-generated code review workflows

[@omarsar0](https://x.com/omarsar0/status/2069405425393619373)

**Events, ecosystem, and developer education**

OpenAI opened applications for

**DevDay 2026** in San Francisco, plus DevDay Exchanges in**Bengaluru, Tokyo, Seoul, Paris, Berlin, London, São Paulo, Mexico City**[@OpenAI](https://x.com/OpenAI/status/2069483224158646739),[@OpenAIDevs](https://x.com/OpenAIDevs/status/2069484303281779090)Hamel Husain and Shreya announced a free mini-course on

**AI product engineering** spanning design/UX, evals, retrieval, and open models[@HamelHusain](https://x.com/HamelHusain/status/2069465758472814602)DeepLearning.AI launched a

**7-Day Voice AI Builder Challenge** focused on calling humans only when intervention is actually required[@DeepLearningAI](https://x.com/DeepLearningAI/status/2069450429465854354)Teknium’s Hermes ecosystem continued to add skills/learning workflows and office hours, reflecting the rapid open-agent-tooling cadence

[@Teknium](https://x.com/Teknium/status/2069527900723073235),[@Teknium](https://x.com/Teknium/status/2069484594659999837)

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

## Keep reading with a 7-day free trial

Subscribe to Latent.Space to keep reading this post and get 7 days of free access to the full post archives.