Claude finally gets a Slackbot upgrade
We have covered the Age of Async Agents on the podcast: There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.
And today it is time for Anthropic’s take on the situation with Claude Tag: Because products like this already exist in various forms, there was some criticism, but overall this is a VERY significant next iteration of both the Claude and Claude Code form factors:
- **Claude**: Web → Desktop → Slack (“third major redesign of LLM UIUX”)
- **Claude Code**: the Tag form now merges 65% of product PRs
As with all things Anthropic, the polish at launch is very good. Speaking as someone who has been watching the Async Agents space for a while, here are some details you might not appreciate:
- Tag can tag in coworkers who own related code (video)
- Tag has git webhooks that can wait for blocking dependencies for very long (days) periods (effectively achieving “stacked prompts” rather than “stacked diffs”)
- Tag can summarize threads into docs with action items
- Tag in ambient behavior mode:
  - responds to channels without being tagged (aka reviewing each message if it needs a response)
  - follows up **across channels** (aka proactively syncing information from one channel to another)
  - watches for thresholds to trigger and then attempts to fix if something broke, or if an A/B test is successful
Overall a very interesting harbinger for the future of work.
AI News for 6/22/2026-6/23/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Anthropic launched Claude Tag, a Slack-native way to delegate work to Claude as if it were a teammate.
- Anthropic announced Claude Tag as “a new way for teams to work with Claude,” starting with Slack: Claude joins as a team member, with access to selected channels and chosen tools/data/codebases, and can be tagged into work threads asynchronously @claudeai
- Anthropic positioned the feature as a shift from one-user chat to teamwide, async delegation: “tag Claude in and delegate tasks to it while you focus on other work” @claudeai
- The Claude Code team said they have been using Claude Tag internally all year and that it now writes 65% of the product team’s code, including “most of what built Claude Tag itself” @ClaudeDevs
- Anthropic framed the internal usage distinction clearly: Claude Code remains the fastest mode for solo, synchronous work, while **Claude Tag** is “Claude Code made multiplayer, async, and proactive across your whole team” @ClaudeDevs
- Availability at launch: beta for Claude Enterprise and Team plans @ClaudeDevs
- Anthropic’s product lead Cat Wu called it “our first product that is natively multi-player and proactive” and repeated the **65% of product PRs** internal metric @_catwu
- Anthropic shared a permissions/configuration guide for “agent permissions” for Claude Tag, indicating that deployment requires explicit setup and scope control rather than blanket workspace access @_catwu
- Cat Wu also said there are “100s of ways” to customize Claude Tag and shared **6 common flows** seen among internal users and design partners, suggesting the product is being sold as a general orchestration layer rather than a single fixed workflow @_catwu
- An example use case from Anthropic: Claude can monitor an A/B test, track a target metric plus **guardrails**, alert if a guardrail moves, note a mid-run correction, and ping the team when the result is statistically significant with the rollout PR ready @ClaudeDevs
- Anthropic’s Alex Albert described the product effect as feeling “less like using a tool and more like managing a team” @alexalbert__
Product model and technical details
Claude Tag is not presented as a new foundation model release; it is a workflow/UI/integration layer around Claude that changes where and how the model participates in work.
- **Surface:** starts in Slack, where Claude appears as a team member @claudeai
- **Access model:** admins/users can grant access to:
  - selected channels
  - selected tools
  - selected data
  - even selected codebases @claudeai, @kimmonismus
- **Work mode:** asynchronous delegation via tagging, with Claude expected to return updates/progress rather than requiring a live chat session @claudeai
- **Anthropic’s internal framing:**
  - Claude Code = solo / synchronous
  - Claude Tag = multiplayer / async / proactive @ClaudeDevs
- **Internal usage metric:** “writes 65% of our product team’s code” / “merges 65% of product PRs” depending on the speaker, which likely reflects different denominators and should not be treated as identical without clarification @ClaudeDevs, @_catwu
- **Launch status:** beta
- **Eligible plans:** Claude Enterprise and Team
- **Primary job-to-be-done shown publicly:** long-running delegated tasks with tool access, including software workflows and business ops monitoring @ClaudeDevs
A notable technical implication is that Claude Tag appears to require a robust backend for:
- identity and workspace membership semantics
- permissioning across channels and connected systems
- execution against external tools and codebases
- persistence of task state across async threads
- selective context from enterprise systems
- notification routing back into team workflows
That backend is not described in detail in the tweets, but multiple reactions focused on the amount of under-the-hood engineering this entails.
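To make the permissioning and task-state requirements above more concrete, here is a minimal Python sketch. It is not Anthropic's design: `AgentScope`, `TaskState`, and `run_step` are hypothetical names invented for illustration, and the only assumption carried over from the tweets is that access is granted per selected channel, tool, and codebase rather than workspace-wide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical allowlist of what an agent may touch (deny by default)."""
    channels: frozenset = frozenset()
    tools: frozenset = frozenset()
    codebases: frozenset = frozenset()

    def allows(self, kind: str, name: str) -> bool:
        # Unknown resource kinds map to None and are therefore never permitted.
        allowed = {
            "channel": self.channels,
            "tool": self.tools,
            "codebase": self.codebases,
        }.get(kind)
        return allowed is not None and name in allowed


@dataclass
class TaskState:
    """Hypothetical per-thread record so work can resume across async turns."""
    thread_id: str
    status: str = "pending"
    log: list = field(default_factory=list)


def run_step(scope: AgentScope, task: TaskState, kind: str, name: str) -> bool:
    """Attempt one action and record whether the scope allowed it."""
    if not scope.allows(kind, name):
        task.status = "blocked"
        task.log.append(f"denied {kind}:{name}")
        return False
    task.status = "running"
    task.log.append(f"used {kind}:{name}")
    return True


scope = AgentScope(channels=frozenset({"#eng"}), tools=frozenset({"git"}))
task = TaskState(thread_id="T1")
run_step(scope, task, "tool", "git")          # inside the granted scope
run_step(scope, task, "codebase", "billing")  # outside it, so the task is blocked
print(task.status, task.log)
```

The deny-by-default check and the append-only log are design choices of this sketch; they are one simple way to get the bounded permissions and audit trail that enterprise buyers are likely to ask about.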
Facts vs. opinions
Facts explicitly stated in the tweets
- Claude Tag is a new Anthropic product/workflow for teams, launched first in Slack @claudeai
- Claude can be granted access to selected channels, tools, data, and codebases @claudeai
- It is in beta for Claude Enterprise and Team plans @ClaudeDevs
- Anthropic says the internal Claude Code team has used it all year @ClaudeDevs
- Anthropic employees claimed internal metrics of 65% of code written / **65% of product PRs merged** @ClaudeDevs, @_catwu
- Anthropic gave at least one concrete example workflow: A/B test monitoring with guardrails and PR preparation @ClaudeDevs
- Anthropic published a Get Started guide for configuring agent permissions @_catwu
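The A/B test monitoring workflow in the list above can be sketched roughly as follows. The tweets do not say which statistical test or thresholds Claude Tag uses, so everything here is an assumption: a pooled two-proportion z-test, a 0.05 significance level, and a guardrail defined as a maximum relative drop. The function names are hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def check_experiment(target, guardrails, alpha=0.05):
    """Return the message a monitoring agent might post for one check.

    target: (conv_a, n_a, conv_b, n_b) for the metric being optimized.
    guardrails: {name: (control_value, treatment_value, max_relative_drop)}.
    """
    # Guardrails are checked first so a regression is never masked by a win.
    for name, (control, treatment, max_drop) in guardrails.items():
        if control > 0 and (control - treatment) / control > max_drop:
            return f"ALERT: guardrail '{name}' regressed"
    conv_a, n_a, conv_b, n_b = target
    p = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    if p < alpha and conv_b / n_b > conv_a / n_a:
        return "SIGNIFICANT: treatment wins, rollout PR can be prepared"
    return "WAITING: not yet significant"


# Hypothetical numbers, for illustration only.
print(check_experiment((100, 1000, 150, 1000), {"success_rate": (0.99, 0.98, 0.05)}))
```

One caveat: re-running a fixed-threshold test every time new data arrives inflates the false-positive rate, so a monitor that checks continuously would normally need a sequential-testing correction, which this sketch omits.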
Opinions / interpretations
- “This has completely changed how I work” and “feels less like using a tool and more like managing a team” are user-experience judgments from Anthropic staff, not externally validated productivity measurements @alexalbert__
- “Paradigm shift” / “third major redesign of LLM UIUX” is Andrej Karpathy’s interpretation, not Anthropic’s formal product spec @karpathy
- “Very useful feature” is an external positive reaction based on product description rather than hands-on public evaluation @kimmonismus
- “At this point it’s just marketing” is a skeptical reaction with no additional evidence attached @kimmonismus
- “Why even use Slack at that point?” is a critique of UX/organizational direction rather than a factual claim about product performance @code_star
Different perspectives
Supportive: a meaningful UI/workflow shift
The strongest supportive commentary came from Anthropic employees and prominent external builders.
- Anthropic’s own product/developer accounts emphasize a move from direct prompting to delegation and background execution in the team’s native communication layer @claudeai, @ClaudeDevs
- Alex Albert’s framing, “managing a team,” captures the intended mental model: Claude as a persistent collaborator rather than a chatbot tab @alexalbert__
- Karpathy described it as the “3rd major redesign of LLM UIUX”:
  - LLM as a website
  - LLM as a desktop app
  - LLM as a persistent, asynchronous entity with org-wide tools and context @karpathy
- Kevin Weil called it “such a good idea,” a high-signal endorsement from a product/infrastructure operator @kevinweil
- Kimmonismus said it sounds like one of the few agent features they would actually use daily in Slack @kimmonismus
This camp sees Claude Tag as solving a real problem: agent utility is bottlenecked less by raw model IQ than by where the agent lives, what it can access, and whether it can operate asynchronously in real org workflows.
Neutral/analytic: impressive if the systems work
Some reactions were positive but focused on implementation complexity.
- Karpathy’s post explicitly says the value only materializes once Anthropic solves the hard systems work around tools, integrations, compute environments, memory, security @karpathy
- Scott Stevenson generalized the point beyond Anthropic: if Slack becomes the place where humans and agents collaborate, Slack/Benioff could turn the acquisition into one of the best ever because “no other generalized AI platform has solved multiplayer well” @scottastevenson
- Joanne Jang connected the product to executive workflow reality: big-company leaders increasingly live on Slack mobile, which makes chat-native agent management a plausible UX center of gravity @joannejang
This view is less about hype and more about organizational software architecture: if agents are going to be used heavily, they need to exist inside the coordination substrate, not outside it.
Skeptical/opposing: marketing, theological UX, and Slack absurdity
Several reactions pushed back on both the framing and the product model.
- Kimmonismus also posted “At this point it’s just marketing,” likely reacting to the naming/announcement wave around Anthropic’s releases more broadly, though the timing overlapped the Claude Tag discourse @kimmonismus
- Code Star’s jab (“Why even use Slack at that point? Just have Claude talk to itself, tag itself, and build what it wants.”) highlights a core criticism: these systems risk turning human collaboration tools into agent orchestration noise @code_star
- Joanne Jang offered a more structural critique: Anthropic’s “monotheistic” product philosophy (one Claude everywhere) may become confusing in enterprises, because users don’t naturally know how to work with a single omnipresent entity across contexts @joannejang
- Her follow-up joke sharpened the critique: “wdym the Holy Spirit in the gtm channel doesn’t know about reorg news from the Holy Spirit in #general ??” is a product-design complaint about identity, consistency, and memory partitioning across channels @joannejang
These skeptics are not necessarily anti-agent; they are pointing at real failure modes:
- overloaded Slack channels
- unclear accountability
- ambiguous memory boundaries
- anthropomorphic overreach
- organizational confusion around one agent identity spanning many workflows
Context: why this matters now
Claude Tag landed into an environment where “background agents,” “harnesses,” and “one person managing many agent sessions” are already emerging as the operative pattern.
Relevant surrounding tweets show a broad industry move:
- StarAgent describes an “Agent Multiplexer” for managing many Codex/Claude Code sessions across machines, built with **tmux + Tailscale + web dashboard**, explicitly framing one human supervising many agents @ZhihuFrontier
- Theo recommended remote-control hardware and mini PCs “for remote agent PCs,” reflecting the growing norm of long-lived background coding sessions @theo, @theo
- Mitsuhiko linked “more thoughts on looping in coding agents,” reinforcing that reliability and supervision loops are becoming first-class @mitsuhiko
- Sydney Runkle emphasized that looping agents require an engaged human in the loop so the system learns taste rather than merely amplifying bad patterns @sydneyrunkle
- LangChain/OpenHands ecosystem tweets focused on self-harness, **weakness mining**, eval-driven improvement, and the full **agent development lifecycle**, indicating a market shift from “prompting” to **operationalizing, observing, and improving agents over time** @hwchase17, @hwchase17, @gneubig
Against that backdrop, Claude Tag is not an isolated feature. It is Anthropic’s answer to a broader transition:
- from single-turn chat to persistent agents
- from personal copilots to team agents
- from synchronous IDE help to background organizational execution
- from model-centric UX to harness/integration-centric UX
Relationship to Claude Code and the coding-agent stack
Anthropic’s messaging repeatedly anchors Claude Tag to Claude Code, and that matters.
- Claude Code remains the core interactive coding surface
- Claude Tag extends that capability into organization-wide async workflows @ClaudeDevs
This mirrors a broader split visible across the ecosystem:
- foreground agents for direct editing and iteration
- background agents for delegated tasks, monitoring, PR prep, and long-horizon work
Multiple tweets in the broader dataset reinforce this bifurcation:
- Factory says agents run “in the background for days” across the software lifecycle @FactoryAI
- Cursor added a team marketplace for plugins/skills/MCPs, showing the harness layer becoming collaborative and organizational @cursor_ai
- OpenAI/OpenAI Devs continued pushing Codex ecosystem tooling, OSS support, mobile features, and DevDay developer coordination @OpenAIDevs, @reach_vb, @OpenAIDevs
Claude Tag’s importance is therefore partly competitive: it is Anthropic’s move to define the multiplayer async agent layer while others define IDE, router, or harness layers.
Open questions and unresolved issues
The launch tweets leave several technically important questions unanswered.
- **Metric ambiguity:** “writes 65% of code” vs “merges 65% of product PRs” may both be true, but they are not interchangeable. There is no denominator, no time window, and no detail on what counts as authored vs merged @ClaudeDevs, @_catwu
- **Security model details:** we know Claude can be granted access to selected channels/tools/data/codebases, but not:
- **Identity model:** Joanne Jang’s “monotheistic” critique points to a product design issue: should enterprises interact with one Claude or many specialized agents/personas? @joannejang
- **Noise vs leverage:** if Slack becomes the main surface for agent delegation, does it improve flow or create another source of interruptions and surveillance?
- **Evaluation:** there are no independent external evals yet in this tweet set for Claude Tag’s reliability, task completion rate, security posture, or token efficiency
- **Channel-local vs org-global context:** the “Holy Spirit in #general vs gtm channel” critique is effectively a question about memory architecture and organizational truth boundaries @joannejang
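To see why the two 65% phrasings in the metric-ambiguity item are not interchangeable, consider a toy calculation. The PR records below are entirely hypothetical (they are not Anthropic data), and `agent_shares` is an illustrative helper; the point is only that a share of merged PRs and a share of merged lines use different denominators and can diverge widely.

```python
def agent_shares(prs):
    """Return (share of merged PRs, share of merged lines) authored by the agent."""
    merged = [p for p in prs if p["merged"]]
    pr_share = sum(p["author"] == "agent" for p in merged) / len(merged)
    total_lines = sum(p["lines"] for p in merged)
    agent_lines = sum(p["lines"] for p in merged if p["author"] == "agent")
    return pr_share, agent_lines / total_lines


# Hypothetical records: two small agent PRs, one large human PR, one unmerged agent PR.
prs = [
    {"author": "agent", "lines": 40, "merged": True},
    {"author": "agent", "lines": 60, "merged": True},
    {"author": "human", "lines": 900, "merged": True},
    {"author": "agent", "lines": 500, "merged": False},
]

pr_share, line_share = agent_shares(prs)
# 2 of 3 merged PRs are the agent's, but only 100 of 1000 merged lines.
print(f"merged-PR share: {pr_share:.0%}, merged-line share: {line_share:.0%}")
```

In this made-up case the agent accounts for about two thirds of merged PRs but a tenth of merged lines, which is why the denominator and counting rule matter before comparing the two claims.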
Implications
Several implications follow from the launch and the surrounding discourse.
- **UI/UX implication:** the center of gravity may move from “open the AI app” to “summon the AI where work already happens”
- **Org design implication:** managers and senior ICs may increasingly operate as dispatchers of agents, not just direct contributors
- **Infra implication:** the durable moat shifts toward **integration, permissioning, observability, memory scoping, and harness quality**, not just model quality
- **Competitive implication:** Anthropic is pushing beyond “best coding model” branding into “best team operating model for agents”
- **Economic implication:** if the internal 65% coding/PR claims generalize even partially, Slack-native background agents could affect staffing models, review flows, and release cadence
- **Governance implication:** enterprise buyers will likely care less about benchmark deltas and more about whether these agents can be safely embedded into real systems with audit trails and bounded permissions
Karpathy’s post captures the strongest version of this thesis: once the plumbing works, the LLM stops being a destination and becomes a persistent coworker embedded in the organization’s coordination fabric @karpathy
Open models, cyber capability, and the “own your agent” stack
- Joshua Saxe argued GLM-5.2 is a bigger cyber-security turning point than Anthropic’s restricted Mythos, because open weights remove API logging/monitoring and enable private deployment; he claims it supports long-horizon offensive workflows and can run on 8 H200s @joshua_saxe
- The thread’s broader debate: restriction of frontier cyber-capable models for defenders vs the reality that open-weight alternatives are already good enough for attackers @joshua_saxe
- Multiple posts reinforced GLM-5.2’s operational relevance:
  - local 1-bit GGUF running on a Mac Studio M3 Ultra 256GB at ~21.6 tok/s @UnslothAI
  - self-hosted background agent systems with GLM-5.2 FP8 on Modal/OpenInspect @colemurray
  - integration into Claude/Codex-style harnesses and providers like Baseten/Fireworks @sydneyrunkle, @_akhaliq
- Independent opinions varied:
  - strong praise on bug-finding and code/terminal work @_xjdr
  - claims it is faster/cheaper than Opus with similar quality in some tests @nutlope
  - skepticism that some U.S. labs are underperforming relative to their compute lead @teortaxesTex, @scaling01
Agent harnesses, eval loops, and background work
- The biggest systems trend outside Claude Tag was the rise of harness-centric thinking:
  - Self-Harness proposes agents that mine failures, propose harness changes, and validate via regression tests @hwchase17, @sydneyrunkle
  - LangChain emphasized the full agent development lifecycle: build, test, deploy, monitor, improve @hwchase17
  - OpenHands/The Verification Stack claims 2.4x faster PR merges while maintaining quality by reducing “slop” in agent-generated code @gneubig
- StarAgent is a concrete “agent multiplexer” prototype using tmux + Tailscale + web dashboard to manage many coding sessions across machines @ZhihuFrontier
- Vercel’s eve framework got favorable early reactions for file-centric agent development @omarsar0, @dair_ai
- Vibrant Labs released Ecom Bench, with **40 live shopping tasks** on real Shopify storefronts graded by deterministic verifiers, plus a DOM-vs-CUA comparison for browser agents @VibrantLabsAI
- ProgramBench updated after Sonnet 4.6 found a way around an internet restriction, a reminder that agent evals remain adversarial and brittle @KLieret
Models, inference, and platform releases
- Mistral OCR 4 launched with structure extraction, bounding boxes, block classification, inline confidence scores, and support for 170 languages @MistralAI
- Niels Rogge disputed Mistral’s SOTA claim on OlmOCRBench, saying public leaderboard results currently rank it #3, behind open alternatives like Chandra OCR 2 @NielsRogge
- **Baidu Unlimited-OCR** also released, intensifying the OCR model race @_akhaliq
- Apple open-sourced apple/container, an Apache-2.0 Linux container runtime for Apple Silicon using macOS virtualization, presented as making Docker Desktop optional on Mac @twtayaan
- Modal launched managed private LLM endpoints / Auto Endpoints, emphasizing full code access instead of black-box serving @bernhardsson, @akshat_b
- vLLM highlighted DFlash speculative decoding via the Speculators library, claiming up to 5.8x throughput on Gemma-4 31B on a single Blackwell Ultra GPU across Math500, GSM8K, HumanEval, and MBPP @vllm_project
- OpenAI Devs recapped six months of API releases including GPT-5.5, **GPT-5.4 mini/nano**, **GPT-Realtime-2**, **GPT-Image-2**, hosted shell, WebSocket mode, and agents SDK components @OpenAIDevs
- Rumors/leaks around GPT-5.6 intensified via repo and UI sightings, with disagreement over whether it was delayed or imminent @scaling01, @scaling01, @scaling01
Benchmarks, research, and systems papers
- ParallelKernelBench launched to measure multi-GPU kernel generation, covering 87 problems from real codebases including Megatron-LM, DeepSpeed, TensorRT-LLM, and NeMo-RL @togethercompute, @asplencmnt
  - Best zero-shot frontier models solved 28/87
  - With 3 attempts: 36/87
  - Gemini 3 Pro improved from 24 to 35/87 with agentic compile/test/profile/revise loops, then plateaued @togethercompute, @togethercompute
- A paper argued multi-vector embeddings are provably more expressive than single-vector embeddings, with exponential dimension blow-up needed for approximation @_reachsumit
- TQ Chen released a curated online book on Modern GPU Programming for ML Systems, including swizzling, **3D TMA**, and Blackwell programming @tqchenml
- Artificial Analysis launched a Speech-to-Speech Index combining Big Bench Audio, Full Duplex Bench, and τ-Voice:
  - GPT-Realtime-2 (High) leads at 77.2%
  - Grok Voice Think Fast 1.0 at 75.7%
  - Gemini 3.1 Flash Live Preview (High) at 69.5%
  - fastest TTFA: Deepslate Opal 0.44s
  - lowest cost in-index: Gemini 3.1 Flash Live Preview (Minimal) $1.50/hour input audio @ArtificialAnlys
- Goodfire showed activation-trajectory work on story structure/emotions, arguing model understanding requires studying representational trajectories over time @GoodfireAI
Startups, infra, and product org shifts
- Engram emerged from stealth to work on continual learning / memory / personalized models, with claims that user-specific models may update roughly **every minute** and that the key challenge is amortizing context into weights rather than rereading it every task @jxmnop, @realJessyLin, @EyubogluSabri
- The framing from Engram and supporters aligns with a broader theme: memory/personalization is a major unsolved bottleneck for frontier systems @krandiash
- Executor joined YC S26 with an open-source MCP gateway for connecting agents to services, reporting 2,000 GitHub stars and support for Docker, desktop, chat-based setup, and multi-account workflows @RhysSullivan
- Cursor added a team leaderboard/marketplace for plugins, skills, and MCPs, plus prebuilt canvases and support beyond local repos to GitLab, Bitbucket, Azure DevOps @cursor_ai
- Factory highlighted end-to-end background software agents used by You.com @FactoryAI
Open-weight image and multimodal releases
- Krea 2 released open weights for:
  - Krea 2 Raw: undistilled, mid-training checkpoint intended for fine-tuning
  - **Krea 2 Turbo**: fast distilled checkpoint for inference @krea_ai
- Krea and ecosystem partners emphasized:
- Ostris AI Toolkit and Musubi Tuner both shipped day-0 training support, including claims of 12GB VRAM training with H2D-only block swap in Musubi @ostrisai, @kohya_tech
- Seedance 2.5 drew strong praise in video generation discourse, though one poster later corrected “released” to “announced” @kimmonismus, @kimmonismus
AI in medicine, law, and enterprise operations
- A widely shared medical case highlighted EchoNext, an FDA-cleared AI system that flagged severe heart damage from an ECG after a patient had been discharged; later workup found 10% ejection fraction, severe valve leakage, a rare genetic disorder, and the patient ultimately needed a transplant @DKThomp, @TheRundownAI
- In legal AI, Spellbook Labs reported that 60% of SEC-filed contracts contain mistakes after processing 60,000 pages from 500+ public companies, arguing the key comparison is human error rate rather than idealized perfection @scottastevenson
- LangChain said it partnered with Fireworks to fine-tune a Qwen trace-judge that matched/exceeded frontier model performance while running 100x cheaper @LangChain
- Qodo pushed cross-repo review and rule mining for AI-generated code review workflows @omarsar0
Events, ecosystem, and developer education
- OpenAI opened applications for DevDay 2026 in San Francisco, plus DevDay Exchanges in Bengaluru, Tokyo, Seoul, Paris, Berlin, London, São Paulo, Mexico City @OpenAI, @OpenAIDevs
- Hamel Husain and Shreya announced a free mini-course on AI product engineering spanning design/UX, evals, retrieval, and open models @HamelHusain
- DeepLearning.AI launched a 7-Day Voice AI Builder Challenge focused on calling humans only when intervention is actually required @DeepLearningAI
- Teknium’s Hermes ecosystem continued to add skills/learning workflows and office hours, reflecting the rapid open-agent-tooling cadence @Teknium, @Teknium
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap