# [AINews] Founders and Forward Deployed Engineers

> Source: <https://www.latent.space/p/ainews-founders-and-forward-deployed>
> Published: 2026-05-30 01:57:15+00:00

### a quiet day lets us highlight the new AIE WF focuses

Most people are still digesting the [massive Anthropic news](https://www.latent.space/p/ainews-anthropic-raises-965b-series) from yesterday.

We’re taking the opportunity to solicit [the leading AI FDEs](https://ai.engineer/cfp) in the world for AIE’s new Forward Deployed Engineer track, mirroring similar pushes from both [OpenAI DeployCo](https://www.latent.space/p/ainews-thinking-machines-native-interaction) and [Anthropic DeployCo](https://www.blackstone.com/news/press/anthropic-partners-with-blackstone-hellman-friedman-and-goldman-sachs-to-launch-enterprise-ai-services-firm/), as well as for AIE’s new Founders program, where we are doing our version of the Startup Battlefield: a competitive pitch contest anchored by Y Combinator’s Garry Tan and Howie Liu’s [$10 million Hyperagent](https://x.com/howietl/status/2057823823526014990) contest. Sign up (and [book a hotel](https://www.ai.engineer/worldsfair/2026#venue)!) for details today if you are keen.

AI News for 5/28/2026-5/29/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

**AI Twitter Recap**

**Claude Opus 4.8 Rollout, Benchmark Friction, and API Ergonomics**

**Opus 4.8 landed into a noisy, mixed eval landscape**: multiple independent benches converged on “incremental but not dominant.” [@arena](https://x.com/arena/status/2060160804767584512) pushed **200+ frontend/code tests** comparing Opus 4.8 against prior Opus variants, Gemini, and GLM; [@theo](https://x.com/theo/status/2060172445592789064) reported CursorBench shows it as **more efficient but slightly worse than 4.7 within margin of error**; [@jerryjliu0](https://x.com/jerryjliu0/status/2060196252642648427) and [@llama_index](https://x.com/llama_index/status/2060165358569337102) found **small gains on tables/layout** but regressions on **content faithfulness/charts** in document parsing; [@scaling01](https://x.com/scaling01/status/2060335738172911766) said **no progress on ALE-Bench** and separately flagged interesting failure modes on LisanBench. On the positive side, [@jeremyphoward](https://x.com/jeremyphoward/status/2060195641847107722) found 4.8 **less over-agentic and more cooperative** than 4.7/GPT-5.5 in coding, while [@leo_linsky](https://x.com/leo_linsky/status/2060205310871326894) called it a tangible product improvement over prior Anthropic releases.

**Anthropic also shipped useful platform-level changes**: [@ClaudeDevs](https://x.com/ClaudeDevs/status/2060432688281251998) announced **mid-conversation system instructions without breaking prompt cache**, plus authoritative mid-conversation system-role updates, which matters for long-running agent sessions and cost control. But pricing remains a major complaint: [@jeremyphoward](https://x.com/jeremyphoward/status/2060198836963061998) argued Anthropic has done little for **API affordability**, preferring GPT-5.5 partly because subscription/API economics are easier to justify. Overall takeaway: 4.8 looks like a meaningful quality-of-life release for real use, not a clean benchmark reset.
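To see why mid-conversation system instructions matter for cost, here is a toy sketch of prefix-style caching. It assumes, purely for illustration, that a cache can only reuse the leading run of messages that is byte-identical to an earlier request. The helper `shared_prefix_len` and the message strings are hypothetical and are not Anthropic's API; real caches work on tokens and explicit cache breakpoints.

```python
def shared_prefix_len(cached: list[str], new: list[str]) -> int:
    """Count leading messages that are identical and therefore reusable."""
    n = 0
    for old_msg, new_msg in zip(cached, new):
        if old_msg != new_msg:
            break
        n += 1
    return n


# Conversation already processed (and cached) on an earlier request.
cached = [
    "system: You are a coding agent.",
    "user: Fix the failing test.",
    "assistant: Running pytest...",
]

# Option A: rewrite the top-level system prompt.
# The very first message changes, so nothing after it can be reused.
edited_top = ["system: You are a coding agent. Be terse."] + cached[1:]

# Option B: append a new system instruction mid-conversation.
# The whole earlier conversation remains an unchanged prefix.
appended = cached + ["system: From now on, be terse."]

reuse_a = shared_prefix_len(cached, edited_top)
reuse_b = shared_prefix_len(cached, appended)
print(reuse_a, reuse_b)
```

Under this simplified model, a long-running agent session that only appends instructions keeps paying cached-prefix rates for its history, whereas editing the original system prompt would force the full context to be reprocessed.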

**Agent Harnesses, Multi-Turn RL Bugs, and the Infrastructure Around Autonomy**

**A subtle but important RL failure mode got called out**: [@ClementDelangue](https://x.com/ClementDelangue/status/2060175330665508917) highlighted a Hugging Face deep-dive on why many **tool-using, multi-turn RL training loops are silently broken**. The core bug: decoding model output, parsing tool calls, then **re-tokenizing** the updated conversation can change tokenization, so gradients are applied to sequences the model never actually sampled. The proposed fix is a strict **“Token-In, Token-Out”** rule: never re-encode sampled tokens; keep a single token buffer across turns. [@johnschulman2](https://x.com/johnschulman2/status/2060392679528337714) reinforced the broader point that **renderers are foundational** infrastructure between messages and tokens, with failure modes spanning train/test mismatch, caching inefficiency, and prompt injection risk.

**Harness design is becoming its own optimization discipline**: [@omarsar0](https://x.com/omarsar0/status/2060371848010019001) surfaced work on **Effective Feedback Compute (EFC)**, claiming raw token/tool counts explain agent success poorly while EFC reaches **R² up to 0.99**, implying harness quality matters more than gross activity. This lines up with productized tuning efforts like [@LangChain](https://x.com/LangChain/status/2060349231722852680), where **Deep Agents v0.6** makes **harness profiles** first-class to get strong performance from Qwen/Kimi/DeepSeek at **20x+ lower cost** than frontier APIs, and [@hwchase17](https://x.com/hwchase17/status/2060355016989585919) explicitly framing “different models need different prompts/tools.” [@vllm_project](https://x.com/vllm_project/status/2060208480292843720) shipped **native weight syncing APIs** and improved pause/resume for async RL, and later added [fastokens](https://x.com/vllm_project/status/2060414393666679229), a **Rust BPE tokenizer** to reduce CPU tokenization bottlenecks in long-context/agentic workloads.

**Debate is shifting from “single vs multi-agent” to where the abstraction pays**: [@OfirPress](https://x.com/OfirPress/status/2060352260723392658) argued current multi-agent systems are mostly **speedups, not capability unlocks**; [@scaling01](https://x.com/scaling01/status/2060363050272653625) took the opposite view, expecting swarm-style training to yield better planning and superintelligence-like behavior. Either way, the practical trend is clear: more teams are building around **agent observability, traces, and continual improvement loops**, e.g. [@Vtrivedy10](https://x.com/Vtrivedy10/status/2060406006329278970) on mining production traces for SFT/distillation and long-horizon continual learning.
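The re-tokenization bug and the “Token-In, Token-Out” rule described in this section can be illustrated with a toy example. This is a minimal sketch, not the Hugging Face implementation: the vocabulary, the greedy longest-match `encode`, and the sampled tokens are all made up for illustration. The only point is that decoding and re-encoding can yield a different token sequence than the one the model sampled.

```python
VOCAB = ["ab", "a", "b", "c"]  # hypothetical toy vocabulary


def encode(text: str) -> list[str]:
    """Greedy longest-match tokenizer over VOCAB (toy stand-in for BPE)."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def decode(tokens: list[str]) -> str:
    return "".join(tokens)


prompt_tokens = encode("c")
# The policy sampled "a" then "b": a valid but non-canonical
# tokenization of the string "ab".
sampled = ["a", "b"]

# Broken loop: decode to text, parse, then re-encode the conversation.
# The sampled pair collapses into the single token "ab".
retokenized = encode(decode(prompt_tokens + sampled))

# Token-In, Token-Out: keep one token buffer and only ever append.
buffer = list(prompt_tokens)
buffer.extend(sampled)           # sampled tokens are stored verbatim
tool_result_tokens = encode("c") # only NEW environment text gets encoded
buffer.extend(tool_result_tokens)

print(retokenized)  # differs from what the model actually sampled
print(buffer)
```

Both sequences decode to the same text, which is why the mismatch is silent: a loss computed on `retokenized` would assign credit to a token (`"ab"`) the policy never emitted, while the append-only `buffer` preserves the sampled trajectory exactly.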

**Open Models, Local AI, and the OSS Toolchain Tightening Up**

**Local-first and open-weight momentum continues to rise**: [@LangChain](https://x.com/LangChain/status/2060405874993115532) said **1 in 3 AI teams** ran an open-weights model in April 2026, up from **1 in 5** nine months earlier; [@EpochAIResearch](https://x.com/EpochAIResearch/status/2060451576779886942) estimated open-weight models now lag frontier proprietary models by about **four months**. On the toolchain side, [@ggerganov](https://x.com/ggerganov/status/2060394400237109567) launched **llama.app**, giving llama.cpp an official website, a unified installer, and a single `llama` entrypoint aimed at easier local deployment and third-party agent integration. [@ollama](https://x.com/ollama/status/2060428074102206496) announced **OpenJarvis** as a local-first personal AI via Ollama, explicitly tied to Stanford/Hazy’s “Intelligence Per Watt” framing.

**Open infrastructure is getting more enterprise-shaped**: [@ClementDelangue](https://x.com/ClementDelangue/status/2060378354931388837) noted that **~50% of models and datasets on Hugging Face are now private**, rising with HF’s storage/buckets offering; this is an important correction to the idea that HF is only public OSS infrastructure. [@abidlabs](https://x.com/abidlabs/status/2060404002341462044) showed **Hugging Face Jobs** replacing GitHub runners for CPU/serverless GPU CI. [@DSPyOSS](https://x.com/DSPyOSS/status/2060186371902587119), [@dbreunig](https://x.com/dbreunig/status/2060187833084870746), and others shipped a redesigned **DSPy docs/front page** ahead of a coming 4.0, focused on onboarding into programmable AI systems rather than pure prompting.

**Licensing and permissiveness are becoming strategic levers**: [@kimmonismus](https://x.com/kimmonismus/status/2060458698930016378) highlighted NVIDIA moving its four open model families to **Linux Foundation OpenMDW-1.1**, reducing legal fragmentation across weights/code/docs/data. New permissive data releases also matter: [@keshigeyan](https://x.com/keshigeyan/status/2060398262591668315) introduced **GPIC**, a **100M-pair permissive image corpus** plus **1M-pair benchmark** for visual generation, with explicit research + commercial usability.

**Google/OpenAI Product Surface Expands: Managed Agents, Gemini Spark/Omni, and Codex on Windows**

**Google is widening the “managed agent” stack from API to consumer product**: [@_philschmid](https://x.com/_philschmid/status/2060359976325992528) showed **Managed Agents in the Gemini API**: a single API call provisioning a sandboxed Linux environment with code execution, web access, and file I/O. On the consumer side, [@GeminiApp](https://x.com/GeminiApp/status/2060405496872579115) rolled out **Gemini Spark** to U.S. AI Ultra subscribers as a **24/7 personal agent** that can operate across a user’s digital ecosystem under direction. Google also kept pushing **Gemini Omni** multimodal generation/editing demos ([example](https://x.com/alexanderchen/status/2060322611586834518), [product thread](https://x.com/GeminiApp/status/2060473816393150965)) and announced **Google Flow Agent** for creative workflows in video/film production ([thread](https://x.com/Google/status/2060473826362732611)).

**OpenAI’s Codex is moving closer to a persistent remote dev operator**: [@OpenAI](https://x.com/OpenAI/status/2060428604727771421) and [@OpenAIDevs](https://x.com/OpenAIDevs/status/2060429591655927942) added **computer use on Windows**, including remote steering from the ChatGPT mobile app. Follow-on UX improvements included **stable identicons for background agents** and search across prior chat content ([@OpenAIDevs](https://x.com/OpenAIDevs/status/2060478367921831936)); [@reach_vb](https://x.com/reach_vb/status/2060430024537178215) summarized broader Codex updates around Windows control, mobile remote access, and profile/task stats. Separately, OpenAI updated **gpt-5.5 instant** to **reduce sycophancy and improve factuality and multilingual performance** per [@michpokrass](https://x.com/michpokrass/status/2060219759682330970).

**This all points to more vertically integrated agent stacks**: model + harness + sandbox + UI + remote control + pricing/quotas. Google is smoothing quotas on Gemini ([@joshwoodward](https://x.com/joshwoodward/status/2060171610922058142)); OpenAI is expanding Codex’s operating surface; Cursor added **auto-review mode** with subagent-based approval routing ([tweet](https://x.com/cursor_ai/status/2060406013098897765)). The common pattern is less “chatbot,” more **managed execution environment with policy and memory**.

**Research and Systems Papers Worth Attention**

**Search, retrieval, and memory**: [@TheTuringPost](https://x.com/TheTuringPost/status/2060194173505155358) highlighted **Bidirectional Evolutionary Search (BES)** from Harvard/MIT, combining forward search with backward decomposition and evolutionary operators; reported gains include **Llama-3.2-3B-Instruct on MuSiQue from 4.0% to 7.0%**. In retrieval, [@_reachsumit](https://x.com/_reachsumit/status/2060214762626306512) pointed to **Latent Terms**, showing sparse BM25-ready features can be extracted from frozen dense retrievers via SAEs. [@topk_io](https://x.com/topk_io/status/2060383255153569938) open-sourced **Iso-ModernColBERT** for more efficient late-interaction inference.

**Continual learning and belief/state management**: [@HuggingPapers](https://x.com/HuggingPapers/status/2060312560323182657) summarized **BeliefTrack**, claiming optimized belief-state management cuts long-horizon reasoning failures by **70%+**. [@AndrewLampinen](https://x.com/AndrewLampinen/status/2060460827199599026) argued the continual learning field over-focused on interference instead of positive transfer; [@victor207755822](https://x.com/victor207755822/status/2060315686329778432) presented a second **DeliAutoResearch SKILL** paper focused on self-iteration and CL.

**Multimodal/world models/robotics**: NVIDIA-affiliated work included **γ-World**, a generative multi-agent world model streaming at **24 FPS** ([tweet](https://x.com/fangfu0830/status/2060233093894869499)), and **minWM**, a real-time interactive video world model framework ([tweet](https://x.com/_akhaliq/status/2060392729473860026)). In robotics, [@_akhaliq](https://x.com/_akhaliq/status/2060388349425119540) shared **Qwen-VLA**, and [@inventorOli](https://x.com/inventorOli/status/2060357909561622885) demoed Robostral’s language-following and manipulation improvements. For always-on proactive agents, [@dair_ai](https://x.com/dair_ai/status/2060373102119555191) surfaced work replacing LLM wake-up decisions with a **220MiB temporal-graph encoder**, gaining **+16.7 mean F1** while running **4–83x faster**.

**Top tweets (by engagement)**

**OpenAI / biology**: [@OpenAI on Rosalind Biodefense](https://x.com/OpenAI/status/2060376598642405492) announced trusted-access biology tooling for public health and biodefense.

**Google / consumer agents**: [@GeminiApp on Spark](https://x.com/GeminiApp/status/2060405496872579115) rolled out its always-on personal agent to AI Ultra users in the U.S.

**OpenAI / dev tools**: [@OpenAI on Codex Windows support](https://x.com/OpenAI/status/2060428604727771421) and [@OpenAIDevs](https://x.com/OpenAIDevs/status/2060429591655927942) expanded computer use to Windows plus mobile remote steering.

**llama.cpp UX milestone**: [@ggerganov](https://x.com/ggerganov/status/2060394400237109567) launched **llama.app** with a unified installer and CLI entrypoint for local AI.

**HF / RL correctness**: [@ClementDelangue](https://x.com/ClementDelangue/status/2060175330665508917) amplified the **Token-In, Token-Out** warning for multi-turn RL with tools.

**Open vs closed timing gap**: [@EpochAIResearch](https://x.com/EpochAIResearch/status/2060451576779886942) estimated open-weight models are now about **4 months behind** the frontier.

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

**1. Local LLM Performance: MoE Releases, Quants, VRAM Savings**

**[StepFun 3.7 Flash](https://www.reddit.com/r/LocalLLaMA/comments/1tqloii/stepfun_37_flash/)** (Activity: 637): StepFun released [Step 3.7 Flash](https://static.stepfun.com/blog/step-3.7-flash/), a multimodal MoE with `196B` total parameters, `11B` active, and a built-in `1.8B` ViT, advertised for high-throughput agent workflows up to `400 TPS` and reportedly runnable locally with ~`128GB` RAM. Reported benchmarks position it unusually strongly for a flash-class/local model: SWE-Bench Pro `56.26%`, DeepSearchQA F1 `92.82%`, HLE w/tools `47.2`, plus large gains over Step 3.5 Flash on Terminal-Bench, Toolathlon, ClawEval, and other agentic/tool-use tasks. Direct model artifacts are available on Hugging Face in [BF16](https://huggingface.co/stepfun-ai/Step-3.7-Flash/), [FP8](https://huggingface.co/stepfun-ai/Step-3.7-Flash-FP8), [NVFP4](https://huggingface.co/stepfun-ai/Step-3.7-Flash-NVFP4), and [GGUF](https://huggingface.co/stepfun-ai/Step-3.7-Flash-GGUF), with a day-0 `llama.cpp` [support PR](https://github.com/ggml-org/llama.cpp/pull/23845) and related MTP work in `llama.cpp#23274`.

Commenters characterize the model as technically odd: its hidden/thinking traces are described as nearly incoherent, but final answers can be *“perfect”* and competitive with much larger `>1TB` models; one user says the prior Step 3.5 *“infinite thinking”* issue appears fixed. There is cautious enthusiasm around local deployment, especially for users with `4x3090`-class hardware, and appreciation that StepFun upstreamed `llama.cpp` support instead of only maintaining a fork.

StepFun released multiple Step-3.7-Flash checkpoints on Hugging Face: **BF16** ([Step-3.7-Flash](https://huggingface.co/stepfun-ai/Step-3.7-Flash/)), **FP8** ([Step-3.7-Flash-FP8](https://huggingface.co/stepfun-ai/Step-3.7-Flash-FP8)), **NVFP4** ([Step-3.7-Flash-NVFP4](https://huggingface.co/stepfun-ai/Step-3.7-Flash-NVFP4)), and **GGUF** ([Step-3.7-Flash-GGUF](https://huggingface.co/stepfun-ai/Step-3.7-Flash-GGUF)). One user reports the prior Step 3.5 Flash “infinite thinking” issue appears fixed, making 3.7 more usable despite still having an odd intermediate reasoning style.

There is day-0 `llama.cpp` enablement via StepFun’s upstream PR: [ggml-org/llama.cpp#23845](https://github.com/ggml-org/llama.cpp/pull/23845), contrasting with Step 3.5’s fork-based support. A separate community PR for **MTP support** exists at [ggml-org/llama.cpp#23274](https://github.com/ggml-org/llama.cpp/pull/23274), though commenters note it needs updating for Step 3.7 and current `master`.

A vLLM nightly test of the **NVFP4** checkpoint on `2x Pro 6k` with `64` concurrent shallow-context requests reached about `2200 tok/s`. The reported config used `tensor-parallel-size 2`, `--enable-expert-parallel`, `--quantization modelopt`, `--kv-cache-dtype fp8`, `--reasoning-parser step3p5`, and StepFun tool-call parsing; vLLM reported a **GPU KV cache size** of `1,667,645` tokens and **max concurrency** of `6.36x` for `262,144` tokens/request.
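As a quick sanity check on the figures above, the reported max concurrency is consistent with dividing the KV cache capacity by the per-request token budget, and the MoE's active share follows from the two parameter counts. This is a back-of-the-envelope sketch; the assumption that vLLM derives the concurrency figure exactly this way is ours, and the variable names are illustrative.

```python
# Figures quoted in the Reddit recap above.
kv_cache_tokens = 1_667_645      # reported GPU KV cache size, in tokens
tokens_per_request = 262_144     # reported max tokens per request

# Assumed relationship: how many full-length requests fit in the KV cache.
max_concurrency = kv_cache_tokens / tokens_per_request

total_params_b = 196             # total parameters, in billions
active_params_b = 11             # active parameters per token, in billions

# Fraction of the model's weights used per token.
active_fraction = active_params_b / total_params_b

print(f"max concurrency ~ {max_concurrency:.2f}x")
print(f"active parameter share ~ {active_fraction:.1%}")
```

The first number matches the `6.36x` reported in the thread. The second shows that only a small share of the `196B` weights is exercised per token, which is in line with the model being pitched for high-throughput use.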

