[AINews] How to land a job at a frontier lab (on Pretraining)

Vlad Feinberg published a guide on how to land a job at a frontier AI lab focused on pretraining, emphasizing kernel-level performance work as the most direct path into the labs. The guide recommends deriving Chinchilla scaling laws for dense versus MoE architectures, coding solutions from scratch in JAX, and writing a Pallas kernel that beats ragged_dot for fused up/down projections. The post also highlights agent work and offers a workshop speaking opportunity for those who can teach these skills to the community.

A quiet day before Google I/O lets us amplify a notable blog post.

It is the day before Google I/O, when the next major Gemini releases are expected to be previewed, and it will probably be a quiet week from competitors. Still, Anthropic (https://news.ycombinator.com/item?id=48182281) and OpenAI (https://news.ycombinator.com/item?id=48182754) both had minor wins today, and Cursor shipped their first SpaceXAI model (https://news.ycombinator.com/item?id=48182516), with some nice detail on synthetic data/reward hacking and continued pretraining with Muon (https://news.smol.ai/issues/25-07-11-kimi-k2).

However, today’s most likely lasting story is Vlad Feinberg’s notes on job preparation, specifically for Pretraining, which are understandably Google/TPU-centric (https://vladfeinberg.com/2026/05/10/how-to-land-a-job-at-a-frontier-lab.html). He references last year’s Scaling handbook from DeepMind (https://jax-ml.github.io/scaling-book/), and kernel work is an important part:

“The biggest bottleneck and innermost loop of all LLM work is performance work that makes abstract, logical changes to the LLM practical to run. Every project needs people who can tune the LLMs at the kernel level. It is a skill you can pick up and is the most direct path into the labs.”

There’s a surprise mention of DSLs for kernel development, of which there is a concise history here: https://x.com/yaroslavvb/status/2053669022684877076. For someone working at this level of the stack, he also, surprisingly, calls out agent work like autoresearch (https://www.latent.space/p/ainews-ai-engineer-worlds-fair-autoresearch) and AlphaEvolve.

He ends with a surprisingly simple exercise, but the real hiring test is in the bottom paragraphs:

“Derive Chinchilla laws for this; see how they differ for dense vs MoE architectures. Code your solution from scratch in jax by hand if you actually want the learning experience.”
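For the first exercise, a minimal starting point (a sketch under stated assumptions, not Feinberg’s solution) is the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β from Hoffmann et al. (2022), minimized under the common approximation C ≈ 6·N·D training FLOPs. Substituting D = C/(6N) and setting dL/dN = 0 gives N_opt = G·(C/6)^(β/(α+β)) with G = (αA/(βB))^(1/(α+β)), and D_opt = (C/6)/N_opt. The coefficients in the usage block are placeholders, not a published fit; refit them on your own runs. The hypothetical `moe_ffn_params` helper hints at where dense and MoE diverge: training FLOPs track the parameters active per token, while quality plausibly also depends on the total parameter count, so a MoE fit likely needs to model both.

```python
def chinchilla_optimal(compute_flops, A, B, alpha, beta):
    """Compute-optimal (N, D) for L(N, D) = E + A / N**alpha + B / D**beta
    under the constraint compute_flops = 6 * N * D. E shifts the loss but
    does not move the optimum, so it is not an argument here."""
    c6 = compute_flops / 6.0
    # dL/dN = 0 after substituting D = c6 / N gives
    # N**(alpha + beta) = (alpha * A) / (beta * B) * c6**beta
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = G * c6 ** (beta / (alpha + beta))
    d_opt = c6 / n_opt
    return n_opt, d_opt


def parametric_loss(N, D, E, A, B, alpha, beta):
    """Chinchilla-style parametric loss in parameters N and tokens D."""
    return E + A / N**alpha + B / D**beta


def moe_ffn_params(d_model, d_expert, n_experts, top_k):
    """Hypothetical helper: (total, active-per-token) parameters of one MoE
    FFN layer with plain up/down projections (no gate, GLU, or bias)."""
    # W_up is [d_model, d_expert] and W_down is [d_expert, d_model] per expert.
    per_expert = 2 * d_model * d_expert
    return n_experts * per_expert, top_k * per_expert


if __name__ == "__main__":
    E, A, B, alpha, beta = 1.7, 400.0, 400.0, 0.34, 0.28  # placeholder coefficients
    for C in (1e20, 1e21, 1e22):
        n, d = chinchilla_optimal(C, A, B, alpha, beta)
        print(f"C={C:.0e}  N*={n:.3e}  D*={d:.3e}  tokens/param={d / n:.1f}")
```

A natural next step for the dense vs MoE comparison is to fit separate (A, B, α, β) on dense and MoE sweeps, with N counted as active parameters in the FLOP constraint, and compare the resulting exponents.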
He continues: “Next, assuming you used jax.lax.ragged_dot for the MoE layer, write a pallas kernel that beats ragged_dot for F D by fusing the up/down projections. Find a setting where you notice a measurable forward pass speedup and explain why it’s there.”

If you can teach this to the rest of the community, we’d love to feature you as a workshop speaker: https://ai.engineer/cfp

AI News for 5/16/2026-5/18/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space.

AI Twitter Recap

Coding Agents, Agent Ops, and the Move from Chat to Automation

Agent infrastructure is converging on observability + automation loops: Several posts point to a maturing stack for production agents. LangSmith Engine is framed as the missing CI/CD loop for agents: it automatically detects failures from production traces, clusters issues, and drafts fixes/evals. LangChain also highlighted SmithDB, a purpose-built data layer for agent observability/eval workloads, designed for low-latency querying over large traces and for self-hosting/multi-cloud requirements (@krishdpi https://x.com/krishdpi/status/2056102370434798034; @LangChain https://x.com/LangChain/status/2056414104445747371). In parallel, Cognition launched Devin Auto-Triage, positioning it as an always-on “first responder” for bugs, alerts, and incidents, with long-term memory, a manager/subagent structure, and PR generation; early users like Modal describe it as more useful than typical homegrown triage automations (@cognition https://x.com/cognition/status/2056396941181727210; @walden_yan https://x.com/walden_yan/status/2056409599000068193; @russelljkaplan https://x.com/russelljkaplan/status/2056457452661719277). The common pattern is less “chat with an agent” and more persistent automation tied to traces, memory, and evals.
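Returning to the second exercise in Feinberg’s post: jax.lax.ragged_dot computes a grouped matmul in which tokens are sorted by expert and each contiguous group of rows is multiplied by its own expert’s weight matrix. Called twice (up, then down), it materializes the full [tokens, d_ff] intermediate between the calls. The NumPy sketch below is a reference for the math only, not a Pallas kernel and not JAX’s implementation; the ReLU activation, the tile size, the d_model/d_ff naming, and all function names are illustrative assumptions. It shows the identity a fused kernel relies on: because the activation is elementwise, the down projection can be accumulated one d_ff tile at a time, so only a [rows, f_tile] slice of the intermediate ever has to exist, which in a real kernel could stay in on-chip memory instead of round-tripping through HBM.

```python
import numpy as np


def grouped_matmul(x, w, group_sizes):
    """Reference semantics of a ragged (grouped) matmul like jax.lax.ragged_dot.
    x: [T, K] rows sorted by expert; w: [G, K, N]; group_sizes: [G], sum == T.
    Rows start:start+group_sizes[g] are multiplied by w[g]."""
    assert int(group_sizes.sum()) == x.shape[0]
    out = np.empty((x.shape[0], w.shape[2]), dtype=x.dtype)
    start = 0
    for g, size in enumerate(group_sizes):
        out[start:start + size] = x[start:start + size] @ w[g]
        start += size
    return out


def relu(h):
    return np.maximum(h, 0.0)


def moe_ffn_unfused(x, w_up, w_down, group_sizes):
    """Two grouped matmuls; the [T, d_ff] intermediate is fully materialized."""
    h = relu(grouped_matmul(x, w_up, group_sizes))
    return grouped_matmul(h, w_down, group_sizes)


def moe_ffn_fused(x, w_up, w_down, group_sizes, f_tile=128):
    """Same math, tiled over d_ff: each up-projection tile is activated and
    immediately folded into the down-projection accumulator, so only a
    [rows, f_tile] slice of the intermediate exists at any time.
    w_up: [G, d_model, d_ff], w_down: [G, d_ff, d_model]."""
    d_ff = w_up.shape[2]
    out = np.zeros((x.shape[0], w_down.shape[2]), dtype=x.dtype)
    start = 0
    for g, size in enumerate(group_sizes):
        rows = x[start:start + size]
        for f0 in range(0, d_ff, f_tile):
            h_tile = relu(rows @ w_up[g, :, f0:f0 + f_tile])
            out[start:start + size] += h_tile @ w_down[g, f0:f0 + f_tile, :]
        start += size
    return out


def intermediate_hbm_bytes(num_tokens, d_ff, bytes_per_elem=2):
    """Simplified lower bound on HBM traffic the unfused path spends on the
    intermediate: one write after the up projection plus one read before the
    down projection (ignores any extra pass for the activation)."""
    return 2 * num_tokens * d_ff * bytes_per_elem


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_sizes = np.array([5, 0, 7, 3])      # tokens routed to each of 4 experts
    x = rng.standard_normal((15, 16))         # [T, d_model]
    w_up = rng.standard_normal((4, 16, 300))  # [G, d_model, d_ff]
    w_down = rng.standard_normal((4, 300, 16))
    ref = moe_ffn_unfused(x, w_up, w_down, group_sizes)
    fused = moe_ffn_fused(x, w_up, w_down, group_sizes)
    print("max abs diff:", np.abs(ref - fused).max())
```

Counting bytes is a reasonable first step toward explaining a measured speedup: under this simplified model, the unfused path writes and re-reads at least 2·T·d_ff·bytes of intermediate activations that the fused loop can avoid. Whether that shows up as a forward-pass speedup depends on the shapes and tokens per expert, which is exactly what the exercise asks you to find.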
Operational patterns for coding agents are getting more concrete: Anthropic published best practices for running Claude Code across multi-million-line monorepos, legacy systems, and microservices, while adding prompt cache diagnostics and making Fast mode default to Opus 4.7 for lower-latency coding workflows (@ClaudeDevs https://x.com/ClaudeDevs/status/2056403446056784288; @ClaudeDevs https://x.com/ClaudeDevs/status/2056434422229123106; @ClaudeDevs https://x.com/ClaudeDevs/status/2056454359685476491). OpenAI expanded Codex workflows with a Zoom plugin, mobile/desktop remote execution, and “keep your Mac awake” support so longer-running jobs continue from the phone app (@coreyching https://x.com/coreyching/status/2056422748763914274; @OpenAIDevs https://x.com/OpenAIDevs/status/2056442456800141424). Microsoft pushed remote control for GitHub Copilot CLI and VS Code to GA (@code https://x.com/code/status/2056460035278962738). Across these, the product direction is clear: background execution, remote supervision, and agent fan-out, not just interactive completions.

Practitioners are converging on the same mental model (constrain, verify, decompose): François Chollet’s framing of coding agents as “blind squirrels” that need carefully placed verifiable constraints succinctly captures a broader shift toward harness-centric engineering (@fchollet https://x.com/fchollet/status/2056401102485266620). Related advice includes using asserts heavily in Python/ML code to fail fast (@gabriberton https://x.com/gabriberton/status/2056381648707735875), building both end-to-end and incremental evals for long-running agents (@palashshah https://x.com/palashshah/status/2056449711767265420), and structuring multi-agent systems in staged maturity levels rather than prematurely maximizing agent count (@shannholmberg https://x.com/shannholmberg/status/2056410242330874349).
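The fail-fast advice translates directly into ML code: assert shapes, invariants, and finiteness at function boundaries so that a silent broadcasting or NaN bug surfaces immediately, at the line that caused it. A minimal sketch assuming NumPy; the attention function is a hypothetical example, not code from the thread.

```python
import numpy as np


def scaled_dot_attention(q, k, v):
    """Single-head attention with fail-fast checks.
    q: [Tq, d], k: [Tk, d], v: [Tk, dv]; returns [Tq, dv]."""
    assert q.ndim == k.ndim == v.ndim == 2, "expected 2-D inputs"
    assert q.shape[1] == k.shape[1], f"q/k feature mismatch: {q.shape} vs {k.shape}"
    assert k.shape[0] == v.shape[0], f"k/v length mismatch: {k.shape} vs {v.shape}"
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    assert np.allclose(weights.sum(axis=1), 1.0), "softmax rows must sum to 1"
    out = weights @ v
    assert out.shape == (q.shape[0], v.shape[1]), f"unexpected output shape {out.shape}"
    assert np.isfinite(out).all(), "NaN/Inf in attention output"
    return out


if __name__ == "__main__":
    q = np.ones((3, 4))
    k = np.ones((5, 4))
    v = np.arange(10.0).reshape(5, 2)
    print(scaled_dot_attention(q, k, v))
    # Passing a key matrix with the wrong feature width fails at the boundary:
    # scaled_dot_attention(q, np.ones((5, 3)), v)  -> AssertionError
```

One design caveat: Python strips `assert` statements when run with `python -O`, so checks that must guard correctness in production, rather than during development, should raise explicit exceptions instead.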
The practical consensus: agent quality depends more on verification surfaces, decomposition, and feedback loops than on prompt cleverness alone.

Model Releases, Ranking Shifts, and Frontier Coding Models

Cursor’s Composer 2.5 is the standout model launch in this batch: Cursor announced Composer 2.5 as its strongest model yet, emphasizing better sustained work on long-running tasks and more reliable instruction following, and then disclosed a deeper strategic move, training a much larger model from scratch with “SpaceXAI” using 10× more total compute and access to Colossus 2’s million H100-equivalents (@cursor_ai https://x.com/cursor_ai/status/2056415413077233983; @cursor_ai https://x.com/cursor_ai/status/2056415419536461836). Community reactions centered on its efficiency/cost-performance profile and strong coding quality, with users calling it a major step up from Composer 2 and noting better collaboration behavior in messages/updates, not just raw benchmark gains (@mntruell https://x.com/mntruell/status/2056418797473640681; @jonas_nelle https://x.com/jonas_nelle/status/2056422317740466192; @kimmonismus https://x.com/kimmonismus/status/2056494027189751842).

Alibaba’s Qwen line continues to climb: Qwen3.7 Preview landed on Arena, with Qwen3.7 Max Preview at #13 overall in text, including #7 in Math, #9 in Expert, #9 in Software & IT, and #10 in Coding; Qwen3.7 Plus Preview reached #16 overall in vision, making Alibaba the #6 lab in text and #5 in vision by Arena’s counts (@arena https://x.com/arena/status/2056400044862111757; @Alibaba_Qwen https://x.com/Alibaba_Qwen/status/2056403591464984753). That reinforces the broader trend of Chinese labs steadily improving across both general and specialist arenas, rather than only on headline chat benchmarks.
Open model and multimodal releases continue below the mega-frontier: ByteDance open-sourced Lance, described as a unified multimodal model for image/video understanding, generation, and editing, with 3B video + 3B image + 3B decoder components (@bdsqlsz https://x.com/bdsqlsz/status/2056353648779907115). Perplexity released a small open multilingual ColBERT model as a continued-training variant of pplx-embed-0.6b, with notes on using the MaxSim kernel (@bo_wangbo https://x.com/bo_wangbo/status/2056421369387094301). These are not frontier-scale launches, but they are technically meaningful because they target retrieval quality and native multimodal unification, two areas where open tooling still matters.

Inference, Deployment, and Local/Enterprise Serving

Local inference got a notable speed boost via MTP in llama.cpp: Georgi Gerganov announced MTP (multi-token prediction) support for the Qwen3.6 family in llama.cpp, calling it a significant milestone for local AI (@ggerganov https://x.com/ggerganov/status/2056391115469689330). Follow-on reports showed meaningful throughput gains, including a jump for the dense Qwen3.6-27B from 25 tok/s to 45 tok/s (+78%) on an A10G using draft-MTP flags (@victormustar https://x.com/victormustar/status/2056456757786869793). This matters because it narrows the usability gap between local and hosted coding/general assistants on commodity hardware.

Enterprise/on-prem deployment momentum remains strong: Hugging Face and Dell promoted one-click access to models including Kimi K2.6, DeepSeek V4 Pro/Flash, GLM 5.1, and MiniMax M2.7 through Dell Enterprise Hub, optimized for PowerEdge XE9780 with NVIDIA B300 (@jeffboudier https://x.com/jeffboudier/status/2056436625522266265). Clement Delangue argued that on-prem/local AI based on open-source models will be an important answer to GPU shortages, with advantages in cost, latency, and safety/data control (@ClementDelangue https://x.com/ClementDelangue/status/2056439359784530252).
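The llama.cpp result uses MTP predictions as a draft that the main model verifies, the same draft-and-verify structure as speculative decoding. A back-of-the-envelope sketch, assuming the simplified model from Leviathan et al. (2023) in which each drafted token is accepted independently with probability α: with γ drafted tokens per step, each verification pass yields (1 − α^(γ+1))/(1 − α) tokens in expectation, and the walltime speedup divides that by (γ·c + 1), where c is the cost of a draft step relative to a full forward pass. The α, γ, and c values are hypothetical inputs, not numbers from the posts, and real MTP implementations can deviate from the i.i.d. assumption.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens produced per target-model verification pass when each
    of `gamma` drafted tokens is accepted i.i.d. with probability `alpha`
    (the count includes the token the target model itself contributes)."""
    assert 0.0 <= alpha < 1.0 and gamma >= 0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha, gamma, draft_cost_ratio):
    """Walltime speedup over plain decoding; draft_cost_ratio is the cost of
    one draft step divided by the cost of one target forward pass."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost_ratio + 1.0)


if __name__ == "__main__":
    # Hypothetical acceptance rates and a hypothetical 5% relative draft cost.
    for alpha in (0.6, 0.7, 0.8):
        row = ", ".join(
            f"gamma={g}: {expected_speedup(alpha, g, 0.05):.2f}x" for g in (1, 2, 3, 4)
        )
        print(f"alpha={alpha}: {row}")
```

Comparing the reported 25 → 45 tok/s ratio against a table like this is a quick plausibility check on what acceptance rate a claimed speedup implies.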
Cross-hardware inference optimization is becoming more sophisticated: Zyphra published end-to-end inference benchmarks on AMD Instinct MI355X, claiming strong outperformance over AMD’s baseline and a narrowed gap to NVIDIA B200 when serving Kimi K2.6, GLM 5.1, and DeepSeek V3.2 (@ZyphraAI https://x.com/ZyphraAI/status/2056404622483562623). Complementing that, Quentin Anthony posted a useful thread on why benchmarking needs to distinguish hardware ceilings from current software state, arguing that many cross-stack comparisons conflate vendor maximums, achievable GEMM performance, and software maturity (@QuentinAnthon15 https://x.com/QuentinAnthon15/status/2056450379932647533). For infra engineers, that’s a strong reminder to treat benchmark charts as stack-dependent snapshots, not absolute truths.

Research: MoEs, RL/Data Mixing, Architecture Search, and Agent Evaluation

Several papers this week focused on better training signals rather than bigger models: A summary of LeCun/Timor et al.’s “On Training in Imagination” highlighted that, in model-based RL, smoother world/reward models with low Lipschitz constants tighten error bounds; that reward models often scale faster than dynamics models; and that many noisy reward labels can beat fewer high-quality ones, while biased rewards are especially dangerous (@TheTuringPost https://x.com/TheTuringPost/status/2056182805412098431). A separate thread on Pedagogical RL argued that even correct reasoning traces can be poor training data if they are too surprising relative to the student policy; the method uses a privileged teacher plus spike-aware rewards and surprisal-gated imitation to generate trajectories the student can actually learn from (@blc_16 https://x.com/blc_16/status/2056411251186815104; @NoahZiems https://x.com/NoahZiems/status/2056454054092419568).
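To make Quentin Anthony’s distinction between hardware ceilings and software state concrete, here is a roofline-style sketch: a kernel’s attainable throughput is bounded by min(peak FLOP/s, memory bandwidth × arithmetic intensity), so reporting a measurement both as a fraction of vendor peak and as a fraction of that bound separates “the hardware can’t go faster here” from “the software isn’t there yet.” The ideal-traffic GEMM model and all numeric inputs are illustrative assumptions, not specs for MI355X or B200; substitute datasheet values and your own measurements.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n], assuming each operand
    crosses HBM exactly once (an idealized lower bound on traffic)."""
    flops = 2.0 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_bound(peak_flops, mem_bw_bytes, intensity):
    """Classic roofline: attainable FLOP/s is capped by compute or by bandwidth."""
    return min(peak_flops, mem_bw_bytes * intensity)


def efficiency_report(measured_flops, peak_flops, mem_bw_bytes, intensity):
    """Separate 'how far from the vendor max' from 'how far from what this
    shape can reach on this memory system'."""
    bound = roofline_bound(peak_flops, mem_bw_bytes, intensity)
    return {
        "roofline_bound": bound,
        "frac_of_vendor_peak": measured_flops / peak_flops,
        "frac_of_roofline": measured_flops / bound,
    }


if __name__ == "__main__":
    # Hypothetical accelerator: 1 PFLOP/s dense peak, 5 TB/s HBM bandwidth.
    PEAK, BW = 1.0e15, 5.0e12
    for m, n, k in ((1, 8192, 8192), (8192, 8192, 8192)):
        ai = gemm_arithmetic_intensity(m, n, k)
        print(f"{m}x{n}x{k}: intensity={ai:.1f} FLOP/B, "
              f"bound={roofline_bound(PEAK, BW, ai):.3e} FLOP/s")
```

For example, a decode-style GEMM with m = 1 and n = k = 8192 in bf16 has an intensity of roughly 1 FLOP/byte under this model, so it is bandwidth-bound and its vendor-peak fraction will look poor even when the software is close to the roofline.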
Architecture and scaling studies remain highly actionable: Meta’s AIRA work on agentic neural architecture discovery drew attention because it beats Llama 3.2 at 350M, 1B, and 3B scales within a 24-hour compute budget by splitting search into a planning agent (AIRA-Compose) and an implementation agent (AIRA-Design) (@omarsar0 https://x.com/omarsar0/status/2056434731508703607; @dair_ai https://x.com/dair_ai/status/2056435283910865265). Separately, “Slicing and Dicing MoEs” reports training 2,000+ MoE LMs and concludes that much of the design space reduces to expert size and expert count, cutting through the noisier discourse around MoE configuration knobs (@margs_li https://x.com/margs_li/status/2056355079188627862).

Data selection and eval methodology are emerging as first-class research problems: On-Policy Mix targets the unsolved problem of finding the right data mix as data distributions keep shifting, with applicability across pretraining, midtraining, and instruction tuning (@michahu8 https://x.com/michahu8/status/2056393112621043964). On evals, Cameron Wolfe published a guide to agent evaluation, and a longer Zhihu summary argued that the agent era requires measuring delegation intelligence (when to search, code, reason, or call tools) rather than only static knowledge or internal chain-of-thought prowess (@cwolferesearch https://x.com/cwolferesearch/status/2056399847553409301; @ZhihuFrontier https://x.com/ZhihuFrontier/status/2056408194801635391). That aligns closely with current product practice: the hard part is increasingly tool choice and verification policy, not text-only reasoning.

Ecosystem Moves: SDKs, Revenue Capture, and Open Tooling

Anthropic acquired Stainless: Anthropic announced the acquisition of Stainless, the SDK and MCP server platform that has powered Anthropic SDKs since the early API days (@AnthropicAI https://x.com/AnthropicAI/status/2056419620643541012).
Strategically, this points to continued vertical integration around developer ergonomics, SDK generation, and protocol surfaces, not just model quality.

Revenue concentration around foundation model providers appears to be increasing: One post claimed that Anthropic and OpenAI’s share of the AI model/application revenues generated by 34 top AI startups is rising, a signal that the ecosystem may be consolidating economically even as model choices proliferate (@amir https://x.com/amir/status/2056041152500142259).

Tooling and deployment curation remains in demand: The Turing Post’s roundup of 13 open-source tools for foundation model deployment (including vLLM, TGI, SGLang, llama.cpp, Ollama, BentoML, Kubeflow, MLflow, and others) was one of the more practically useful curation posts in the set (@TheTuringPost https://x.com/TheTuringPost/status/2056102301811781848). Meanwhile, Papers With Code is being revived with AI-agent-assisted parsing of methods, leaderboards, and SOTA tracking, underscoring a renewed focus on research discoverability (@NielsRogge https://x.com/NielsRogge/status/2056366395605078252).

Top Tweets by engagement

Cursor’s Composer 2.5 + bigger training push: The highest-signal, high-engagement product news was Composer 2.5 and Cursor’s disclosure that it is training a much larger model from scratch with 10× more compute (@cursor_ai https://x.com/cursor_ai/status/2056415413077233983; @cursor_ai https://x.com/cursor_ai/status/2056415419536461836).

OpenAI/Anthropic product updates with developer impact: Sam Altman said ChatGPT improved significantly with the latest update (@sama https://x.com/sama/status/2056435834333934051), while Anthropic shipped Fast mode defaulting to Opus 4.7 and prompt cache diagnostics in Claude Console (@ClaudeDevs https://x.com/ClaudeDevs/status/2056454359685476491; @ClaudeDevs https://x.com/ClaudeDevs/status/2056434422229123106).
Enduring research/engineering framing: Richard Sutton’s 26-word condensation of the Bitter Lesson (focus on methods for creating knowledge that scale with compute, like search and learning) was among the most engaged research-adjacent posts and resonated with many of the week’s themes around agent harnesses, search, and verifier-driven systems (@RichardSSutton https://x.com/RichardSSutton/status/2056419165502935198).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. LLM Safety Benchmarks and Abliteration Forensics