{"slug": "ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining", "title": "[AINews] How to land a job at a frontier lab (on Pretraining)", "summary": "Vlad Feinberg published a guide on how to land a job at a frontier AI lab focused on pretraining, emphasizing kernel-level performance work as the most direct path into the labs. The guide recommends deriving Chinchilla scaling laws for dense versus MoE architectures, coding solutions from scratch in JAX, and writing a Pallas kernel that beats ragged_dot for fused up/down projections. The post also highlights agent work and offers a workshop speaking opportunity for those who can teach these skills to the community.", "body_md": "# [AINews] How to land a job at a frontier lab (on Pretraining)\n\n### a quiet day before google i/o lets us amplify a notable blogpost\n\nIt is the day before Google I/O, when the next major Gemini releases are expected to be previewed, and it will probably be a quiet week from competitors, though [Anthropic](https://news.ycombinator.com/item?id=48182281) and [OpenAI](https://news.ycombinator.com/item?id=48182754) both had minor wins today, and Cursor shipped their [first SpaceXAI model](https://news.ycombinator.com/item?id=48182516) with some nice detail on synthetic data/reward hacking and continued pretraining with [Muon](https://news.smol.ai/issues/25-07-11-kimi-k2). However, the most likely lasting title story from today is Vlad Feinberg’s (understandably Google/TPU-centric) [notes on job preparation, specifically on Pretraining](https://vladfeinberg.com/2026/05/10/how-to-land-a-job-at-a-frontier-lab.html):\n\nSpecifically, he references last year’s [Scaling handbook from DeepMind](https://jax-ml.github.io/scaling-book/), and kernel work is an important part:\n\nThe biggest bottleneck and innermost loop of all LLM work is performance work that makes abstract, logical changes to the LLM practical to run. Every project needs people who can tune the LLMs at the kernel level. 
It is a skill you can pick up and is the most direct path into the labs.\n\nThere’s a surprise mention of DSLs for kernel dev, of which there is a [concise history](https://x.com/yaroslavvb/status/2053669022684877076):\n\nFor someone at this level of the stack, surprisingly he also calls out Agent Work like [autoresearch](https://www.latent.space/p/ainews-ai-engineer-worlds-fair-autoresearch) and AlphaEvolve. He ends with a surprisingly simple exercise:\n\nBut the real hiring test is in the bottom paragraphs:\n\n*Derive Chinchilla laws for this; see how they **differ for dense vs MoE** architectures. Code your solution from scratch in jax by hand if you actually want the learning experience.*\n\n*Next, assuming you used jax.lax.ragged_dot for the MoE layer; **write a pallas kernel** that beats ragged dot for F > D by fusing the up/down projections. Find a setting where you notice a measurable forward pass speedup and explain why it’s there.*\n\nIf you can teach this to the rest of the community, we’d [love to feature you as a workshop speaker.](https://ai.engineer/cfp)\n\nAI News for 5/16/2026-5/18/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. 
You can opt in/out of email frequencies!\n\n**AI Twitter Recap**\n\n**Coding Agents, Agent Ops, and the Move from Chat to Automation**\n\n**Agent infrastructure is converging on observability + automation loops**: Several posts point to a maturing stack for production agents. **LangSmith Engine** is framed as the missing CI/CD loop for agents, automatically detecting failures from production traces, clustering issues, and drafting fixes/evals, with LangChain also highlighting **SmithDB** as a purpose-built data layer for agent observability/eval workloads with low-latency querying over large traces and self-hosting/multi-cloud requirements [@krishdpi](https://x.com/krishdpi/status/2056102370434798034), [@LangChain](https://x.com/LangChain/status/2056414104445747371). In parallel, **Cognition** launched **Devin Auto-Triage**, positioning it as an always-on “first responder” for bugs, alerts, and incidents with long-term memory, manager/subagent structure, and PR generation; early users like Modal describe it as more useful than typical homegrown triage automations [@cognition](https://x.com/cognition/status/2056396941181727210), [@walden_yan](https://x.com/walden_yan/status/2056409599000068193), [@russelljkaplan](https://x.com/russelljkaplan/status/2056457452661719277). The common pattern is less “chat with an agent” and more **persistent automation tied to traces, memory, and evals**.\n\n**Operational patterns for coding agents are getting more concrete**: Anthropic published best practices for running **Claude Code** across multi-million-line monorepos, legacy systems, and microservices, while adding **prompt cache diagnostics** and making **Fast mode default to Opus 4.7** for lower-latency coding workflows [@ClaudeDevs](https://x.com/ClaudeDevs/status/2056403446056784288), [@ClaudeDevs](https://x.com/ClaudeDevs/status/2056434422229123106), [@ClaudeDevs](https://x.com/ClaudeDevs/status/2056454359685476491). 
OpenAI expanded **Codex** workflows with a **Zoom plugin**, mobile/desktop remote execution, and “keep your Mac awake” support so longer-running jobs continue from the phone app [@coreyching](https://x.com/coreyching/status/2056422748763914274), [@OpenAIDevs](https://x.com/OpenAIDevs/status/2056442456800141424). Microsoft pushed **remote control** for GitHub Copilot CLI and VS Code to GA [@code](https://x.com/code/status/2056460035278962738). Across these, the product direction is clear: **background execution, remote supervision, and agent fan-out**, not just interactive completions.\n\n**Practitioners are converging on the same mental model: constrain, verify, decompose**: François Chollet’s framing of coding agents as “blind squirrels” that need carefully placed **verifiable constraints** succinctly matches a broader shift toward harness-centric engineering [@fchollet](https://x.com/fchollet/status/2056401102485266620). Related advice includes using **asserts** heavily in Python/ML code to fail fast [@gabriberton](https://x.com/gabriberton/status/2056381648707735875), building both **end-to-end and incremental evals** for long-running agents [@palashshah](https://x.com/palashshah/status/2056449711767265420), and structuring multi-agent systems in staged maturity levels rather than maximizing agent count prematurely [@shannholmberg](https://x.com/shannholmberg/status/2056410242330874349). 
The practical consensus: agent quality depends more on **verification surfaces, decomposition, and feedback loops** than on prompt cleverness alone.\n\n**Model Releases, Ranking Shifts, and Frontier Coding Models**\n\n**Cursor’s Composer 2.5 is the standout model launch in this batch**: Cursor announced **Composer 2.5** as its strongest model yet, emphasizing better sustained work on long-running tasks and more reliable instruction following, then disclosed a deeper strategic move: training a much larger model from scratch with **“SpaceXAI,”** using **10× more total compute** and access to **Colossus 2’s million H100-equivalents** [@cursor_ai](https://x.com/cursor_ai/status/2056415413077233983), [@cursor_ai](https://x.com/cursor_ai/status/2056415419536461836). Community reactions centered on its **efficiency/cost-performance profile** and strong coding quality, with users calling it a major step up from Composer 2 and noting better collaboration behavior in messages/updates, not just raw benchmark gains [@mntruell](https://x.com/mntruell/status/2056418797473640681), [@jonas_nelle](https://x.com/jonas_nelle/status/2056422317740466192), [@kimmonismus](https://x.com/kimmonismus/status/2056494027189751842).\n\n**Alibaba’s Qwen line continues to climb**: **Qwen3.7 Preview** landed on Arena with **Qwen3.7 Max Preview** at **#13 overall** in text, including **#7 Math**, **#9 Expert**, **#9 Software & IT**, and **#10 Coding**; **Qwen3.7 Plus Preview** reached **#16 overall** in vision, making Alibaba the **#6 lab in text** and **#5 in vision** by Arena’s counts [@arena](https://x.com/arena/status/2056400044862111757), [@Alibaba_Qwen](https://x.com/Alibaba_Qwen/status/2056403591464984753). 
That reinforces the broader trend of Chinese labs steadily improving across both general and specialist arenas rather than only headline chat benchmarks.\n\n**Open model and multimodal releases continue below the mega-frontier**: ByteDance open-sourced **Lance**, described as a **unified multimodal model** for image/video understanding, generation, and editing, with **3B video + 3B image + 3B decoder** components [@bdsqlsz](https://x.com/bdsqlsz/status/2056353648779907115). Perplexity released a small open **multilingual ColBERT** model as a continued-training variant of **pplx-embed-0.6b**, with notes on using the **MaxSim kernel** [@bo_wangbo](https://x.com/bo_wangbo/status/2056421369387094301). These are not frontier-scale launches, but they are technically meaningful because they target **retrieval quality** and **native multimodal unification**, two areas where open tooling still matters.\n\n**Inference, Deployment, and Local/Enterprise Serving**\n\n**Local inference got a notable speed boost via MTP in llama.cpp**: Georgi Gerganov announced **MTP support for the Qwen3.6 family** in **llama.cpp**, calling it a significant milestone for local AI [@ggerganov](https://x.com/ggerganov/status/2056391115469689330). Follow-on reports showed meaningful throughput gains, including a **Qwen3.6-27B dense** jump from **25 tok/s to 45 tok/s (+78%)** on an A10G using draft-MTP flags [@victormustar](https://x.com/victormustar/status/2056456757786869793). This matters because it narrows the usability gap between local and hosted coding/general assistants on commodity hardware.\n\n**Enterprise/on-prem deployment momentum remains strong**: Hugging Face and Dell promoted one-click access to models including **Kimi K2.6**, **DeepSeek V4 Pro/Flash**, **GLM 5.1**, and **MiniMax M2.7** through **Dell Enterprise Hub** optimized for **PowerEdge XE9780 with NVIDIA B300** [@jeffboudier](https://x.com/jeffboudier/status/2056436625522266265). 
Clement Delangue argued that **on-prem/local AI based on open-source models** will be an important answer to **GPU shortages**, with advantages in **cost, latency, and safety/data control** [@ClementDelangue](https://x.com/ClementDelangue/status/2056439359784530252).\n\n**Cross-hardware inference optimization is becoming more sophisticated**: Zyphra published end-to-end inference benchmarks on **AMD Instinct MI355X**, claiming strong outperformance over AMD’s baseline and a narrowed gap to **NVIDIA B200** when serving **Kimi K2.6, GLM 5.1, and DeepSeek V3.2** [@ZyphraAI](https://x.com/ZyphraAI/status/2056404622483562623). Complementing that, Quentin Anthony posted a useful thread on why benchmarking needs to distinguish **hardware ceilings vs current software state**, arguing that many cross-stack comparisons conflate vendor maxes, achievable GEMM performance, and software maturity [@QuentinAnthon15](https://x.com/QuentinAnthon15/status/2056450379932647533). For infra engineers, that’s a strong reminder to treat benchmark charts as **stack-dependent snapshots**, not absolute truths.\n\n**Research: MoEs, RL/Data Mixing, Architecture Search, and Agent Evaluation**\n\n**Several papers this week focused on better training signals rather than bigger models**: A summary of LeCun/Timor et al.’s **“On Training in Imagination”** highlighted that in model-based RL, smoother world/reward models with **low Lipschitz constants** tighten error bounds; reward models often scale faster than dynamics models; and **many noisy reward labels can beat fewer high-quality ones**, while biased rewards are especially dangerous [@TheTuringPost](https://x.com/TheTuringPost/status/2056182805412098431). 
A separate thread on **Pedagogical RL** argued that even correct reasoning traces can be poor training data if they are too surprising relative to the student policy; the method uses a privileged teacher plus **spike-aware rewards** and **surprisal-gated imitation** to generate trajectories the student can actually learn from [@blc_16](https://x.com/blc_16/status/2056411251186815104), [@NoahZiems](https://x.com/NoahZiems/status/2056454054092419568).\n\n**Architecture and scaling studies remain highly actionable**: Meta’s **AIRA** work on **agentic neural architecture discovery** drew attention because it beats **Llama 3.2** at **350M, 1B, and 3B** scales within a **24-hour compute budget** by splitting search into a planning agent (**AIRA-Compose**) and an implementation agent (**AIRA-Design**) [@omarsar0](https://x.com/omarsar0/status/2056434731508703607), [@dair_ai](https://x.com/dair_ai/status/2056435283910865265). Separately, **“Slicing and Dicing MoEs”** reports training **2,000+ MoE LMs** and concludes that much of the design space reduces to **expert size and expert count** rather than the noisier discourse around MoE configuration knobs [@margs_li](https://x.com/margs_li/status/2056355079188627862).\n\n**Data selection/eval methodology are emerging as first-class research problems**: **On-Policy Mix** targets the unsolved problem of finding the right data mix as data distributions keep shifting, with applicability across pretraining, midtraining, and instruction tuning [@michahu8](https://x.com/michahu8/status/2056393112621043964). On evals, Cameron Wolfe published a guide to **agent evaluation**, and a longer Zhihu summary argued that the agent era requires measuring **delegation intelligence** (when to search, code, reason, or call tools) rather than only static knowledge or internal chain-of-thought prowess [@cwolferesearch](https://x.com/cwolferesearch/status/2056399847553409301), [@ZhihuFrontier](https://x.com/ZhihuFrontier/status/2056408194801635391). 
That aligns closely with current product practice: the hard part is increasingly **tool choice and verification policy**, not text-only reasoning.\n\n**Ecosystem Moves: SDKs, Revenue Capture, and Open Tooling**\n\n**Anthropic acquired Stainless**: Anthropic announced the acquisition of **Stainless**, the SDK and MCP server platform that has powered Anthropic SDKs since early API days [@AnthropicAI](https://x.com/AnthropicAI/status/2056419620643541012). Strategically, this points to continued vertical integration around **developer ergonomics, SDK generation, and protocol surfaces**, not just model quality.\n\n**Revenue concentration around foundation model providers appears to be increasing**: One post claimed that **Anthropic and OpenAI’s share of AI model/application revenues generated by 34 top AI startups is rising**, a signal that the ecosystem may be consolidating economically even as model choices proliferate [@amir](https://x.com/amir/status/2056041152500142259).\n\n**Tooling and deployment curation remains in demand**: The Turing Post’s roundup of **13 open-source tools for foundation model deployment**, including **vLLM, TGI, SGLang, llama.cpp, Ollama, BentoML, Kubeflow, MLflow** and others, was one of the more practically useful curation posts in the set [@TheTuringPost](https://x.com/TheTuringPost/status/2056102301811781848). 
Meanwhile, **Papers With Code** is being revived with AI-agent-assisted parsing of methods, leaderboards, and SOTA tracking, underscoring renewed focus on **research discoverability** [@NielsRogge](https://x.com/NielsRogge/status/2056366395605078252).\n\n**Top Tweets (by engagement)**\n\n**Cursor’s Composer 2.5 + bigger training push**: The highest-signal high-engagement product news was **Composer 2.5** and Cursor’s disclosure that it is training a much larger model from scratch with **10× more compute** [@cursor_ai](https://x.com/cursor_ai/status/2056415413077233983), [@cursor_ai](https://x.com/cursor_ai/status/2056415419536461836).\n\n**OpenAI/Anthropic product updates with developer impact**: Sam Altman said **ChatGPT improved significantly with the latest update** [@sama](https://x.com/sama/status/2056435834333934051), while Anthropic shipped **Fast mode defaulting to Opus 4.7** and **prompt cache diagnostics** in Claude Console [@ClaudeDevs](https://x.com/ClaudeDevs/status/2056454359685476491), [@ClaudeDevs](https://x.com/ClaudeDevs/status/2056434422229123106).\n\n**Enduring research/engineering framing**: Richard Sutton’s 26-word condensation of the **Bitter Lesson** (focus on methods for creating knowledge that scale with compute, like search and learning) was among the most engaged research-adjacent posts and resonated with many of the week’s themes around agent harnesses, search, and verifier-driven systems [@RichardSSutton](https://x.com/RichardSSutton/status/2056419165502935198).\n\n**AI Reddit Recap**\n\n**/r/LocalLlama + /r/localLLM Recap**\n\n**1. 
LLM Safety Benchmarks and Abliteration Forensics**", "url": "https://wpnews.pro/news/ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining", "canonical_source": "https://www.latent.space/p/ainews-how-to-land-a-job-at-a-frontier", "published_at": "2026-05-19 07:31:40+00:00", "updated_at": "2026-05-25 00:19:14.007077+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-research", "ai-infrastructure"], "entities": ["Google", "Anthropic", "OpenAI", "Cursor", "Vlad Feinberg", "DeepMind", "Gemini", "Muon"], "alternates": {"html": "https://wpnews.pro/news/ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining", "markdown": "https://wpnews.pro/news/ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining.md", "text": "https://wpnews.pro/news/ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining.txt", "jsonld": "https://wpnews.pro/news/ainews-how-to-land-a-job-at-a-frontier-lab-on-pretraining.jsonld"}}