Three Trends from MLSys 2026
At MLSys 2026, Modular identified three key trends in AI inference, highlighted by keynotes on agentic kernel development and the need for "zero trust" verification to prevent benchmark cheating. Lido…
Modular has built a new data layer for LLM inference routing that solves the problem of querying cached blocks across hundreds of pods in microseconds. The company's architecture uses a specialized da…
Hippocratic AI partnered with Modular to integrate the MAX framework into its inference pipelines, achieving sub-500ms mean time to first token and approximately 30% faster P99 end-to-end latency for …
Modular released AI agent skills for its Mojo language that enable coding assistants to translate existing GPU kernels from CUDA and Triton into Mojo code, addressing the challenge that large language…
Modular argues that traditional HTTP-era load-balancing algorithms such as round-robin, consistent hashing, and least-connections are inadequate for large language model inference because GPU pods are…