{"slug": "modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2", "title": "Modular: Modular 26.4: SOTA MoE Serving, Model Bringup via Agent Skills, Mojo Beta 2 and More", "summary": "Modular released version 26.4 of its platform, adding state-of-the-art mixture-of-experts (MoE) serving on Modular Cloud, support for new open-weight models like MiniMax M3 and GLM 5.2, and advancing toward Mojo 1.0 with Mojo Beta 2. The update also enhances quantization, speculative decoding, OpenAI API compatibility, and Apple silicon GPU support, while introducing agentic model bring-up via modular/skills.", "body_md": "Modular 26.4 brings state-of-the-art mixture-of-experts (MoE) serving to Modular Cloud, expands MAX support for the newest open-weight models, and takes another step toward Mojo 1.0.\n\nModular Cloud is expanding and now supports the latest frontier models such as [MiniMax M3](https://www.modular.com/blog/day-zero-minimax-m3-open-weights-on-modular-cloud), [GLM 5.2](https://www.modular.com/models/glm-5-2), and [Kimi 2.7](https://www.modular.com/models/kimi-k2-6). Modular Cloud is built on top of the 26.4 release, which adds support for new model architectures, enhances quantization and speculative decoding capabilities, improves OpenAI API compatibility, extends Apple silicon GPU support, and makes MAX more accessible via `modular/skills` for agentic model bring-up.\n\n💡 We’ll share much more about what’s coming for Mojo, MAX, and Modular Cloud at our [ModCon conference](https://www.modular.com/modcon): join us August 18th in San Francisco.\n\n**Serve SOTA MoEs on Modular Cloud**\n\nAll frontier models today are MoE-based. The MoE architecture means that while the model is large (in the hundreds of billions or trillions of parameters), only a small fraction of those parameters are active at any time. This sparsity extends further to the KVCache, where only a sparse subset of blocks is activated. 
The large size and sparsity make these models more difficult to serve, since serving them well requires cross-stack optimizations from the cloud down to the kernels. In Modular Cloud, we've carefully tuned models such as Gemma 4, DeepSeek, GLM, MiniMax, and Kimi for peak performance.\n\nNew MoE models available through Modular Cloud include:\n\nModular Cloud gives you access to [500+ model architectures](https://www.modular.com/models) for use cases ranging from agentic coding and multi-turn chat to vision and video generation. [Request access](https://console.modular.com/signup?utm_source=26.4releaseBlog) to Modular Cloud today.\n\n## MAX: New models, more capabilities, faster bring-up\n\nIn the prior [26.3](https://www.modular.com/blog/modular-26-3-mojo-1-0-beta-max-video-gen-and-more?utm_source=26.4releaseBlog) release, we introduced distributed-aware tensors and the initial pieces of Modular-native agentic tooling. In MAX 26.4, we continued investing in the capabilities of the MAX framework and in the development experience.\n\nMAX underpins Modular Cloud, and with MAX 26.4 we've added model coverage and serving machinery. This includes:\n\n- New model architectures such as `GlmMoeDsaForCausalLM`, `LFM2ForCausalLM`, and `HYV3ForCausalLM` are now supported in MAX. `KimiK25ForConditionalGeneration` now supports both Kimi 2.6 and 2.7, along with different speculative decoders such as Eagle3 and DFlash.\n- **Improved OpenAI API compatibility:** MAX 26.4 adds support for the `developer` role, aligns reasoning output with the Responses API, improves structured-output handling, and adds compatibility flags so real-world requests are less likely to fail on minor request differences.\n- **Wider quantization coverage:** Models such as Gemma 4 and FLUX.2-Klein can now run using either FP8 or FP4 weights. 
\n- **Apple silicon GPU:** MAX now supports many common model architectures, including Qwen 3.6 and Gemma 4, on M3 and newer Apple silicon GPUs. We’re continuing to improve Apple silicon support across Mojo and MAX, so check out the nightlies for the latest capabilities and best performance.\n- **Cleaner MAX APIs and migration notes:** 26.4 moves [MAX modules](https://docs.modular.com/max/api/python/) into clearer homes and removes legacy types. Most changes include deprecation shims, so existing code should keep working while you migrate.\n\nSee [the MAX changelog](https://docs.modular.com/max/changelog/#v264-2026-06-18) for the full list.\n\n### Agentic model bring-up with `modular/skills`\n\nDevelopers often ask us how they can bring their own models into MAX to enjoy these features. In 26.4, we’ve released the [import-model](https://github.com/modular/skills/tree/main/import-model) and [debug-model](https://github.com/modular/skills/blob/main/debug-model/SKILL.md) skills, which let you import your models into MAX with the help of AI coding agents. The skills can be installed via:\n\nThese skills guide an AI coding agent through a repeatable model bring-up workflow:\n\n- **Decide and plan** by inspecting the Hugging Face config and modeling code.\n- **Implement the model** by scaffolding from the closest existing MAX architecture.\n- **Verify its outputs** by running a layer-by-layer logit-divergence hunt against the reference implementation until outputs match.\n\nThe result is a fast and practical path from a Hugging Face model ID to a working MAX architecture that’s ready to deploy. To demonstrate this, we've brought up Tencent’s [Hunyuan Hy3-preview](https://huggingface.co/tencent/Hy3-preview) model in MAX using these agent skills. 
The model uses 192 routed experts with sigmoid plus correction-bias routing, and runs in MAX with multi-GPU tensor-parallel attention and expert-parallel MoE.\n\nRead more about model bring-up with our agentic skills in the new [guide](/p/2ff1044d37bb80759de3f913b30984d7).\n\n## Mojo 1.0 beta 2: stabilizing on the path to release\n\nAs another step in our [path to Mojo 1.0](https://www.modular.com/blog/the-path-to-mojo-1-0), this release includes Mojo 1.0 beta 2. This update focuses on refinement and stabilization. You’ll soon start to see markers in the nightlies for the Mojo standard library stating which interfaces are stable, and we’ll be expanding that surface as we draw closer to the 1.0 release.\n\nThere are also several language improvements since beta 1:\n\n- Many common collection types like `List[T]` no longer require their contents to be `Copyable`. This makes collections usable as generic containers across a broader set of element types.\n- We've removed the redundant function argument from `enqueue_function`. This makes accelerator programming more succinct by only requiring the kernel to be specified once. 
\n- We are continuing to invest in Python-to-Mojo interop, including reducing the overhead of Python calling into Mojo for many common use cases.\n- Mojo’s reflection is now more ergonomic and more tightly integrated with the standard library.\n\n## Get started with 26.4\n\nModular 26.4 is available now, with new model support, new agent skills, SOTA MoE in MAX, Mojo 1.0 Beta 2, and more.\n\nInstall or upgrade to get started in minutes:\n\nWe only touched on the highlights in this release. For a deeper look at all the changes, please check out our changelog:\n\nIf you’re building with Modular, join us on:\n\nShare your feedback on the Mojo 1.0 beta:\n\nWe’re excited to hear about what you build with 26.4 and with the Mojo beta. [Join us at ModCon on August 18th](https://www.modular.com/modcon) for much more.", "url": "https://wpnews.pro/news/modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2", "canonical_source": "https://www.modular.com/blog/modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2-and-more", "published_at": "2026-06-18 00:00:00+00:00", "updated_at": "2026-06-18 16:35:43.840955+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-infrastructure", "ai-products"], "entities": ["Modular", "Modular Cloud", "MAX", "Mojo", "MiniMax M3", "GLM 5.2", "Kimi 2.7", "ModCon"], "alternates": {"html": "https://wpnews.pro/news/modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2", "markdown": "https://wpnews.pro/news/modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2.md", "text": "https://wpnews.pro/news/modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2.txt", "jsonld": "https://wpnews.pro/news/modular-modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2.jsonld"}}