# Modular: Modular 26.4: SOTA MoE Serving, Model Bringup via Agent Skills, Mojo Beta 2 and More

> Source: <https://www.modular.com/blog/modular-26-4-sota-moe-serving-model-bringup-via-agent-skills-mojo-beta-2-and-more>
> Published: 2026-06-18 00:00:00+00:00

Modular 26.4 brings state-of-the-art mixture-of-experts (MoE) serving to Modular Cloud, expands MAX support for the newest open-weight models, and takes another step toward Mojo 1.0.

Modular Cloud is expanding and now supports the latest frontier models such as [MiniMax M3](https://www.modular.com/blog/day-zero-minimax-m3-open-weights-on-modular-cloud), [GLM 5.2](https://www.modular.com/models/glm-5-2), and [Kimi 2.7](https://www.modular.com/models/kimi-k2-6). Modular Cloud is built on top of the 26.4 release which adds support for new model architectures, enhances quantization and speculative decoding capabilities, improves OpenAI API compatibility, extends Apple silicon GPU support, and makes MAX more accessible via `modular/skills`

for agentic model bring-up.

💡We’ll share much more about what’s coming for Mojo, MAX and Modular Cloud at our

[ModCon conference](https://www.modular.com/modcon): join us August 18th in San Francisco.

**Serve SOTA MoEs on Modular Cloud**

All frontier models today are MoE based. The MoE architecture means that while the model is large (in the hundreds of billions or trillions of parameters), only a few of those parameters are active at any time. This sparse activation is further extended by relying on sparse activations of KVCache blocks. The large size and sparsity makes these models more difficult to serve, since it requires cross stack optimizations from the cloud to the kernels. In Modular Cloud we've carefully tuned modules such as Gemma 4, Deepseek, GLM, MiniMax, and Kimi to ensure peak performance of these models.

New MoE models available through Modular Cloud include:

Modular Cloud gives you access to [500+ model architectures](https://www.modular.com/models) for different use cases from agentic coding, multi-turn chat, to vision and video generation. [Request access](https://console.modular.com/signup?utm_source=26.4releaseBlog) to Modular Cloud today.

## MAX: New models, more capabilities, faster bring-up

In the prior [26.3](https://www.modular.com/blog/modular-26-3-mojo-1-0-beta-max-video-gen-and-more?utm_source=26.4releaseBlog) release, we introduced distributed-aware tensors and initial pieces of Modular native agentic tooling. In MAX 26.4, we continued our investments to expand the capabilities of the MAX framework and improve development experience.

MAX underpins the capabilities of Modular Cloud and with MAX 26.4, we've added additional model coverage and serving machinery. This includes:

- New model architectures such as
`GlmMoeDsaForCausalLM`

, `LFM2ForCausalLM `

and `HYV3ForCausalLM `

are now supported in MAX. `KimiK25ForConditionalGeneration `

extends to support both Kimi 2.6 and 2.7 as well as support for different speculative decoders such as Eagle3 and DFlash.**Improve OpenAI API compatibility:** MAX 26.4 adds support for the `developer`

role, aligns reasoning output with the Responses API, improves structured-output handling, and adds compatibility flags so real-world requests is less likely to fail on minor request differences.**Wider quantization coverage:** Models such as Gemma4 and FLUX.2-Klein can now run using eitherboth FP8 or FP4 weights. **Apple silicon GPU:** MAX now supports many common model architectures, including Qwen 3.6 and Gemma 4, on M3 and newer Apple silicon GPUs. We’re continuing to improve Apple silicon support across Mojo and MAX, so check out the nightlies for the latest capabilities and best performance.**Cleaner MAX APIs and migration notes:** 26.4 moves [MAX modules](https://docs.modular.com/max/api/python/) into clearer homes and removes older legacy types. Most changes include deprecation shims, so existing code should keep working while you migrate.

See [the MAX changelog](https://docs.modular.com/max/changelog/#v264-2026-06-18) for the full list.

### Agentic model bring up with `modular/skills`

Developers often ask us how they can bring their own models into MAX to enjoy these features. In 26.4, we’ve released the [import-model](https://github.com/modular/skills/tree/main/import-model) and [debug-model](https://github.com/modular/skills/blob/main/debug-model/SKILL.md) skills, which enable importing your models into MAX with agents. The skill can be installed via:

These skills guide an AI coding agent through a repeatable model bring-up workflow:

**Decide and plan** by inspecting the Hugging Face config and modeling code.**Implement the model** by scaffolding from the closest existing MAX architecture.**Verify its outputs** by running a layer-by-layer logit-divergence hunt against the reference implementation until outputs match.

The result is a fast and practical path from a Hugging Face model ID to a working MAX architecture that’s ready to deploy. To demonstrate this, we've brought up Tencent’s [Hunyuan Hy3-preview](https://huggingface.co/tencent/Hy3-preview) model into MAX using these agent skills. The model uses 192 routed experts with sigmoid plus correction-bias routing, and runs in MAX with multi-GPU tensor-parallel attention and expert-parallel MoE.

Read more about model bringup with our agentic skills in the new [guide.](/p/2ff1044d37bb80759de3f913b30984d7)

## Mojo 1.0 beta 2: stabilizing on path to release

As another step in our [path to Mojo 1.0,](https://www.modular.com/blog/the-path-to-mojo-1-0) this release includes Mojo 1.0 beta 2. This update focuses on refinement and stabilization You’ll soon start to see markers in the nightlies for the Mojo standard library stating which interfaces are stable, and we’ll be expanding that surface as we draw closer to the 1.0 release.

There are also several language improvements since beta 1:

- Many common collection types like
`List[T]`

no longer require their contents to be `Copyable`

. This makes collections more generic containers across a broader set of element types.. - We've removed the redundant function argument for the
`enqueue_function`

. This makes accelerator programming more succinct by only requiring the kernel to be specified once. - We are continuing to invest in the Python -> Mojo interop and part of that is reducing the overhead when Python is calling into Mojo for many common use cases.
- Mojo’s reflections are now more ergonomic and more tightly integrated with the standard library.

## Get started with 26.4

Modular 26.4 is available now, with new model support, new agent skills, SOTA MoE in MAX, Mojo 1.0 Beta 2, and more.

Install or upgrade to get started in minutes:

We only touched on the highlights in this release, for a deeper look at all the changes please check out our changelog:

If you’re building with Modular, join us on:

Share your feedback on the Mojo 1.0 beta:

We’re excited to hear about what you build with 26.4, and with the Mojo beta - and [join us at ModCon on August 18th](https://www.modular.com/modcon) for much more.