{"slug": "modular-translating-to-mojo-via-ai-agents", "title": "Modular: Translating to Mojo via AI Agents", "summary": "Modular released AI agent skills for its Mojo language that enable coding assistants to translate existing GPU kernels from CUDA and Triton into Mojo code, addressing the challenge that large language models lack training data on the young programming language. The skills, which can be installed with a single command, provide a lightweight adapter that corrects misconceptions and non-idiomatic patterns, allowing developers to leverage Mojo's performance advantages across NVIDIA, AMD, and Apple silicon GPUs while maintaining a familiar syntax.", "body_md": "May 13, 2026\n\nBrad Larson\n\nModular Team\n\nProduct\n\nAt Modular, we’re always experimenting with the latest agentic programming tools, integrating the best ones into our workflows, and learning quite a few lessons along the way. One thing we realized is that the Mojo language is ideally suited to the needs of modern AI coding agents.\n\nMojo has a familiar syntax with minimal boilerplate, so it’s token-efficient for agents to read and write. Its type system and constraint model catch many common errors at compile time. Rather than having an agent chew tons of tokens to build something that may or may not work, then spend hours debugging it when it doesn’t, Mojo catches problems early and provides clear error messages to the agents. This tighter feedback loop is one reason typed languages are increasingly favored for agentic workflows.\n\nMojo also doesn't trade ergonomics for performance. The same code that reads cleanly can target the full range of hardware Mojo supports, including NVIDIA, AMD, and Apple silicon GPUs.\n\nThe only challenge is that Mojo is still a young language and LLMs haven’t been trained on lots of Mojo code. That’s why we’ve created tools to help agents produce correct, idiomatic Mojo from the start. This starts with open-sourcing of all of the Mojo code we’ve written at Modular, along with all of our Python APIs for MAX.\n\nTaking this to the next level, Mojo now has coding agent skills that plug into most AI coding assistants. 
These skills act as a lightweight adapter that corrects misconceptions, out-of-date patterns, and non-idiomatic code that AI models might produce.\n\nTo get started with the skills, you can install them in any supported coding agent with a single command:\n\n```\nnpx skills add modular/skills\n```\n\nMojo’s killer application today is the ability to write high-performance and portable GPU kernels. However, most of the world’s kernel code today is written in CUDA and Triton (primarily for NVIDIA GPUs). Using our AI agent skills, you can quickly translate these existing kernels to Mojo.\n\nLet’s see an example of this in action.\n\nIn this example, we’ll start from Szymon Ożóg’s exploration of Softmax optimizations in CUDA. Clone this repository locally as a reference to follow along at home:\n\n```\ngit clone git@github.com:SzymonOzog/FastSoftmax.git\n```\n\nIn this tutorial, I’ll be using Claude Code with the Opus 4.7 model, but the same steps work with any frontier coding agent.\n\nFirst, let’s create a new Mojo project. Our new-modular-project skill handles project setup and walks you through choosing the right configuration.\n\nStart a new session in your AI agent and begin by giving it this prompt:\n\n```\nI'd like to start a new Mojo project named mojo_softmax using a quick uv\nenvironment and the nightly build of Mojo.\n```\n\nYour agent should activate the new-modular-project skill and handle the project setup for you:\n\n```\n⏺ Skill(new-modular-project)                                                    \n  ⎿  Successfully loaded skill                                                \n                                                                                \n⏺ All options specified. 
Creating a quick uv environment with nightly Mojo.\n                                                                                \n  Listed 1 directory (ctrl+o to expand)                                         \n                                                                               \n⏺ Bash(mkdir mojo_softmax && cd mojo_softmax && uv venv && uv pip install mojo  \n      --index https://whl.modular.com/nightly/simple/ --prerelease allow)     \n  ⎿  Running…\n```\n\nand then report back when the new project is ready:\n\n```\n⏺ Bash(source .venv/bin/activate && mojo --version)                             \n  ⎿  Mojo 0.26.3.0.dev2026042105 (cfe16c5b)                                     \n                                                                                \n⏺ Project mojo_softmax is set up at ./mojo_softmax with a quick uv venv and nightly Mojo  \n  0.26.3.0.dev2026042105 installed.                                             \n                                                                                \n  To work in it:                                                                \n  cd mojo_softmax                                                               \n  source .venv/bin/activate\n```\n\nOnce the project is configured with the Mojo compiler and tools, we can begin translating the CUDA reference code to Mojo. Use a prompt like the following, adjusting the path to match where you cloned FastSoftmax:\n\n```\nIn the directory ./FastSoftmax, there's a CUDA example of a fast softmax\nI'd like to port to Mojo. When done, build it and test it to verify\nthat it works correctly.\n```\n\nYour output may vary by agent, but in a local run of this I saw:\n\n```\n⏺ I've read all the CUDA kernels. 
The file contains 10 progressive optimizations, with\n `softmax_kernel8` as the default - it uses online normalization (single-pass max+sum), float4\n vectorized loads, warp shuffles, and shared-memory cross-warp reduction.\n \n I'll port this to Mojo using shared-memory tree reductions (universal across GPU backends\n including Apple silicon) with coalesced strided access, matching the algorithm of kernels 3-8.\n```\n\nThe agent will then activate the mojo-syntax and mojo-gpu-fundamentals skills to write a fast_softmax.mojo file:\n\n```\n⏺ Now I'll load the Mojo skills to guide the port.                              \n                                                                                \n⏺ Skill(mojo-gpu-fundamentals)                                                  \n  ⎿  Successfully loaded skill                                                  \n                                                                                \n⏺ Skill(mojo-syntax)                                                            \n  ⎿  Successfully loaded skill                            \n\n⏺ Now I have enough understanding to write the Mojo port. I'll implement the    \n  softmax kernel, plus a test harness that verifies against a CPU       \n  reference.                                                                    \n                                                                                \n⏺ Write(fast_softmax.mojo)\n```\n\nWhen complete, the agent will build and test the Mojo implementation to verify its correct operation:\n\n```\n⏺ Bash(source .venv/bin/activate && mojo build fast_softmax.mojo 2>&1 | head -80)\n  ⎿  (No output)                                          \n                                  \n⏺ Build succeeded. Running the test now.                      
\n                                                          \n⏺ Bash(source .venv/bin/activate && ./fast_softmax 2>&1 | head -40)                  \n  ⎿  row 0 sum = 0.9999997 drift from 1.0 = 2.9802322e-07 \n     row 1 sum = 0.99999994 drift from 1.0 = 5.9604645e-08                      \n     row 2 sum = 1.0000001 drift from 1.0 = 1.1920929e-07                       \n     … +8 lines (ctrl+o to expand)                        \n                                                                                \n⏺ The port is complete and verified working on the local GPU.\n```\n\nAlthough your exact project layout might differ from mine, your mojo_softmax project should now include a correct Mojo kernel and tests that exercise it (fast_softmax.mojo in my case).\n\nOne key thing to note is that the original CUDA code for this example only worked on NVIDIA GPUs, but this new Mojo version runs on NVIDIA, AMD, and Apple silicon GPUs, as well as other future accelerators supported by Mojo. This is an incredibly fast way to bootstrap kernels, algorithms, and even models for new hardware.\n\nThis newly translated Mojo version should be functionally identical to the source, so you can read through the code to understand how concepts map from CUDA to Mojo. As a next step, you can apply further Mojo-specific optimizations to this code, or tune it for the exact hardware you are working with. This also isn’t limited to using CUDA as a reference: the same process will work with Triton kernels in Python or code in various other languages.\n\nAs you’ve seen, you can rapidly translate existing reference code you may have in Python, CUDA, or many other sources to Mojo. Out of the box, this can even lead to concrete improvements. For example, when Automatika Robotics translated some CUDA and SYCL kernels used for autonomous navigation to Mojo, they saw immediate performance gains. 
In their own words:\n\n“Same workload we use in EMOS kompass-core: 5,001 trajectories × 1,000 points, 10 s horizon, 4 cost functions enabled.\n\nI should note that I have used Claude to translate my mojo kernels, using the official skills and no optimization work on the mojo side has been done yet. Hence the initial result is quite impressive.”\n\nMojo 1.0 beta 1 has just been released, and using a frontier AI coding agent with these skills is a great way to get your older Mojo projects up-to-date for the official 1.0 release later this year. We know that LLMs benefit from languages that don’t change much over time, which is one reason we’re stabilizing Mojo for 1.0.\n\nIn fact, we took a random sample of five community projects, installed these skills, and prompted Claude Opus 4.7:\n\nI’d like to update this project to the latest version of Mojo.\n\nIn all five cases, the agent correctly updated the entire project to build on the latest Mojo 1.0 beta 1 release with no other assistance.\n\nThese Mojo coding skills are available now. Here are three ways to put them to use:\n\nSpeed up your Python. If you have a Python function that's become a bottleneck, an agent with the Mojo skills can translate it to Mojo. Point it at the slow code, and it will produce an initial Mojo port you can drop in, profile, and tune. If you want to go further, the same code can target a GPU with minimal changes.\n\nReplace CUDA or Triton with Mojo. As the softmax demo shows, the skills handle the structural translation from CUDA to Mojo. The same process works for Triton kernels, as well as other kernel domain-specific languages. You get a portable starting point that runs on NVIDIA, AMD, and Apple silicon GPUs, without rewriting from scratch.\n\nGet involved. The skills themselves are open source. If you hit a pattern the current skills don't handle well, open an issue or contribute a fix. 
The more real-world Mojo code agents encounter in the wild, the better they get at writing it.\n\nInstall the skills with npx skills add modular/skills and let us know what you build in the Modular forum.", "url": "https://wpnews.pro/news/modular-translating-to-mojo-via-ai-agents", "canonical_source": "https://www.modular.com/blog/translating-to-mojo-via-ai-agents", "published_at": "2026-05-13 00:00:00+00:00", "updated_at": "2026-05-29 23:57:51.714178+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "ai-products", "generative-ai"], "entities": ["Modular", "Hippocratic AI", "Mojo", "MAX Framework", "DeepSeek V4 Pro", "FLUX.2 Klein 9B", "Kimi K2.6", "MiniMax M2.7"], "alternates": {"html": "https://wpnews.pro/news/modular-translating-to-mojo-via-ai-agents", "markdown": "https://wpnews.pro/news/modular-translating-to-mojo-via-ai-agents.md", "text": "https://wpnews.pro/news/modular-translating-to-mojo-via-ai-agents.txt", "jsonld": "https://wpnews.pro/news/modular-translating-to-mojo-via-ai-agents.jsonld"}}