A tour of MLIR: The Dialect Stack Everyone Depends On

MLIR, a compiler infrastructure framework, has become the foundation for numerous machine learning compilers including XLA, Triton, Mojo, Torch-MLIR, IREE, and ONNX-MLIR. It provides a reusable IR construction kit with dialects, enabling progressive lowering of tensor operations to machine code. The framework's design allows different domain-specific representations to share common infrastructure like SSA, pass management, and verification.

If you train or serve models, you depend on MLIR whether or not you have ever written a line of it. XLA lowers through it, Triton is built on it, Mojo is MLIR-native, and Torch-MLIR, IREE, and ONNX-MLIR exist to funnel their respective frontends into it. The reason a single piece of infrastructure ended up underneath so many otherwise-competing stacks is worth understanding, because it explains a lot about how modern ML compilers are actually built, and where their seams are.

This post is a tour of MLIR: what it is, the dialect idea that makes it different, how a tensor operation is progressively lowered to machine code, and what the infrastructure does and does not give you.

What MLIR Actually Is

The common misconception is that MLIR is “another IR like LLVM IR.” It is better described as an IR construction kit. LLVM IR is a single, fixed, low-level representation: roughly a typed assembly with SSA values. That is the right abstraction for the last mile to machine code, and the wrong one for a matrix multiply over tensors. Historically, every domain compiler that needed a higher-level representation invented its own from scratch: XLA had HLO, Halide its own IR, TensorFlow its graph, each shipping with a separate pass manager, serialization format, verifier, and pile of bugs.[1]
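The claim that LLVM IR is the wrong abstraction for a matrix multiply can be made concrete with a toy model. The Python below is a sketch, not MLIR or LLVM code: `Matmul`, `Loop`, `lower`, and `count_flops` are hypothetical names invented for illustration, and the lowered form is a crude stand-in for a scalar, loop-level IR rather than real LLVM IR (which would also have branches and index arithmetic). The high-level op states what it computes and carries its shapes, so "how many FLOPs is this?" is a one-line lookup; after lowering, the same answer has to be reconstructed by walking the loop nest.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Toy model only: Matmul, Loop, lower, and count_flops are hypothetical names,
# not MLIR or LLVM APIs.


@dataclass(frozen=True)
class Matmul:
    """High-level form: one op that says what it computes and carries its shapes."""
    m: int
    n: int
    k: int

    def flops(self) -> int:
        # The structure is explicit: one multiply and one add per (i, j, p) triple.
        return 2 * self.m * self.n * self.k


# A scalar instruction is a plain tuple whose first element is the opcode.
Inst = Tuple[str, ...]


@dataclass
class Loop:
    """Low-level form: a counted loop whose body holds scalar instructions or nested loops."""
    var: str
    trip_count: int
    body: List[Union["Loop", Inst]] = field(default_factory=list)


def lower(op: Matmul) -> Loop:
    """Lower the matmul to an i/j/p loop nest of loads, fmul, fadd, and a store."""
    reduction = Loop("p", op.k, [
        ("load", "a", "A[i][p]"),
        ("load", "b", "B[p][j]"),
        ("fmul", "t", "a", "b"),
        ("fadd", "acc", "acc", "t"),
    ])
    column = Loop("j", op.n, [
        ("const", "acc", "0.0"),
        reduction,
        ("store", "C[i][j]", "acc"),
    ])
    return Loop("i", op.m, [column])


def count_flops(node: Union[Loop, Inst], trips: int = 1) -> int:
    """Recover the FLOP count from the lowered form by walking the whole nest."""
    if isinstance(node, Loop):
        # Everything inside this loop executes trip_count times more often.
        return sum(count_flops(child, trips * node.trip_count) for child in node.body)
    return trips if node[0] in ("fmul", "fadd") else 0


op = Matmul(m=128, n=256, k=64)
print(op.flops(), count_flops(lower(op)))  # 4194304 4194304
```

Counting FLOPs is the easy case. Recognizing that an arbitrary scalar loop nest is a matmul at all, so that it could be handed to a tuned kernel, is much harder in general, and that lost structure is what a higher-level representation preserves.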
MLIR’s premise is that those representations have far more in common than not, and that the common parts can be built once and shared: SSA, a CFG of regions and blocks, a pass infrastructure, a pattern rewriter, location tracking, and verification. What differs between domains is then expressed as a dialect.

The unit of everything in MLIR is the Operation. An op has operands and results (SSA values), a set of typed attributes (compile-time constants like shapes or strides), and zero or more regions, which themselves contain blocks of further ops. That last property is what makes the IR genuinely multi-level: a single op can carry a whole nested computation, so a high-level linalg.generic and a low-level llvm.add are the same kind of object at different altitudes. Every op belongs to a dialect, which is simply a namespace for a related family of ops, types, and attributes.

Stated as a grammar, the relationship is small and recursive. A dialect supplies vocabulary (the op names, types, and attributes), while the shape of an operation is universal. The form below is simplified from MLIR’s textual grammar to show the essential structure; the authoritative productions are in the Language Reference[2]:

```
; A dialect is a namespace that contributes a family of ops, types, and attributes.
dialect     ::= (operation-def | type-def | attribute-def)*

; The grammar of an operation is identical across every dialect.
operation   ::= (result ("," result)* "=")? op-name "(" (operand ("," operand)*)? ")"
                attr-dict? region* ":" type-signature
op-name     ::= dialect-name "." mnemonic               ; e.g. linalg.matmul; quoted as a string in the generic form
attr-dict   ::= "{" (attr-entry ("," attr-entry)*)? "}" ; compile-time constants
region      ::= "{" block+ "}"
block       ::= operation+                              ; ops hold regions hold ops - it recurses
result      ::= ssa-value                               ; %C
operand     ::= ssa-value                               ; %A, %B
type        ::= "!" dialect-name "." mnemonic | builtin-type   ; e.g. !llvm.ptr; builtins: tensor<...>, memref<...>
attr-entry  ::= name "=" attribute-value                ; e.g. 1 : i64, "foo", #dialect.attr<...>
```

Two consequences fall out of this. First, the op name, the types, and the attributes are all namespaced by a dialect (linalg.matmul, !llvm.ptr; even tensor<128x256xf32> belongs to the builtin dialect, which is simply printed without a prefix), so “adding a dialect” extends the vocabulary without touching the grammar, which is exactly why the surrounding infrastructure can be dialect-agnostic. Second, because an operation may contain a region, and a region contains blocks of further operations, the structure nests without bound, and that recursion is what lets a single op carry an entire computation rather than one instruction.

Dialects: One IR, Many Altitudes

The defining feature is that dialects coexist. A module mid-compilation routinely holds ops from several dialects at once, and lowering is the gradual replacement of higher-level ops with lower-level ones until only a target dialect remains. The dialects an ML pipeline passes through, from high to low:

- High level (what you mean): stablehlo and tosa (whole-tensor operator sets), linalg (structured ops over tensors/buffers), tensor (value-semantic tensor manipulation).[3]
- Mid level (how it’s structured): memref (buffers with layout/strides), affine and scf (loop nests and structured control flow), vector (SIMD), arith (scalar math).[4]
- Low level (where it runs): llvm (translated to LLVM IR for CPUs), gpu plus nvvm/rocdl (GPU targets), spirv (Vulkan/compute).

The real skill in building on MLIR is choosing when to drop from one altitude to the next. Stay high too long and you cannot express a hardware-specific schedule; drop low too early and you have thrown away the structure an optimizer needs.

The word coexist is easy to gloss over, so here is a single function that uses four dialects at once. Nothing has been lowered yet; these ops simply live side by side in one function, and the verifier checks them together:

```mlir
// One function, four dialects; each op is tagged with the dialect it comes from.
func.func @scale_in_place(%buf: memref<1024xf32>, %a: f32) {
  %c0 = arith.constant 0 : index                  // 1. arith: loop bounds are scalar index constants
  %c1 = arith.constant 1 : index                  // 2. arith
  %n = arith.constant 1024 : index                // 3. arith
  scf.for %i = %c0 to %n step %c1 {               // 4. scf: a structured loop that carries a region
    %x = memref.load %buf[%i] : memref<1024xf32>  // 5. memref: read from an explicit buffer
    %y = arith.mulf %x, %a : f32                  // 6. arith: the scalar multiply
    memref.store %y, %buf[%i] : memref<1024xf32>  // 7. memref: write the result back
  }
  return                                          // 8. func: terminator
}
```

Four dialects (func, arith, scf, and memref) appear in one function with no impedance mismatch between them. A later pass might rewrite the scf.for into cf branches, or vectorize the body into the vector dialect, but at this altitude they simply compose. That composability is the point: dialects are not separate IRs you translate between, they are vocabularies you mix in a single program.

graph TD subgraph FE "Frontends" SH "StableHLO / TOSA" TM "Torch-MLIR" end SH -- LIN TM -- LIN LIN "linalg + tensor