From Julia to Rust: a differentiable tensor stack for scientific computing in the agentic AI era

The tensor4all team, led by Hiroshi Shinaoka of Saitama University, has released tenferro-rs, a Rust-native dense tensor stack for scientific computing that supports linear algebra, autodiff, einsum, FFT, and CPU/CUDA backends. The project was motivated by the team's shift from Julia to Rust as AI-generated code becomes more prevalent, prioritizing correctness guarantees through Rust's ownership and type system over human readability.

tenferro-rs (https://github.com/tensor4all/tenferro-rs) is a Rust-native dense tensor stack: linear algebra, PyTorch-style eager autodiff, JAX-style traced transforms, NumPy-style einsum, FFT, extensible operation crates, and explicit CPU/CUDA backends. The first crates are on crates.io as of June 23, 2026 (JST).

by Hiroshi Shinaoka (Saitama University), for the tensor4all team

Most tensor-network code has been written in Julia, and ours was no exception. ITensors and the surrounding ecosystem are good for prototyping: the code stays close to the math, and it is easy to iterate. Our own work on the IR basis, sparse modeling, and the tensor4all (https://tensor4all.org) tensor-cross-interpolation and quantics stack started there.

Once the codebase gets large, though, Julia development starts to slow down: type instability that only shows up at run time, compile and precompile times that stretch the edit/test loop, and the sense that correctness gets harder to check as the code grows. When we started fitting the tensor-network stack into a larger system, that became hard to ignore.

We began moving the compute engine to Rust. That immediately exposed a second problem. The tensor library we wanted to build on was not there yet. Rust has libraries for individual jobs: ndarray for arrays, Burn for deep learning, faer for linear algebra.
What was missing was a tensor layer that could cover autodiff through einsum and still feel usable for scientific computing. The goal was not to replace those libraries. It was to connect the pieces that already exist.

The Rust ecosystem has changed a lot in the last few years. crates.io went from 602 crates in 2015 to roughly 210,000 in 2026 (data: https://github.com/shinaoka/rust_crate_count). For dense linear algebra there is faer; for GPU kernels, CubeCL (https://github.com/tracel-ai/cubecl); for generic numerics, num-traits and num-complex. There are also libraries at nearby layers: ndarray for arrays, nalgebra and faer for linear algebra, Burn and candle for deep learning, and numr for a NumPy-style array API.

What we needed was the layer between them: a scientific-computing tensor stack with column-major storage, dynamic shapes, eager and traced autodiff, einsum, FFT, CPU/CUDA backends, and extensible operations. That is what tenferro-rs is for. We build on faer and CubeCL and add the missing parts instead of reinventing them.

Porting SparseIR.jl and Julia tensor-network code made it clearer where that missing layer was. That is the background for tenferro-rs (https://github.com/tensor4all/tenferro-rs). This post explains why we are building it, and why we chose Rust now that code is no longer written only by humans.

Why Rust now, when Julia was fine before?

A couple of years ago I would probably have told students to start with Julia. Julia code can stay close to the math, memory management is easy, and the numerical libraries were already there. Rust had more to learn, and the ecosystem was still missing pieces.

I would not give the same advice now. Not because Rust changed, but because I am no longer the one writing most of the code.

Fortran, Python, and Julia all developed around lowering the cost for humans to write, read, and maintain code by hand. Readability, a REPL, notation close to the math, and a low barrier to entry all matter for that.
When AI writes more of the code, the tradeoff changes. Writing speed matters less. Much of the learning cost can be handled by the agent. But “it reads like the math” still does not guarantee correctness: aliasing, mutation, and allocation are not visible from the surface of a line. For us, the question stopped being “how fast can a human write this?” and became “how confident can we be that it is correct?”. That reframing is why Rust became the more practical choice. Concretely:

- Ownership and types rule out a wide range of errors at compile time. cargo check answers in seconds, so when the agent gets something wrong, we find out before running the program.
- Cargo handles builds, dependency resolution, tests, and benchmarks in one place. No CMake, no link-time version conflicts. A from-scratch build of the full stack plus dependencies takes a couple of minutes on a laptop, and the edit/test loop is tens of seconds.
- Rust controls symbol visibility along module and crate boundaries. An agent can only work inside a layer; it cannot reach into another crate’s internals and quietly break the abstraction. In an AI-written codebase of about 130K lines, that boundary matters.
- Lifetimes and ownership mechanics can largely be left to the agent, so human attention goes to algorithms, design, and correctness. The early learning cost that used to count against Rust is less of a problem now.

In C++, Python, and Julia, large codebases tend to come with the worry that they are becoming too hard to verify. With Rust, that worry is noticeably smaller.

From a port to a stack

We did not set out to build a general tensor library. We wanted to port the pieces we needed and spend less time fighting the tools. But as the implementation progressed, it became clear that autodiff, backends, and the way new operations are added should not be trapped inside a tensor-network-specific layer. We designed the shared parts as an independent tensor stack.
- Operation families live in their own crates rather than inside one all-in-one tensor type.
- Autodiff rules live outside the tensor type. Following the Julia/ChainRules lesson, the derivative rule belongs to the operation itself, not to one concrete tensor class. The AD substrate, tidu-rs (https://github.com/tensor4all/tidu-rs), is generic, and the tensor type is just one consumer of it.
- Backends and devices are explicit. Nothing silently moves data between CPU and GPU. We also keep two questions separate: which backend can execute an operation, and which devices are available at run time.
- Storage is column-major, matching Fortran, Julia, MATLAB, and LAPACK/BLAS. Row-major data can still be handled through strided views, without unnecessary eager copies.

That design makes the stack useful outside tensor networks too.

tenferro-rs in two minutes

The stack gives you typed tensors, immediate eager execution with backward, traced graphs with grad/vjp/jvp/HVP, linear algebra, einsum, FFT, and explicit CPU and CUDA backends (plus experimental WebGPU).

Here is PyTorch-style eager autodiff for sum(x²), whose gradient is 2x, copied verbatim from the repo’s eager_autodiff_pytorch_style.rs (https://github.com/tensor4all/tenferro-rs/blob/main/docs/tutorial-code/src/bin/eager_autodiff_pytorch_style.rs):

use tenferro_ad::{EagerRuntime, Tensor};

fn assert_close(actual: &[f64], expected: &[f64]) {
    assert_eq!(actual.len(), expected.len());
    for (index, (actual, expected)) in actual.iter().zip(expected).enumerate() {
        let error = (actual - expected).abs();
        assert!(
            error < 1.0e-12,
            "value {index}: actual={actual}, expected={expected}, error={error}"
        );
    }
}

fn main() -> Result<(), Box