How a pure-TypeScript flex layout engine closed the last WASM-Yoga gap

The article describes how the author built Pilates, a pure TypeScript flex layout engine for terminal UIs, which outperforms WASM Yoga (the engine used by Ink) across all nine benchmark scenarios. The performance gains were achieved through two key algorithmic improvements—replacing cumulative sum calculations with a linear recurrence and folding default style values into compile-time constants—rather than through native or WASM bindings. The author concludes that pure TypeScript can be competitive with native code on this specific workload, as the bottleneck was algorithmic inefficiency rather than raw computation speed.

TL;DR: I've been building Pilates (https://github.com/pilatesjs/pilates), a flex layout engine for terminal UIs written in pure TypeScript. As of last week, the pure-TS engine is faster than WASM Yoga (the engine Ink uses) on every one of the 9 scenarios in my bench suite. That includes the structural-mutation workload (append + remove a row per frame). Yoga led on that workload by ~5× until phases 15–17 closed the gap, and it has now flipped to a ~1.7× Pilates win, in pure TypeScript. No native bindings. No WASM port. The fix was algorithmic, and the algorithmic fix worked in TS.

The numbers

Median latency (win32-x64, Node 22, ~5 s tinybench windows with bootstrap CI95):

| Scenario | Pilates | yoga-layout (WASM) | Ratio |
|---|---|---|---|
| tiny (10 nodes) | 4.5µs | 19.0µs | 4.2× faster |
| realistic (~100) | 121µs | 328µs | 2.7× faster |
| stress (~1000) | 601µs | 1.94ms | 3.2× faster |
| big (~5000) | 3.32ms | 9.17ms | 2.8× faster |
| huge (~10000) | 8.62ms | 18.5ms | 2.1× faster |
| hot-relayout | 16.3µs | 83.0µs | 5.1× faster |
| hot-relayout + boundaries | 15.8µs | 77.8µs | 4.9× faster |
| hot-relayout (text mutation) | 8.9µs | 90.6µs | 10× faster |
| hot-structural | 71.3µs | 118.3µs | 1.7× faster |

Caveats up front: these are 9 hand-picked scenarios, not a universal claim. Reproduce with `pnpm bench` (about 5 minutes on a recent machine).

Why pure TS can beat WASM here

Terminal UI is a curiously hostile workload for a WASM engine. Trees are small (10–10,000 nodes), but updates are frequent: one keystroke, one tick, one frame. The cost of crossing from JS into WASM dominates. Yoga's per-call kernel takes a few microseconds, but a `node.setWidth(N)` call from JS into WASM also costs a few microseconds. A pure-TS engine pays no crossing cost.

That was the thesis going in. Phases 15–17 are evidence that it holds even in the worst case: the workload where Yoga's compute kernel is exactly what's being measured, with the tree pre-built and only the structural-mutation layout timed.
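
One way to make the crossing-cost argument explicit is a simplified per-frame model. This is an assumption for illustration, not something the bench suite measures: suppose a frame makes $n_{\mathrm{set}}$ setter calls and $n_{\mathrm{get}}$ result reads across the JS/WASM boundary, each costing roughly $c_{\mathrm{cross}}$, plus one `calculateLayout` call. The symbols are illustrative labels only.

```latex
T^{\mathrm{WASM}}_{\mathrm{frame}} \approx T^{\mathrm{WASM}}_{\mathrm{kernel}} + \left(n_{\mathrm{set}} + n_{\mathrm{get}} + 1\right) c_{\mathrm{cross}},
\qquad
T^{\mathrm{TS}}_{\mathrm{frame}} \approx T^{\mathrm{TS}}_{\mathrm{kernel}}
```

When $c_{\mathrm{cross}}$ is on the same order as the kernel itself, as described above, the boundary term grows with every property touched per frame. A pure-TS engine's cost is just its own kernel.
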

How hot-structural went from ~450µs to ~70µs

Two algorithmic changes did the work.

1. Linear-recurrence main-axis positions

The original main-axis position rule was a cumulative sum: each cell's position depended on the size of every prior sibling. A 100-cell row in the stress fixture meant ~300 dependency edges per row.

```
// Old rule: every cell reads every prior sibling
mainPos[N] = sum(siblings[0..N-1].mainSize + margin + gap)
```

I replaced it with a linear recurrence, in which each cell reads only the cell immediately before it:

```
// New rule: each cell only reads the previous one
mainPos[N] = mainPos[N-1] + prev.mainSize + prev.marginEnd + me.marginStart + gap
```

Reverse directions (`row-reverse` / `column-reverse`) keep the cumulative-sum fallback. The recurrence depends on the prior cell's already-resolved position, and that doesn't hold when iteration is reversed.

2. Fold default-valued style inputs

Observation: roughly half of all input fields in the grammar sat at their default values forever (`margin: 0`, `minWidth: 0`, `maxWidth: undefined`, etc.). They still consumed dirty-flag slots, propagated through dependents, and appeared in dependency sets.

Phase 17 folds these defaults into compile-time constants at grammar-build time. Each per-cell node went from ~15 fields to ~7. The classifier's `nodeSig` was extended with fold-predicate bits, so mutating a field from default to non-default correctly triggers a structural rebuild.

Combined, these changes took hot-structural from ~450µs to ~70µs.

Why pure TS over a native rewrite

Before doing the algorithmic work, I considered porting the engine to a language that compiles to WASM. Glad I didn't. Yoga's advantage wasn't speed of arithmetic: its C++ kernel is fast and well-tuned, but arithmetic speed wasn't the bottleneck on this workload. The advantage was the structural-mutation algorithm. Yoga handled it natively, while the pure-TS engine was redoing too much work per mutation.

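
The "too much work per mutation" is easiest to see in the position rule from the first change. Below is a minimal TypeScript sketch of the two rules. It assumes a hypothetical `Cell` shape with `mainSize`, `marginStart`, and `marginEnd`, a single `gap`, and a container that starts at offset `start`. It does not model Pilates' actual grammar or dependency tracking. The cumulative-sum version re-reads every prior sibling for each cell, which is quadratic work per row. The recurrence reads only the previous cell's resolved position, which is linear. Both produce the same forward-direction positions.

```typescript
// Hypothetical cell shape for illustration; not Pilates' internal API.
interface Cell {
  mainSize: number;
  marginStart: number;
  marginEnd: number;
}

// Old rule (cumulative sum): cell n re-reads all n prior siblings,
// so a row of k cells does O(k^2) sibling reads.
function positionsCumulative(cells: readonly Cell[], gap: number, start = 0): number[] {
  const out: number[] = [];
  for (let n = 0; n < cells.length; n++) {
    let pos = start;
    for (let i = 0; i < n; i++) {
      const s = cells[i];
      pos += s.marginStart + s.mainSize + s.marginEnd + gap;
    }
    out.push(pos + cells[n].marginStart);
  }
  return out;
}

// New rule (linear recurrence): cell n reads only cell n-1's resolved
// position, so a row of k cells does O(k) reads. Forward direction only.
function positionsRecurrence(cells: readonly Cell[], gap: number, start = 0): number[] {
  const out: number[] = [];
  for (let n = 0; n < cells.length; n++) {
    const me = cells[n];
    if (n === 0) {
      out.push(start + me.marginStart);
      continue;
    }
    const prev = cells[n - 1];
    out.push(out[n - 1] + prev.mainSize + prev.marginEnd + me.marginStart + gap);
  }
  return out;
}

const row: Cell[] = [
  { mainSize: 3, marginStart: 1, marginEnd: 0 },
  { mainSize: 5, marginStart: 0, marginEnd: 2 },
  { mainSize: 4, marginStart: 1, marginEnd: 1 },
];
console.log(positionsRecurrence(row, 1)); // [ 1, 5, 14 ]
console.log(positionsCumulative(row, 1)); // [ 1, 5, 14 ]
```

The sketch covers only the forward direction. As described above, the reverse directions keep the cumulative-sum form, because the recurrence assumes the previous cell in iteration order has already been resolved.
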

A native-compiled port from my side would have inherited the same algorithmic shape and reached parity at best. The fix was algorithmic, and the algorithmic fix worked in TypeScript. "Pure TS is competitive with native code on this workload" is the actually interesting result.

Validation, including a same-day hotfix story

- 1,470 unit + integration tests pass
- Structural-differential fuzzer green at 3,000 runs
- 33 Yoga oracle fixtures (cell-for-cell comparison)
- Byte-identical cached-vs-cold differential mode at 833 runs

A small incident worth mentioning: within hours of publishing 2.0.0, the fast-check property fuzzer caught a real bug. `createStyleDirtier` was throwing on a node whose entire style had been folded out. My analysis said that case couldn't happen, and the fuzzer found it immediately. 2.0.1 shipped the same day with the fix and a pinned regression test, and 2.0.0 was deprecated on npm with a pointer to 2.0.1.

Property-based fuzzing earns its keep. I had been on the fence about whether the fuzzer was worth maintaining; this answered it.

API stability

The public `calculateLayout` is byte-identical between 1.x and 2.x. The SemVer-major bump reflects shifts in internal APIs and memory characteristics:

- Typed-array runtime (`Field.id` integer + array storage replacing `Map<Field, X>`)
- `LayoutPool` grows unbounded (I tried `FinalizationRegistry`-based recycling in phase 15C; it caused a 2× regression, so it was removed)
- Per-property dirty bitmask replacing a single dirty bool
- Linear recurrence + folded default values (the algorithmic changes above)

If you use only the documented public API, you can upgrade and the speedup is transparent.

Try it

```sh
git clone https://github.com/pilatesjs/pilates
cd pilates
pnpm install
pnpm bench   # ~5 min
```

Or install the engine directly:

```sh
npm install @pilates/core
```

Full React stack (reconciler + widgets):

```sh
npm install @pilates/react @pilates/widgets react
```

Adversarial benchmarks are very welcome: if there's a workload where this approach breaks down, I'd genuinely like to find it.
That's the most valuable feedback the project can get right now.

Repo (MIT): https://github.com/pilatesjs/pilates
npm: https://www.npmjs.com/package/@pilates/core