{"slug": "optimizing-llvm-s-bump-allocator", "title": "Optimizing LLVM's Bump Allocator", "summary": "LLVM merged three optimizations to its BumpPtrAllocator, reducing overhead in the fast path by skipping unnecessary realignment, removing a null check via a sentinel end pointer, and eliminating per-allocation byte accounting. The changes improve performance for Clang, lld, and other LLVM tools that rely on arena allocation.", "body_md": "`BumpPtrAllocator` is LLVM's bump allocator (arena allocator): each allocation bumps a pointer within a slab, and everything is freed at once when the allocator dies. It backs Clang's `ASTContext`, lld's `make<T>` object pools, TableGen records, and many other arenas.\n\nHere is the fast path before three recent changes:\n\n```\n__attribute__((returns_nonnull)) void *Allocate(size_t Size, Align Alignment) {\n  BytesAllocated += Size;                               // (3) accounting RMW\n  uintptr_t AlignedPtr = alignAddr(CurPtr, Alignment);  // (1) always realign\n  size_t SizeToAllocate = Size;\n#if LLVM_ADDRESS_SANITIZER_BUILD\n  SizeToAllocate += RedZoneSize;\n#endif\n  uintptr_t AllocEndPtr = AlignedPtr + SizeToAllocate;\n  if (LLVM_LIKELY(AllocEndPtr <= uintptr_t(End)\n                  && CurPtr != nullptr)) {              // (2) bound + null check\n    CurPtr = reinterpret_cast<char *>(AllocEndPtr);\n    ...\n    return reinterpret_cast<char *>(AlignedPtr);\n  }\n  return AllocateSlow(Size, SizeToAllocate, Alignment);\n}\n```\n\nThree changes streamline the three marked lines.\n\n## A minimum alignment skips the realign ([#205240](https://github.com/llvm/llvm-project/pull/205240))\n\n`alignAddr(CurPtr, Alignment)` is wasteful: a freshly-bumped pointer is usually aligned enough already. [#205240](https://github.com/llvm/llvm-project/pull/205240) rounds each size up to `MinAlign` (default 8), so the fast path realigns only for over-aligned requests. 
I learned the trick from [Bump Allocation: Up or Down?](https://coredumped.dev/2024/03/25/bump-allocation-up-or-down/):\n\n```\n// Optimized to a constant\nSizeToAllocate = alignToPowerOf2(SizeToAllocate, MinAlign);\nuintptr_t AlignedPtr = uintptr_t(CurPtr);\n// For the common `alignof(T) <= 8` case the branch drops out entirely.\nif (Alignment.value() > MinAlign)\n  AlignedPtr = alignAddr(CurPtr, Alignment);\n```\n\n`SpecificBumpPtrAllocator<T>` uses `MinAlign = 1` instead: `DestroyAll` strides at `sizeof(T)`, so it needs tight packing, not rounding.\n\nMy first attempt had a bug: adding a non-zero offset to `nullptr` triggered a UBSan diagnostic. The fix keeps the arithmetic in the `uintptr_t` domain.\n\n## A sentinel End drops the null check ([#205485](https://github.com/llvm/llvm-project/pull/205485))\n\n`__attribute__((returns_nonnull))` promises that the return value is non-null. In a fresh allocator whose `CurPtr` and `End` are both null, `Allocate(0)` used to return null. In 2022, [https://reviews.llvm.org/D125040](https://reviews.llvm.org/D125040) added the `&& CurPtr != nullptr` check to the fast path condition, which fixed that at the cost of an extra check on every allocation.\n\nI first tried\n\n```\n// Fast path check. The condition also fails for a fresh allocator (End ==\n// nullptr) to avoid a separate null check.\nif (LLVM_LIKELY(AlignedPtr + SizeToAllocate - 1 < uintptr_t(End))) { ... }\n```\n\nbut then adopted aengelke's suggestion. Storing the end as a sentinel one past the real end (`EndSentinel = realEnd + 1`, and `0` when there is no slab) folds both conditions into one unsigned compare:\n\n```\nif (LLVM_LIKELY(AllocEndPtr < EndSentinel)) { ... 
}\n```\n\nAn empty allocator has `EndSentinel == 0`, so `AllocEndPtr < 0` is always false and the null case falls through to the slow path with no separate branch.\n\n## Dropping the per-allocation accounting ([#205711](https://github.com/llvm/llvm-project/pull/205711))\n\n`BytesAllocated += Size` was a read-modify-write to a member on every allocation, backing a `getBytesAllocated()` that reported *requested* bytes, distinct from `getTotalMemory()`'s slab capacity. It had only stats/diagnostic consumers: lldb's ConstString memory report, a clangd debug log, TableGen's `dumpAllocationStats`, and one clang regression test. Dropping the member and migrating those consumers (mostly to `getTotalMemory()`) removes the hot-path store.\n\n**A detail: the red zone and ABI.** The ASan red-zone size is also a member. Gating it on `#if LLVM_ADDRESS_SANITIZER_BUILD` to drop it in release builds would be an ABI footgun: that macro is *per translation unit*, so an ASan-instrumented TU and a non-ASan `libLLVM` would silently disagree on the struct layout. The member is instead gated on `LLVM_ENABLE_ABI_BREAKING_CHECKS`, which is fixed per library build and link-time-enforced (via the `EnableABIBreakingChecks` symbol); the red-zone arithmetic is then gated on both macros.\n\nCombined, the fast path becomes:\n\n```\nvoid *Allocate(size_t Size, Align Alignment) {\n  size_t SizeToAllocate = Size;\n#if LLVM_ADDRESS_SANITIZER_BUILD && LLVM_ENABLE_ABI_BREAKING_CHECKS\n  SizeToAllocate += RedZoneSize;\n#endif\n  SizeToAllocate = alignToPowerOf2(SizeToAllocate, MinAlign);\n  uintptr_t AlignedPtr = uintptr_t(CurPtr);\n  if (Alignment.value() > MinAlign)\n    AlignedPtr = alignAddr(CurPtr, Alignment);\n  uintptr_t AllocEndPtr = AlignedPtr + SizeToAllocate;\n  if (LLVM_LIKELY(AllocEndPtr < EndSentinel)) {\n    CurPtr = reinterpret_cast<char *>(AllocEndPtr);\n    ...\n
    return reinterpret_cast<char *>(AlignedPtr);\n  }\n  return AllocateSlow(Size, SizeToAllocate, Alignment);\n}\n```\n\n## Generated assembly\n\nAllocating a typical arena object (a 24-byte, 8-aligned node via `Allocate<T>()`) compiles to a six-instruction fast path (`clang -O2`, release):\n\n```\nmov  rax, [rdi]        # CurPtr (also the return value)\nlea  rcx, [rax + 0x18] # new = CurPtr + 24\ncmp  rcx, [rdi + 0x8]  # vs EndSentinel\njae  .slow\nmov  [rdi], rcx        # CurPtr = new\nret\n```\n\nThat matches the canonical bump fast path. A *downward*-bumping allocator would not need the `rax`/`rcx` distinction: one fewer live value, but the instruction count stays the same. LLVM bumps upward by design: `identifyObject`, allocation order, and `SpecificBumpPtrAllocator::DestroyAll`'s forward `sizeof(T)` stride all assume it. The remaining gap is space, not instructions.\n\n## Aggregate compile-time impact\n\nThese changes shrink `Allocate` below the inliner's cost threshold, so its callers (e.g. `new (Context) T`) inline at sites that previously called out of line. Executed instructions fall, but code size is *redistributed* rather than uniformly reduced: object files where the chain now inlines grow, while the rest shrink slightly from the dropped store.\n\nThe performance win is larger at stage2 (built by stage1 Clang) than at stage1 (built by system GCC).\n\nReverting all three on top of `main` isolates their combined effect ([compare](https://llvm-compile-time-tracker.com/compare.php?from=dbd070fbd793c8a9129044abd669466e87d2ea8e&to=3a7d64a882421052101899d7d9c23685db5fd355&stat=instructions:u)):\n\nSignificant (≥3σ vs. measured noise): 🟢 improvement. 
Unmarked = within noise.\n\n| Configuration | instructions:u | max-rss |\n|---|---|---|\n| stage1-O3 | −0.04% | +0.04% |\n| stage1-ReleaseThinLTO | −0.04% | −0.01% |\n| stage1-ReleaseLTO-g | −0.04% | +0.06% |\n| stage1-O0-g | 🟢 −0.09% | +0.25% |\n| stage1-aarch64-O3 | −0.04% | +0.04% |\n| stage1-aarch64-O0-g | 🟢 −0.12% | −0.01% |\n| stage2-O3 | 🟢 −0.14% | −0.15% |\n| stage2-O0-g | 🟢 −0.36% | −0.06% |\n\n## Takeaways\n\n- A bump allocator's fast path is a few instructions of real work wrapped in a realign and accounting; each can be hoisted out of the common case.\n- Encoding \"empty\" as a `0` sentinel folds a null check into the bound compare.\n- The measurable instruction-count win is the inlining a cheaper `Allocate` unlocks, not the removed micro-op, and it appears as a size *redistribution*, not a uniform shrink.\n- A layout-affecting member may key on `LLVM_ENABLE_ABI_BREAKING_CHECKS` (link-enforced) but never on the per-TU `LLVM_ADDRESS_SANITIZER_BUILD`.", "url": "https://wpnews.pro/news/optimizing-llvm-s-bump-allocator", "canonical_source": "https://maskray.me/blog/2026-06-28-optimizing-llvm-bump-allocator", "published_at": "2026-06-29 04:25:09+00:00", "updated_at": "2026-06-29 04:58:24.695953+00:00", "lang": "en", "topics": ["developer-tools"], "entities": ["LLVM", "Clang", "lld", "TableGen", "lldb", "clangd", "ASan", "aengelke"], "alternates": {"html": "https://wpnews.pro/news/optimizing-llvm-s-bump-allocator", "markdown": "https://wpnews.pro/news/optimizing-llvm-s-bump-allocator.md", "text": "https://wpnews.pro/news/optimizing-llvm-s-bump-allocator.txt", "jsonld": "https://wpnews.pro/news/optimizing-llvm-s-bump-allocator.jsonld"}}