{"slug": "making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv", "title": "Making devenv start fast, and the whole nixpkgs with it - devenv", "summary": "Devenv startup overhead is traced to Nixpkgs' dynamic linker stat storm, where programs like `devenv hook-should-activate` suffer 70ms delays from 486 failing `openat()` calls per invocation. The Nix package manager's per-dependency `DT_RUNPATH` design forces the loader to search dozens of directories for each shared library, a problem that has remained unfixed for a decade. The article explores potential fixes, including static linking, to eliminate the dynamic loader entirely.", "body_md": "# Making devenv start fast, and the whole nixpkgs with it\n\nI'm sitting here next to [Farid Zakaria](https://github.com/fzakaria) at [Tacosprint](https://tacosprint.org), where we looked at the stat storm that has been haunting nixpkgs for a decade.\n\n[devenv auto activation](../../../../../auto-activation/) runs `devenv hook-should-activate`\non every shell prompt to decide whether you've stepped\ninto a project directory. It does almost nothing: discover the project, check\nthe trust database, print a path. So its runtime is pure startup overhead, and\nit runs on every single prompt redraw.\n\n``` bash\n$ time devenv hook-should-activate\n/home/domen/dev/myproject\nreal    0m0.070s\n...\n```\n\n70ms before a prompt, every prompt.\n\nAnd this isn't devenv's tax to pay, it's nixpkgs'. Every program pays it before it runs a line of its own code: the dynamic loader has to find each shared library, and the way Nix scatters packages across the store makes that search slow. This is not news. The cost has been measured, written up, and partly fixed more than once, and yet it has sat in limbo for the better part of a decade with no general fix merged into nixpkgs.\n\nMost of that 70ms is the dynamic loader looking for a shared object that is sitting\nright there in the store, just not in the first directory it tried. 
The loader\nknocks on 486 wrong doors before it finds the right ones, and almost all of it\nhappens before `main` even starts.\n\nThat 70ms is the whole game. Above ~30ms you have to bolt a caching layer on top of the hook; in single digit milliseconds you just run it on every prompt and throw the cache away.\n\nAnd it scales with the closure: `imagemagick`'s `magick --version` makes **1225** failing opens:\n\n``` bash\n$ strace -f -e openat magick --version 2>&1 >/dev/null | grep '\\.so' | grep -c ENOENT\n1225\n```\n\nThe community has been circling a real fix for years. This post walks through the problem, the approaches people have tried with their tradeoffs, and a more radical one we spiked for devenv to see if it was even possible: deleting the dynamic loader altogether by linking the whole program into one static binary.\n\nThe umbrella tracking issue for the general problem is\n[NixOS/nixpkgs#481620](https://github.com/NixOS/nixpkgs/issues/481620).\n\n## Why Nix makes the loader work so hard\n\nOn a traditional distribution every shared library lives in a handful of global\ndirectories such as `/usr/lib`. The dynamic loader has a short, mostly cached\nsearch path, and `ld.so.cache` (built by `ldconfig`) turns soname lookups into a\nhash table hit.\n\nNix is different by design. Every package lives in its own\n`/nix/store/<hash>-name/lib` directory, and there is no global `ld.so.cache` for\nstore libraries. To make a binary find its dependencies, Nix records a\n`DT_RUNPATH` in the ELF dynamic section that lists **one directory per dependency**. A\nprogram linked against fifty libraries gets a `DT_RUNPATH` with dozens of\nentries.\n\nNow recall how glibc resolves a `DT_NEEDED` soname with `DT_RUNPATH` present: it\nwalks every `DT_RUNPATH` directory in order, trying to open `dir/soname` in\neach, until one succeeds. 
So resolving N libraries against a path of M\ndirectories costs on the order of N times M `openat()` attempts, almost all of\nwhich fail. That is the stat storm.\n\nIt gets worse. For every directory it searches, glibc first probes the\n`glibc-hwcaps` subdirectories for your CPU (`x86-64-v3`, `x86-64-v2`, and so on),\nwhich adds roughly three more failing opens per directory on a modern machine.\nOn a fast SSD with a warm cache none of this is noticeable. On a slow disk, a\nnetwork filesystem, a cold cache, or a low power ARM board, it is the difference\nbetween snappy and sluggish, and it multiplies across every process a shell\nscript spawns.\n\nConcretely, the two workloads we traced most closely:\n\n| Workload | Loaded libraries | `DT_RUNPATH` dirs | Failing `.so` opens |\n|---|---|---|---|\n| `devenv version` | 83 | 12 (leaf binary) | ~486 |\n| `imagemagick magick --version` | 91 | 35 | ~1225 |\n\nThe wider a binary's own `DT_RUNPATH` and the deeper its transitive graph, the\nworse the storm.\n\n## What a good fix has to preserve\n\nThe reason this problem has stayed open so long is that the obvious fixes break things people rely on. Any serious solution is judged against a checklist:\n\n- **`LD_LIBRARY_PATH` override.** NixOS injects the GPU driver by putting `/run/opengl-driver/lib` on `LD_LIBRARY_PATH`. 
If a fix stops that from winning, graphics break.\n- **`LD_PRELOAD`.** Interposers and shims must still load first.\n- **The libGL / glvnd runtime swap.** A program built against Mesa must be able to pick up the vendor driver at runtime.\n- **Two libraries with the same soname.** This is the heart of the Nix model: different parts of one closure can legitimately depend on different builds of the same soname, and resolution must stay per object.\n- **`dlopen`.** Plugins loaded at runtime are a related but separate problem.\n- **Cross compilation.** A fix that has to run the target loader cannot cross compile cleanly.\n- **Disk and closure size.** Whatever metadata you add ships in every NAR.\n- **Maintenance burden.** A glibc or loader patch has to be rebased onto every new glibc release, and patching glibc rebuilds the world.\n\nNo approach so far ticks every box. The interesting part is how each one chooses which boxes to give up.\n\n## Approach 1: freeze the resolution with absolute paths\n\nThe simplest idea: rewrite every `DT_NEEDED` entry from a bare soname like\n`libfoo.so.1` to the absolute store path of the library it resolves to. glibc\nhas a \"slash short circuit\": a `DT_NEEDED` containing a `/` is opened directly,\nskipping all search. No search means no storm, and not even the `glibc-hwcaps`\nprobes happen.\n\nThis is well trodden ground:\n\n- [Farid Zakaria](https://github.com/fzakaria)'s **shrinkwrap** and the **nix-harden-needed** tool do exactly this as external post processing. Shrinkwrap is described in the paper *Mapping Out the HPC Dependency Chaos* (Zakaria, Scogland, Gamblin, Maltzahn, 2022; arXiv:2211.05118), which measures the storm directly: an Emacs launch drops from 1823 `stat`/`openat` syscalls to 104, a 36 times speedup, and a 900 library MPI application starting across 2048 processes on NFS goes from 344.6s to 47.8s, 7.2 times faster. 
Those NFS numbers are the clearest evidence that this overhead, invisible on a warm local cache, becomes brutal on a network or cold filesystem.\n- patchelf [PR #357](https://github.com/NixOS/patchelf/pull/357) (`--shrink-wrap`, open since 2021) pulls all transitive `DT_NEEDED` entries up onto the top binary and rewrites them to absolute paths.\n- Spack has a similar `bind` feature in the HPC world.\n- Inside nixpkgs, this mechanism is already used ad hoc in dozens of packages.\n\nThe cost is steep on the checklist. Absolute paths **lose the `LD_LIBRARY_PATH` override**,\nso the glvnd driver swap stops working, and you need an exemption\nlist for libc, the loader itself, the GL stack, and initrd. There is also no\nruntime fallback: if the pinned path is gone, the program does not start.\n\nThere is a build time fork in the road here too. To rewrite a soname to an\nabsolute path you first have to resolve it, and there are two ways to do that:\n**run the binary's own loader** and record what glibc actually picks, or walk\n`DT_RUNPATH` statically and resolve it yourself. The first is exact but executes\ntarget code, so it cannot cross compile; the second cross compiles cleanly. The\nabsolute path tooling only ever did the first, which is why it stays a manual,\nper package tool rather than a default. The static walk is the same technique\nthe ELF note cache (approach 3) later builds on.\n\nSo absolute paths are the zero disk, maximum speed option, attractive for self contained leaf applications, but wrong as a default because of the override semantics.\n\n## Approach 2: the RUNPATH symlink farm\n\nIf the problem is that the loader searches many directories, give it one. 
The\nfarm idea, floated early on by [Linus Heckemann](https://github.com/lheckemann)\n(see [#24844](https://github.com/NixOS/nixpkgs/pull/24844)), is: for each ELF,\ncreate a single directory of symlinks pointing at exactly the libraries that ELF needs,\nand set its `DT_RUNPATH` to that one directory.\n\nThe crucial detail is that the sonames in `DT_NEEDED` stay short. The farm only\nchanges where they are found, not how. Because the farm lives in `DT_RUNPATH`,\nwhich the loader consults after `LD_LIBRARY_PATH`, every override keeps working.\nAnd it builds with nothing but stock `patchelf --set-rpath` and symlinks, with\nno glibc or patchelf fork, and never executes the target binary, so it cross\ncompiles.\n\nBut keeping the sonames short is also where it breaks the Nix model. A farm\ndirectory is a flat namespace keyed by soname, so it can hold exactly one\n`libfoo.so.1`. When a closure legitimately pulls two different builds of the\nsame soname (the case Nix exists to allow), the farm cannot represent both, and\nglibc's soname based dedup collapses them to whichever loads first. Absolute\npaths (approach 1) sidestep this because the store path becomes the key; the\nfarm, which deliberately keeps the bare soname, cannot.\n\nThe remaining costs are store pollution and the hwcaps floor. Every ELF gains\nits own extra directory of symlinks, so the store fills up with farm directories\nthat shadow the real libraries. And the farm collapses the per directory\nmultiplier but **not** the per hwcaps multiplier: the loader still probes\n`glibc-hwcaps` inside the one farm directory. 
So it is a large constant factor\nwin, not an asymptotic one.\n\nHow large depends entirely on how much of the graph you farm:\n\n| Farmed scope | Failing opens | Reduction |\n|---|---|---|\n| `imagemagick`, binary only (wide 35 dir `DT_RUNPATH`) | 1225 → ~213 | 83% |\n| `devenv`, leaf binary only (narrow 12 dir `DT_RUNPATH`) | 486 → 392 | 19% |\n| `devenv`, whole graph (every dep built with the hook) | 486 → 88 | 82% |\n\nThe two devenv rows are the lesson. Farming the leaf alone barely moves the\nneedle because the storm there is dominated by the 83 libraries resolving *each other*,\nwhich a leaf only farm never touches. Only whole graph adoption reaches\n82%, and the residual 88 are irreducible hwcaps probes rather than real library\nsearches. So the farm pays off immediately when a package's own binary has a\nwide `DT_RUNPATH`, but needs whole graph adoption for closure heavy\napplications.\n\n## Approach 3: a per DSO resolution cache in an ELF note\n\nThis is the most ambitious approach and, on the checklist, the best. The idea,\ndesigned by [pennae](https://github.com/pennae) in\n[#207893](https://github.com/NixOS/nixpkgs/pull/207893), is to have `patchelf` write a\nsmall `PT_NOTE` into each library that records, for each `DT_NEEDED` soname,\nwhere the loader should find it. A patched glibc reads that note during loading,\nbetween the `LD_LIBRARY_PATH` step and the `DT_RUNPATH` walk, and resolves the\ndependency straight from it.\n\nPlacing the read after `LD_LIBRARY_PATH` is what makes it safe: overrides,\n`LD_PRELOAD`, and the glvnd swap all keep winning, and soname based dedup is\nunchanged because the sonames stay short. 
Each cache entry is either an exact\npath, which is opened directly with no search and therefore no hwcaps probing,\nor a directory hint for the rare cases that cannot be resolved at build time\n(`$ORIGIN` relative entries, or directories that themselves contain a\n`glibc-hwcaps` tree).\n\nThis is the only approach that preserves every semantic, adds zero closure\nreferences, and eliminates the hwcaps floor as well. pennae's original benchmark\nshowed an armv7 workload dropping from 44s to 29s (seconds, not ms, measured\nunder `strace -cf`) with about 24000 fewer syscalls. In our own end to end test of a revived, cleaned up version, a note\nbearing binary resolved its dependency with **zero** failing search probes,\nversus the full storm for the same binary without the note, while the\n`LD_LIBRARY_PATH` override still took precedence.\n\nThe price is the heaviest of any approach. It needs **two** source changes: a\nglibc patch so the loader understands the note, and a patchelf change to write\nit. It also means a mass rebuild through staging, because patching glibc rebuilds the world.\npennae's draft was closed for lack of a go or no go decision rather than any\ntechnical failure; the main worry raised was the long term maintenance of a\nglibc patch.\n\n## Approach 4: a Guix style per package ld.so.cache\n\nGuix solves the same problem in production by shipping a per package\n`ld.so.cache`, the same binary format `ldconfig` produces, and having a patched\nloader consult it (written up in their\n[Taming the 'stat' storm with a loader cache](https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with-a-loader-cache/);\n[#207061](https://github.com/NixOS/nixpkgs/issues/207061) proposed it for nixpkgs). It preserves\n`LD_LIBRARY_PATH` and is proven at scale,\nbut building the cache needs `ldconfig`/`ldd` for the target architecture, which\nbreaks cross compilation, and it hits `buildEnv` collisions and `dlmopen`\nnamespace issues. 
The ELF note (approach 3) was in part a response: it reads\n`DT_NEEDED` and `DT_RUNPATH` statically and never runs a foreign binary, so it\nkeeps the same `LD_LIBRARY_PATH` guarantee without those costs.\n\n## Approach 5: delete the loader with static linking\n\nThe four approaches above all make the loader's job easier. Static linking\nremoves the loader instead. For devenv, a self contained CLI, we spiked it:\nbuilding the whole closure through `pkgsStatic` (which means musl, since glibc\ndoesn't support a complete static link) drops `devenv version` and\n`hook-should-activate` from about 70ms to about 16ms.\n\n| Build | Loaded libraries | Startup |\n|---|---|---|\n| Baseline (all dynamic, glibc) | 83 | ~70ms |\n| Fully static (musl) | 0 | ~16ms |\n\nThis is not a nixpkgs fix and was never meant to be. Deleting the loader also deletes everything the loader does at runtime: loading plugins on demand, honouring driver and interposer overrides, swapping in the GPU vendor's GL stack. A lot of nixpkgs depends on that, so static linking can never be a general default. It works for devenv only because devenv is a self contained CLI that talks to Nix through its own linked in C API and needs none of it.\n\nOne thing surprised us: at 16ms, with the loader gone, devenv is still far above\nthe ~2ms a static musl hello world starts in, the rest being `execve` mapping\nthe image and devenv's own startup work. Even so, 16ms is fast enough for the\nshell hook to drop its per directory activation cache and just run the check\nevery prompt.\n\n## What about macOS?\n\nmacOS uses a different loader, `dyld`, and the storm isn't there. Nix on Darwin\nalready ships approach 1: every Mach-O records its dependencies as absolute\nstore paths in `LC_LOAD_DYLIB` rather than bare sonames, and carries no\n`LC_RPATH`. 
So `dyld` opens each library directly on the first path it tries,\nand system frameworks come straight from the in memory dyld shared cache without\ntouching disk. Where the glibc `devenv` made ~486 failing opens, the macOS one\nmakes essentially none.\n\nThe startup cost macOS does have is specific to Nix. To decide whether to\nadvertise `x86_64-darwin` as an extra platform, libstore forked a child running\n`arch -arch x86_64 /usr/bin/true` on startup, costing ~13ms on every Nix process\non Apple silicon. The fix answers the same question with a `stat` of Rosetta 2's\nfixed install path in ~0.01ms\n([NixOS/nix#16067](https://github.com/NixOS/nix/pull/16067)).\n\n## Side by side\n\nEvery column is framed as a property you *want*, so ✅ is always good and ❌\nalways a cost. Legend: ✅ yes · ⚠️ with caveats · ❌ no · ➖ not applicable.\nCaveats marked ⚠️ or worth a word are footnoted below.\n\n| Approach | No glibc fork | No patchelf change | Cheap on disk | Keeps `LD_LIBRARY_PATH` / glvnd | Keeps dup sonames | Kills hwcaps floor | Cross safe |\n|---|---|---|---|---|---|---|---|\n| Absolute `DT_NEEDED` | ✅ | ✅ a | ✅ | ❌ | ⚠️ b | ✅ | ⚠️ c |\n| RUNPATH symlink farm | ✅ | ✅ d | ❌ e | ✅ | ❌ | ❌ | ✅ |\n| Per DSO ELF note | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |\n| Per package ld.so.cache | ❌ | ✅ | ✅ | ✅ | ⚠️ f | ✅ | ❌ |\n| Static linking (musl) | ✅ g | ✅ | ❌ h | ❌ | ➖ | ✅ | ✅ i |\n\na stock `patchelf --replace-needed` ·\nb breaks on the rare duplicate soname ·\nc only the static-resolution variant cross compiles, and it is unbuilt ·\nd stock `patchelf --set-rpath` ·\ne every ELF gains its own symlink directory in the store ·\nf `buildEnv` collisions ·\ng uses musl, not glibc, so no glibc fork to maintain ·\nh ~82MB binary ·\ni via `pkgsStatic`\n\n## The final approach\n\nOver the week at [Tacosprint](https://tacosprint.org) we revived the **ELF note cache**, cleaned it up,\nand got it built and tested end to end. 
After a decade in limbo it now works:\nthe note writer, `patchelf --build-resolution-cache`\n([#647](https://github.com/NixOS/patchelf/pull/647)), shipped in\n[patchelf 0.19.0](https://github.com/NixOS/patchelf/releases/tag/0.19.0), the\nfirst patchelf release since 0.18.0 in April 2023.\n\nThe last thing to land is\n[nixpkgs#535735](https://github.com/NixOS/nixpkgs/pull/535735), which turns the note on\nacross the whole package set. Because it patches glibc it has to go through\n`staging`, which rebuilds the world, so every binary in nixpkgs comes out the\nother side resolving its libraries straight from the note. That is also where it\ngets exercised at scale, and we're committed to fixing whatever shakes out as we\ngo.\n\nOnce it has proven itself there, the longer term goal is to upstream the loader patch into glibc itself, so the fix isn't a nixpkgs carry but something every store based, Nix style package manager, Guix included, can rely on.", "url": "https://wpnews.pro/news/making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv", "canonical_source": "https://devenv.sh/blog/2026/06/26/making-devenv-start-fast-and-the-whole-nixpkgs-with-it/", "published_at": "2026-06-26 18:27:04+00:00", "updated_at": "2026-06-26 19:06:33.541232+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning"], "entities": ["Farid Zakaria", "Tacosprint", "NixOS", "nixpkgs", "devenv", "glibc", "imagemagick"], "alternates": {"html": "https://wpnews.pro/news/making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv", "markdown": "https://wpnews.pro/news/making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv.md", "text": "https://wpnews.pro/news/making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv.txt", "jsonld": "https://wpnews.pro/news/making-devenv-start-fast-and-the-whole-nixpkgs-with-it-devenv.jsonld"}}