{"slug": "behind-the-scenes-of-bun-install", "title": "Behind The Scenes of Bun Install", "summary": "The article explains that Bun's package installation is significantly faster than other Node.js package managers because it treats installation as a systems programming problem rather than a JavaScript problem, optimizing for modern hardware by minimizing system calls. It contrasts this approach with traditional package managers that inherited Node.js's architecture from 2009, which optimized for slow I/O operations that are no longer the primary bottleneck. The piece details how system calls are expensive due to CPU mode switching, and argues that in 2025, minimizing these calls is more critical than optimizing for I/O latency.", "body_md": "Running `bun install` is fast, very fast. On average, it runs ~7× faster than npm, ~4× faster than pnpm, and ~17× faster than yarn. The difference is especially noticeable in large codebases. What used to take minutes now takes (milli)seconds.\n\nThese aren't just cherry-picked benchmarks. Bun is fast because it **treats package installation as a systems programming problem**, not a JavaScript problem.\n\nIn this post we’ll explore what that means: from minimizing syscalls and caching manifests as binary, to optimizing tarball extraction, leveraging OS-native file copying, and scaling across CPU cores.\n\nBut to understand why this matters, we first have to take a small step back in time.\n\nIt's the year 2009. You're installing jQuery from a `.zip` file, your iPhone 3GS has 256MB of RAM. GitHub was just a year old, SSDs cost $700 for 256GB. Your laptop's 5400RPM hard drive maxes out at 100MB/s, and \"broadband\" means 10 Mbps (if you're lucky).\n\nBut more importantly: Node.js just launched! 
[Ryan Dahl is on stage](https://www.youtube.com/watch?v=EeYvFl7li9E) explaining why servers spend most of their time waiting.\n\nIn 2009, a typical disk seek takes 10ms, a database query 50–200ms, and an HTTP request to an external API 300ms+. During each of these transactions, traditional servers would just... wait. Your server would start reading a file, and then just freeze for 10ms.\n\nNow multiply that by thousands of concurrent connections each doing multiple I/O operations. Servers spent ~95% of their time waiting for I/O operations.\n\nNode.js figured that JavaScript's event loop (originally designed for browser events) was perfect for server I/O. When code makes an async request, the I/O happens in the background while the main thread immediately moves to the next task. Once complete, a callback gets queued for execution.\n\nJavaScript's event loop was a great solution for a world where waiting for data was the primary bottleneck.\n\nFor the next 15 years, Node's architecture shaped how we built tools. Package managers inherited Node's thread pool, event loop, async patterns; optimizations that made sense when disk seeks took 10ms.\n\nBut hardware evolved. It's not 2009 anymore, we're 16 years into the future, as hard as that is to believe. The M4 Max MacBook I'm using to write this would've ranked among the 50 fastest supercomputers on Earth in 2009. Today's NVMe drives push 7,000 MB/s, 70× faster than what Node.js was designed for! The slow mechanical drives are gone, internet speeds stream 4K video, and even low-end smartphones have more RAM than high-end servers had in 2009.\n\nYet today's package managers still optimize for the last decade's problems. In 2025, the real bottleneck isn't I/O anymore. **It's system calls**.\n\n[The Problem with System Calls](#the-problem-with-system-calls)\n\nEvery time your program wants the operating system to do something (read a file, open a network connection, allocate memory), it makes a system call. 
Each time you make a system call, the CPU has to perform a *mode switch*.\n\nYour CPU can run programs in two modes:\n\n- `user mode`, where your application code runs. Programs in `user mode` cannot directly access your device's hardware, physical memory addresses, etc. This isolation prevents programs from interfering with each other or crashing the system.\n- `kernel mode`, where the operating system's kernel runs. The kernel is the core component of the OS that manages resources like scheduling processes to use the CPU, handling memory, and hardware like disks or network devices. Only the kernel and device drivers operate in `kernel mode`!\n\nWhen you want to open a file (e.g. `fs.readFile()`) in your program, the CPU running in `user mode` cannot directly read from disk. It first has to switch to `kernel mode`.\n\nDuring this mode switch, the CPU stops executing your program → saves all its state → switches into kernel mode → performs the operation → then switches back to user mode.\n\nHowever, this mode switching is expensive! Just this switch alone costs 1000-1500 CPU cycles in pure overhead, before any actual work happens.\n\nYour CPU operates on a clock that ticks billions of times per second. A 3GHz processor completes 3 billion cycles per second. During each cycle the CPU can execute instructions: add numbers, move data, make comparisons, etc. Each cycle takes 0.33ns.\n\nOn a 3GHz processor, 1000-1500 cycles is about 330-500 nanoseconds. This might sound negligibly fast, but modern SSDs can handle over 1 million operations per second. If each operation requires a system call, you're burning up to 1.5 billion cycles per second just on mode switching.\n\nPackage installation makes thousands of these system calls. Installing React and its dependencies might trigger 50,000+ system calls: that's up to 75 million cycles, or roughly 25ms on a 3GHz processor, spent on mode switching alone! 
Not even reading files or installing packages, just switching between user and kernel mode.\n\nThis is why Bun treats package installation as a **systems programming problem**. Fast install speeds come from **minimizing system calls** and **leveraging every OS-specific optimization available**.\n\nYou can see the difference when we trace the actual system calls made by each package manager:\n\n```\nBenchmark 1: strace -c -f npm install\n    Time (mean ± σ):  37.245 s ±  2.134 s [User: 8.432 s, System: 4.821 s]\n    Range (min … max):   34.891 s … 41.203 s    10 runs\n\n    System calls: 996,978 total (108,775 errors)\n    Top syscalls: futex (663,158),  write (109,412), epoll_pwait (54,496)\n\n  Benchmark 2: strace -c -f bun install\n    Time (mean ± σ):      5.612 s ±  0.287 s [User: 2.134 s, System: 1.892 s]\n    Range (min … max):    5.238 s …  6.102 s    10 runs\n\n    System calls: 165,743 total (3,131 errors)\n    Top syscalls: openat(45,348), futex (762), epoll_pwait2 (298)\n\n  Benchmark 3: strace -c -f yarn install\n    Time (mean ± σ):     94.156 s ±  3.821 s    [User: 12.734 s, System: 7.234 s]\n    Range (min … max):   89.432 s … 98.912 s    10 runs\n\n    System calls: 4,046,507 total (420,131 errors)\n    Top syscalls: futex (2,499,660), epoll_pwait (326,351), write (287,543)\n\n  Benchmark 4: strace -c -f pnpm install\n    Time (mean ± σ):     24.521 s ±  1.287 s    [User: 5.821 s, System: 3.912 s]\n    Range (min … max):   22.834 s … 26.743 s    10 runs\n\n    System calls: 456,930 total (32,351 errors)\n    Top syscalls: futex (116,577), openat(89,234), epoll_pwait (12,705)\n\n  Summary\n    'strace -c -f bun install' ran\n      4.37 ± 0.28 times faster than 'strace -c -f pnpm install'\n      6.64 ± 0.51 times faster than 'strace -c -f npm install'\n     16.78 ± 1.12 times faster than 'strace -c -f yarn install'\n\n  System Call Efficiency:\n    - bun:  165,743 syscalls (29.5k syscalls/s)\n    - pnpm: 456,930 syscalls (18.6k syscalls/s)\n    - npm: 
 996,978 syscalls (26.8k syscalls/s)\n    - yarn: 4,046,507 syscalls (43.0k syscalls/s)\n```\n\nWe can see that Bun installs much faster, but it also makes far fewer system calls. For a simple install, yarn makes over 4 million system calls, npm almost 1 million, pnpm close to 500k, and bun 165k.\n\nAt 1000-1500 cycles per call, yarn's 4 million system calls means it's spending billions of CPU cycles just on mode switching. On a 3GHz processor, that's seconds of pure overhead!\n\nAnd it's not just the *number* of system calls. Look at those `futex` calls! Bun made 762 `futex` calls (only 0.46% of total system calls), whereas npm made 663,158 (66.51%), yarn made 2,499,660 (61.76%), and pnpm made 116,577 (25.51%).\n\n`futex` (fast userspace mutex) is a Linux system call used for thread synchronization. Threads are smaller units of a program that run simultaneously and often share access to memory or resources, so they must coordinate to avoid conflicts.\n\nMost of the time, threads coordinate using fast atomic CPU instructions in `user mode`. There's no need to switch to `kernel mode`, so it's very efficient!\n\nBut if a thread tries to acquire a lock that's already taken, it makes a `futex` syscall to ask the kernel to put it to sleep until the lock becomes available. A high number of `futex` calls is an indicator that many threads are waiting on one another, causing delays.\n\nSo what's Bun doing differently here?\n\n[Eliminating JavaScript overhead](#eliminating-javascript-overhead)\n\nnpm, pnpm and yarn are all written in Node.js. In Node.js, system calls aren’t made directly: when you call `fs.readFile()`, you’re actually going through several layers before reaching the OS.\n\nNode.js uses [libuv](https://libuv.org/), a C library that abstracts platform differences and manages async I/O through a thread pool.\n\nThe result is that when Node.js has to read a single file, it triggers a pretty complex pipeline. 
For a simple `fs.readFile('package.json', ...)`:\n\n- JavaScript validates arguments and converts strings from UTF-16 to UTF-8 for libuv's C APIs. This briefly blocks the main thread before any I/O even starts.\n- libuv queues the request for one of 4 worker threads (the default pool size). If all threads are busy, your request waits.\n- A worker thread picks up the request, opens the file descriptor, and makes the actual `read()` system call.\n- The kernel switches to `kernel mode`, fetches the data from disk, and returns it to the worker thread.\n- The worker pushes the file data back to the main thread through the event loop, which eventually schedules and runs your callback.\n\nEvery single `fs.readFile()` call goes through this pipeline. Package installation involves reading *thousands* of `package.json` files: scanning directories, processing dependency metadata, and so on. Each time threads coordinate (e.g., when accessing the task queue or signaling back to the event loop), a `futex` system call can be used to manage locks or waits.\n\n**The overhead of making thousands of these system calls can take longer than the actual data movement itself!**\n\nBun does it differently. **Bun is written in Zig**, a programming language that compiles to native code with direct system call access:\n\n``` zig\n// Direct system call, no JavaScript overhead\nvar file = bun.sys.File.from(try bun.sys.openatA(\n    bun.FD.cwd(),\n    abs,\n    bun.O.RDONLY,\n    0,\n).unwrap());\n```\n\nWhen Bun reads a file:\n\n- Zig code directly invokes the system call (e.g., `openat()`)\n- The kernel immediately executes the system call and returns data\n\nThat's it. There's no JavaScript engine, thread pools, event loops or marshaling between different runtime layers. 
Just native code making direct system calls to the kernel.\n\nThe performance difference speaks for itself:\n\n| Runtime | Version | Files/Second | Performance |\n|---|---|---|---|\n| Bun | v1.2.20 | 146,057 | |\n| Node.js | v24.5.0 | 66,576 | 2.2x slower |\n| Node.js | v22.18.0 | 64,631 | 2.3x slower |\n\nIn this benchmark, Bun processes 146,057 `package.json` files per second, while Node.js v24.5.0 manages 66,576 and v22.18.0 handles 64,631. That's over 2x faster!\n\nBun's 0.019ms per file represents the actual I/O cost, so how long it takes to read data when you make direct system calls without any runtime overhead. Node.js takes 0.065ms for the same operation. Package managers written in Node.js are \"stuck\" with Node's abstractions; they use the thread pool whether they need it or not, and they pay this cost on every file operation.\n\nBun's package manager is more like a native application that happens to understand JavaScript packages, not a JavaScript application trying to do systems programming.\n\nEven though Bun isn't written in Node.js, you can use `bun install` in any Node.js project without switching runtimes. Bun's package manager respects your existing Node.js setup and tooling; you just get faster installs!\n\nBut at this point we haven't even started installing packages yet. Let's see the optimizations Bun applies to the actual installation.\n\nWhen you type `bun install`, Bun first figures out what you're asking it to do. It reads any flags you've passed, and finds your `package.json` to read your dependencies.\n\n[Async DNS Resolution](#async-dns-resolution)\n\n⚠️ Note: This optimization is specific to macOS\n\nWorking with dependencies means working with network requests, and network requests require DNS resolution to convert domain names like `registry.npmjs.org` into IP addresses.\n\nAs Bun is parsing the `package.json`, it already starts to prefetch the DNS lookups. 
This means network resolution begins before dependency analysis is even complete.\n\nFor Node.js-based package managers, one way to do this is `dns.lookup()`. While this looks async from JavaScript's perspective, it's actually implemented as a *blocking* `getaddrinfo()` call under the hood, running on `libuv`'s thread pool. It still blocks a thread, just not the main thread.\n\nAs a nice optimization, Bun takes a different approach on macOS by making it truly asynchronous at the system level. Bun uses Apple's \"hidden\" async DNS API (`getaddrinfo_async_start()`), which isn't part of the POSIX standard but allows Bun to make DNS requests that run completely asynchronously using [mach ports](https://docs.darlinghq.org/internals/macos-specifics/mach-ports.html), Apple's inter-process communication system.\n\nWhile DNS resolution happens in the background, Bun can continue processing other operations like file I/O, network requests, or dependency resolution without any thread blocking. By the time it needs to download React, the DNS lookup is already done.\n\nIt's a small optimization (and not benchmarked), but it shows Bun's attention to detail: optimize at every layer!\n\n[Binary Manifest Caching](#binary-manifest-caching)\n\nNow that Bun has established a connection to the npm registry, it needs the package manifests.\n\nA manifest is a JSON file containing all versions, dependencies, and metadata for each package. 
For popular packages like React with 100+ versions, these manifests can be several megabytes!\n\nA typical manifest can look something like this:\n\n```\n{\n  \"name\": \"lodash\",\n  \"versions\": {\n    \"4.17.20\": {\n      \"name\": \"lodash\",\n      \"version\": \"4.17.20\",\n      \"description\": \"Lodash modular utilities.\",\n      \"license\": \"MIT\",\n      \"repository\": {\n        \"type\": \"git\",\n        \"url\": \"git+https://github.com/lodash/lodash.git\"\n      },\n      \"homepage\": \"https://lodash.com/\"\n    },\n    \"4.17.21\": {\n      \"name\": \"lodash\",\n      \"version\": \"4.17.21\",\n      \"description\": \"Lodash modular utilities.\",\n      \"license\": \"MIT\",\n      \"repository\": {\n        \"type\": \"git\",\n        \"url\": \"git+https://github.com/lodash/lodash.git\"\n      },\n      \"homepage\": \"https://lodash.com/\"\n    }\n    // ... 100+ more versions, nearly identical\n  }\n}\n```\n\nMost package managers cache these manifests as JSON files in their cache directories. When you run `npm install` again, instead of downloading the manifest, they read it from the cache.\n\nThat all makes sense, but the issue is that on every install (even if it's cached), they still need to parse the JSON file. This includes validating the syntax, building the object tree, managing garbage collection, and so on. A lot of parsing overhead.\n\nAnd it's not just the JSON parsing overhead. Looking at lodash: the string `\"Lodash modular utilities.\"` appears in every single version, so that's 100+ times. `\"MIT\"` appears 100+ times. `\"git+https://github.com/lodash/lodash.git\"` is duplicated for every version, and the URL `\"https://lodash.com/\"` appears in every version. Overall, lots of repeated strings.\n\nIn memory, JavaScript creates a separate string object for each string. This wastes memory and makes comparisons slower. 
Every time the package manager checks if two packages use the same version of postcss, it's comparing separate string objects rather than pointing to the same interned string.\n\n**Bun stores package manifests in a binary format.** When Bun downloads package information, it parses the JSON once and stores it as binary files (`.npm` files in `~/.bun/install/cache/`). These binary files contain all the package information (versions, dependencies, checksums, etc.) stored at specific byte offsets.\n\nWhen Bun accesses the name `lodash`, it's just pointer arithmetic: `string_buffer + offset`. No allocations, no parsing, no object traversal, just reading bytes at a known location.\n\n```\n// Pseudocode\n\n// String buffer (all strings stored once)\nstring_buffer = \"lodash\\0MIT\\0Lodash modular utilities.\\0git+https://github.com/lodash/lodash.git\\0https://lodash.com/\\04.17.20\\04.17.21\\0...\"\n                 ^0     ^7   ^11                        ^37                                      ^78                   ^98      ^106\n\n// Version entries (fixed-size structs)\nversions = [\n  { name_offset: 0, name_len: 6, version_offset: 98, version_len: 7, desc_offset: 11, desc_len: 25, license_offset: 7, license_len: 3, ... },  // 4.17.20\n  { name_offset: 0, name_len: 6, version_offset: 106, version_len: 7, desc_offset: 11, desc_len: 25, license_offset: 7, license_len: 3, ... }, // 4.17.21\n  // ... 100+ more version structs\n]\n```\n\nTo check if packages need updating, Bun stores the response's `ETag` and sends `If-None-Match` headers. 
When npm responds with `\"304 Not Modified\"`, Bun knows the cached data is fresh without parsing a single byte.\n\nLooking at the benchmarks:\n\n```\nBenchmark 1: bun install # fresh\n  Time (mean ± σ):     230.2 ms ± 685.5 ms    [User: 145.1 ms, System: 161.9 ms]\n  Range (min … max):     9.0 ms … 2181.0 ms    10 runs\n\nBenchmark 2: bun install # cached\n  Time (mean ± σ):       9.1 ms ±   0.3 ms    [User: 8.5 ms, System: 5.9 ms]\n  Range (min … max):     8.7 ms …  11.5 ms    10 runs\n\nBenchmark 3: npm install # fresh\n  Time (mean ± σ):      1.786 s ±  4.407 s    [User: 0.975 s, System: 0.484 s]\n  Range (min … max):    0.348 s … 14.328 s    10 runs\n\nBenchmark 4: npm install # cached\n  Time (mean ± σ):     363.1 ms ±  21.6 ms    [User: 276.3 ms, System: 63.0 ms]\n  Range (min … max):   344.7 ms … 412.0 ms    10 runs\n\nSummary\n  bun install # cached ran\n    25.30 ± 75.33 times faster than bun install # fresh\n    39.90 ± 2.37 times faster than npm install # cached\n   196.26 ± 484.29 times faster than npm install # fresh\n```\n\nHere you can see that a cached(!!) `npm install` is *slower* than a *fresh* Bun install. That's how much overhead JSON parsing the cached files can add (among other factors).\n\n[Optimized Tarball Extraction](#optimized-tarball-extraction)\n\nNow that Bun has fetched the package *manifests*, it needs to download and extract compressed *tarballs* from the npm registry.\n\nTarballs are compressed archive files (like `.zip` files) that contain all the actual source code and files for each package.\n\nMost package managers stream the tarball data as it arrives, and decompress as it streams in. 
When you extract a tarball that's streaming in, the typical pattern assumes the size is unknown, and looks something like this:\n\n``` js\nlet buffer = Buffer.alloc(64 * 1024); // Start with 64KB\nlet offset = 0;\n\n// Called once for every chunk that arrives from the stream\nfunction onData(chunk) {\n  // grow until the incoming chunk fits\n  while (offset + chunk.length > buffer.length) {\n    // buffer full → allocate bigger one\n    const newBuffer = Buffer.alloc(buffer.length * 2);\n\n    // copy everything we’ve already written\n    buffer.copy(newBuffer, 0, 0, offset);\n\n    buffer = newBuffer;\n  }\n\n  // copy new chunk into buffer\n  chunk.copy(buffer, offset);\n  offset += chunk.length;\n\n  // ... decompress from buffer ...\n}\n```\n\nStart with a small buffer, and let it grow as more decompressed data arrives. When the buffer fills up, you allocate a larger buffer, copy all the existing data over, and continue.\n\nThis seems reasonable, but it creates a performance bottleneck: you end up copying the same data multiple times as the buffer repeatedly outgrows its current size.\n\nWhen we have a 1MB package:\n\n- Start with 64KB buffer\n- Fill up → Allocate 128KB → Copy 64KB over\n- Fill up → Allocate 256KB → Copy 128KB over\n- Fill up → Allocate 512KB → Copy 256KB over\n- Fill up → Allocate 1MB → Copy 512KB over\n\nYou just copied 960KB of data unnecessarily! And this happens for every single package. The memory allocator has to find contiguous space for each new buffer, while the old buffer stays allocated during the copy operation. For large packages, you might copy the same bytes 5-6 times.\n\nBun takes a different approach by **buffering the entire tarball before decompressing**. 
Instead of processing data as it arrives, Bun waits until the entire compressed file is downloaded into memory.\n\nNow you might think *\"Wait, aren't they just wasting RAM keeping everything in memory?\"* And for large packages like TypeScript (which can be 50MB compressed), you'd have a point.\n\nBut the vast majority of npm packages are tiny, most are under 1MB. For these common cases, buffering the whole thing eliminates all the repeated copying. Even for those larger packages, the temporary memory spike is usually fine on modern systems, and avoiding 5-6 buffer copies more than makes up for it.\n\nOnce Bun has the complete tarball in memory, it can read the last 4 bytes of the gzip format. These bytes are special since they store the uncompressed size of the file (modulo 2^32)! Instead of having to guess how large the uncompressed file will be, **Bun can pre-allocate memory to eliminate buffer resizing entirely:**\n\n```\n{\n  // Last 4 bytes of a gzip-compressed file are the uncompressed size.\n  if (tgz_bytes.len > 16) {\n    // If the file claims to be larger than 16 bytes and smaller than 64 MB, we'll preallocate the buffer.\n    // If it's larger than that, we'll do it incrementally. We want to avoid OOMing.\n    const last_4_bytes: u32 = @bitCast(tgz_bytes[tgz_bytes.len - 4 ..][0..4].*);\n    if (last_4_bytes > 16 and last_4_bytes < 64 * 1024 * 1024) {\n      // It's okay if this fails. We will just allocate as we go and that will error if we run out of memory.\n      esimated_output_size = last_4_bytes;\n      if (zlib_pool.data.list.capacity == 0) {\n          zlib_pool.data.list.ensureTotalCapacityPrecise(zlib_pool.data.allocator, last_4_bytes) catch {};\n      } else {\n          zlib_pool.data.ensureUnusedCapacity(last_4_bytes) catch {};\n      }\n    }\n  }\n}\n```\n\nThose 4 bytes tell Bun \"this gzip will decompress to exactly 1,048,576 bytes\", so it can pre-allocate exactly this amount of memory upfront. 
There's no repeated resizing or copying of data; just one memory allocation.\n\nTo do the actual decompression, Bun uses [libdeflate](https://github.com/ebiggers/libdeflate). This is a high-performance library that decompresses tarballs faster than the standard [`zlib`](https://zlib.net/manual.html) used by most package managers. It's optimized specifically for modern CPUs with SIMD instructions.\n\nOptimized tarball extraction would've been difficult for package managers written in Node.js. You'd need to create a separate read stream, seek to the end, read 4 bytes, parse them, close the stream, then start over with your decompression. Node's APIs aren't designed for this pattern.\n\nIn Zig it's pretty straightforward: you just seek to the end and read the last four bytes, that's it!\n\nNow that Bun has all the package data, it faces another challenge: how do you efficiently store and access thousands of (interdependent) packages?\n\n[Cache-Friendly Data Layout](#cache-friendly-data-layout)\n\nDealing with thousands of packages can be tricky. Each package has dependencies, which have their own dependencies, creating a pretty complex graph.\n\nDuring installation, package managers have to traverse this graph to check the package versions, resolve any conflicts, and determine which version to install. They also need to \"hoist\" dependencies by moving them to higher levels so multiple packages can share them.\n\nBut the way that this dependency graph is stored has a big impact on performance. 
Traditional package managers store dependencies like this:\n\n``` js\nconst packages = {\n  next: {\n    name: \"next\",\n    version: \"15.5.0\",\n    dependencies: {\n      \"@swc/helpers\": \"0.5.15\",\n      \"postcss\": \"8.4.31\",\n      \"styled-jsx\": \"5.1.6\",\n    },\n  },\n  postcss: {\n    name: \"postcss\",\n    version: \"8.4.31\",\n    dependencies: {\n      nanoid: \"^3.3.6\",\n      picocolors: \"^1.0.0\",\n    },\n  },\n};\n```\n\nThis looks clean as JavaScript code, but it's not ideal for modern CPU architectures.\n\nIn JavaScript, each object is stored on the heap. When accessing `packages[\"next\"]`, the CPU accesses a pointer that tells it where Next's data is located in memory. This data then contains yet another pointer to where its dependencies live, which in turn contains more pointers to the actual dependency strings.\n\nThe key issue is how JavaScript allocates objects in memory. When you create objects at different times, the JavaScript engine uses whatever memory is available at that moment:\n\n```\n// These objects are created at different moments during parsing\npackages[\"react\"] = { name: \"react\", ... }      // Allocated at address 0x1000\npackages[\"next\"] = { name: \"next\", ... }        // Allocated at address 0x2000\npackages[\"postcss\"] = { name: \"postcss\", ... }  // Allocated at address 0x8000\n// ... hundreds more packages\n```\n\nThese addresses are basically just random. There is no locality guarantee - objects can just be scattered across RAM, even objects that are related to each other!\n\nThis random scattering matters because of how modern CPUs actually fetch data.\n\nModern CPUs are incredibly fast at processing data (billions of operations per second), but fetching data from RAM is slow. 
To bridge this gap, CPUs have multiple cache levels:\n\n- L1 cache: small storage, but extremely fast (~4 CPU cycles)\n- L2 cache: medium storage, a bit slower (~12 CPU cycles)\n- L3 cache: 8-32MB storage, requires ~40 CPU cycles\n- RAM: many GB of storage, requires ~300 cycles (slow!)\n\nThe \"issue\" is that caches work with *cache lines*. When you access memory, the CPU doesn't just load that one byte: it loads the entire 64-byte chunk in which that byte appears. It figures that if you need one byte, you'll probably need nearby bytes soon (this is called spatial locality).\n\nThis optimization works great for data that's stored sequentially, but it backfires when your data is scattered randomly across memory.\n\nWhen the CPU loads `packages[\"next\"]` at address `0x2000`, it actually loads all the bytes within that cache line. But the next package, `packages[\"postcss\"]`, is at address `0x8000`. This is a completely different cache line! The other 56 bytes the CPU loaded in the cache line are completely wasted; they're just random memory from whatever happened to be allocated nearby: maybe garbage, maybe parts of unrelated objects.\n\nYou paid the cost of loading 64 bytes but only used 8...\n\nBy the time you've accessed 512 different packages (32KB / 64 bytes), you've filled your entire L1 cache already. Now every new package access evicts a previously loaded cache line to make space. The package you just accessed will be evicted soon, and that dependency it needs to check in 10 microseconds is already gone. Cache hit rate drops, and every access becomes a ~300 cycle trip to RAM instead of a 4 cycle L1 hit, far from optimal.\n\nThe nested structure of objects creates what's called \"pointer chasing\", a common anti-pattern in systems programming. 
The CPU can't predict where to load next because each pointer could point anywhere. It simply cannot know where `next.dependencies` lives until it finishes loading the `next` object.\n\nWhen traversing Next's dependencies, the CPU has to perform multiple dependent memory loads:\n\n- Load the `packages[\"next\"]` pointer → Cache miss → RAM fetch (~300 cycles)\n- Follow that pointer to load the `next.dependencies` pointer → Another cache miss → RAM fetch (~300 cycles)\n- Follow that to find `\"postcss\"` in the hash table → Cache miss → RAM fetch (~300 cycles)\n- Follow that pointer to load the actual string data → Cache miss → RAM fetch (~300 cycles)\n\nWe can end up with many cache misses since we're working with hundreds of dependencies, all scattered across memory. Each cache line we load (64 bytes) might contain data for just one object. With all those objects spread across GBs of RAM, the working set easily exceeds the L1 cache (32KB), L2 (256KB) and even the L3 cache (8-32MB). By the time we need an object again, it's likely that it's been evicted from all cache levels.\n\nThose four dependent loads add up to ~1200 cycles (400ns on a 3GHz CPU) just to read one dependency name! For a project with 1000 packages averaging 5 dependencies each, that's 2ms of pure memory latency.\n\n**Bun uses Structure of Arrays**. 
Instead of each package storing its own dependency array, Bun keeps all dependencies in one big shared array, all package names in another shared array, and so on:\n\n```\n// ❌ Traditional Array of Structures (AoS) - lots of pointers\npackages = {\n  next: { dependencies: { \"@swc/helpers\": \"0.5.15\", \"postcss\": \"8.4.31\" } },\n};\n\n// ✅ Bun's Structure of Arrays (SoA) - cache friendly\npackages = [\n  {\n    name: { off: 0, len: 4 },\n    version: { off: 5, len: 6 },\n    deps: { off: 0, len: 2 },\n  }, // next\n];\n\ndependencies = [\n  { name: { off: 12, len: 12 }, version: { off: 25, len: 6 } }, // @swc/helpers@0.5.15\n  { name: { off: 32, len: 7 }, version: { off: 40, len: 6 } }, // postcss@8.4.31\n];\n\nstring_buffer = \"next\\015.5.0\\0@swc/helpers\\00.5.15\\0postcss\\08.4.31\\0\";\n```\n\nInstead of each package storing *pointers* to its own data scattered across memory, Bun just uses *large contiguous buffers*, including:\n\n- `packages` stores lightweight structs that specify where to find this package's data using offsets\n- `dependencies` stores the actual dependency relationships for all packages in one place\n- `string_buffer` stores all text (names, versions, etc.) sequentially in one massive string\n- `versions` stores all parsed semantic versions as compact structs\n\nNow, accessing Next's dependencies just becomes arithmetic:\n\n- `packages[0]` tells us that Next's dependencies start at position `0` in the `dependencies` array, and that there are 2 dependencies: `{ deps: { off: 0, len: 2 } }`\n- Go to `dependencies[1]`, which tells us that postcss's name starts at position `32` in `string_buffer`, and its version at position `40`: `{ name: { off: 32, len: 7 }, version: { off: 40, len: 6 } }`\n- Go to position 32 in `string_buffer` and read `postcss`\n- Go to position 40 in `string_buffer` and read `\"8.4.31\"`\n- … and so on\n\nNow when you access `packages[0]`, the CPU doesn't just load those 8 bytes: it loads an entire 64-byte cache line. Since each package is 8 bytes, and 64 ÷ 8 = 8, you get `packages[0]` through `packages[7]` in a single memory fetch.\n\nSo when your code processes the `next` package (`packages[0]`), `packages[1]` through `packages[7]` are already sitting in your L1 cache, ready to be accessed with zero additional memory fetches. That's why sequential access is so fast: you're getting 8 packages just by accessing memory once.\n\nInstead of the many small, scattered allocations throughout memory that we saw in the previous example, we now have just ~6 large allocations in total, regardless of how many packages you have. This is completely different from the pointer-based approach, which required a separate memory fetch for each object.\n\n[Optimized Lockfile Format](#optimized-lockfile-format)\n\nBun also applies the Structure of Arrays approach to its `bun.lock` lockfile.\n\nWhen you run `bun install`, Bun has to parse the existing lockfile to determine what's already installed and what needs updating. Most package managers store lockfiles as nested JSON (npm) or YAML (pnpm, yarn). 
When npm parses `package-lock.json`, it's processing deeply nested objects:\n\n```\n{\n  \"dependencies\": {\n    \"next\": {\n      \"version\": \"15.5.0\",\n      \"requires\": {\n        \"@swc/helpers\": \"0.5.15\",\n        \"postcss\": \"8.4.31\"\n      }\n    },\n    \"postcss\": {\n      \"version\": \"8.4.31\",\n      \"requires\": {\n        \"nanoid\": \"^3.3.6\",\n        \"picocolors\": \"^1.0.0\"\n      }\n    }\n  }\n}\n```\n\nEach package becomes its own object with nested dependency objects. JSON parsers must allocate memory for every object, validate syntax, and build complex nested trees. For projects with thousands of dependencies, this creates the same pointer-chasing problem we saw earlier!\n\nBun applies the Structure of Arrays approach to its lockfile, in a human-readable format:\n\n```\n{\n  \"lockfileVersion\": 0,\n  \"packages\": {\n    \"next\": [\n      \"next@npm:15.5.0\",\n      { \"@swc/helpers\": \"0.5.15\", \"postcss\": \"8.4.31\" },\n      \"hash123\"\n    ],\n    \"postcss\": [\n      \"postcss@npm:8.4.31\",\n      { \"nanoid\": \"^3.3.6\", \"picocolors\": \"^1.0.0\" },\n      \"hash456\"\n    ]\n  }\n}\n```\n\nThis again deduplicates strings and stores dependencies in a cache-friendly layout. Packages are stored in *dependency order* rather than alphabetically or in a nested hierarchy. This means that a parser can read memory more efficiently (sequentially), avoiding random jumps between objects.\n\nBun also pre-allocates memory based on the lockfile size. 
Just like with tarball extraction, this avoids the repeated resize-and-copy cycles that create performance bottlenecks during parsing.\n\nAs a sidenote: Bun originally used a binary lockfile format (`bun.lockb`) to avoid JSON parsing overhead entirely, but binary files are impossible to review in pull requests and can't be merged when conflicts happen.\n\n[File copying](#file-copying)\n\nAfter the packages are installed and cached in `~/.bun/install/cache/`, Bun must copy the files into `node_modules`. This is where we see most of Bun's performance impact!\n\nTraditional file copying traverses each directory and copies files individually. This requires multiple system calls per file:\n\n- opening the source file (`open()`)\n- creating and opening the destination file (`open()`)\n- repeatedly reading chunks from the source and writing them to the destination until complete (`read()`/`write()`)\n- finally, closing both files (`close()`)\n\nEach of these steps requires that expensive mode switch between user mode and kernel mode.\n\nFor a typical React app with thousands of package files, this generates **hundreds of thousands to millions of system calls!** This is exactly the systems programming problem we described earlier: the overhead of making all these system calls becomes more expensive than actually moving the data.\n\nBun uses different strategies depending on your operating system and filesystem, leveraging every OS-specific optimization available. Bun supports several file copying backends, each with different performance characteristics:\n\n[macOS](#macos)\n\nOn macOS, Bun uses Apple's native `clonefile()` copy-on-write system call.\n\n`clonefile` can **clone entire directory trees in a single system call**. This system call creates new directory and file metadata entries that reference the *same physical disk blocks* as the original files. 
Instead of writing new data to disk, the filesystem just creates new \"pointers\" to existing data.\n\n```\n// Traditional approach: millions of syscalls\nfor (each file) {\n  copy_file_traditionally(src, dst);  // 50+ syscalls per file\n}\n\n// Bun's approach: ONE syscall\nclonefile(\"/cache/react\", \"/node_modules/react\", 0);\n```\n\nSSDs store data in fixed-size *blocks*. When you normally copy a file (`copy()`), the filesystem allocates new blocks and writes duplicate data. With `clonefile`, both the original and \"copied\" file have metadata that points to the exact same physical blocks on your SSD.\n\nCopy-on-write means data is only duplicated when modified. This results in an `O(1)` operation vs. the `O(n)` of traditional copying.\n\nThe metadata of both files points to the same data blocks **until you modify one of them**.\n\nWhen you modify the contents of one of the files, the filesystem automatically allocates new blocks for the edited parts, and updates the file metadata to point to the new blocks.\n\nHowever, this rarely happens since `node_modules` files are typically read-only after installation; we don't actively modify modules from within our code.\n\nThis makes copy-on-write extremely efficient: multiple packages can share identical dependency files without using additional disk space.\n\n```\nBenchmark 1: bun install --backend=copyfile\n  Time (mean ± σ):      2.955 s ±  0.101 s    [User: 0.190 s, System: 1.991 s]\n  Range (min … max):    2.825 s …  3.107 s    10 runs\n\nBenchmark 2: bun install --backend=clonefile\n  Time (mean ± σ):      1.274 s ±  0.052 s    [User: 0.140 s, System: 0.257 s]\n  Range (min … max):    1.184 s …  1.362 s    10 runs\n\nSummary\n  bun install --backend=clonefile ran\n    2.32 ± 0.12 times faster than bun install --backend=copyfile\n```\n\nWhen `clonefile` fails (due to lack of filesystem support), Bun falls back to `clonefile_each_dir` for per-directory cloning. 
If that also fails, Bun uses traditional `copyfile` as the final fallback.\n\n[Linux](#linux)\n\nLinux doesn't have `clonefile()`, but it has something even older and more powerful: hardlinks. Bun implements a fallback chain that tries increasingly less optimal approaches until one works:\n\n#### 1. Hardlinks\n\nOn Linux, Bun's default strategy is **hardlinks.** A hardlink doesn't create a new file at all; it only creates a new *name* that references an existing file.\n\n```\nlink(\"/cache/react/index.js\", \"/node_modules/react/index.js\");\n```\n\nTo understand hardlinks, you need to understand *inodes*. Every file on Linux has an inode, which is a data structure that contains all the file's metadata (permissions, timestamps, etc.). The filename is just a pointer to an inode.\n\nIn the `link()` example above, both paths point to the same inode. If you delete one path, the other remains. However, if you modify one, both see changes (because they're the same file!).\n\nThis results in great performance gains because **there's zero data movement**. Creating a hardlink requires a single system call that completes in microseconds, regardless of whether you're linking a 1KB file or a 100MB bundle. That's much more efficient than traditional copying, which has to read and write every single byte.\n\nHardlinks are also extremely efficient for disk space, since there's only ever one copy of the actual data on disk, no matter how many packages reference the same dependency files.\n\nHowever, hardlinks have limitations. They can't cross filesystem boundaries (e.g. when your cache is on a different filesystem than your `node_modules`), some filesystems don't support them, and certain file types or permission configurations can cause hardlink creation to fail.\n\nWhen hardlinks aren't possible, Bun has some fallbacks:\n\n#### 2. `ioctl_ficlone`\n\nIt starts with `ioctl_ficlone`, which enables copy-on-write on filesystems like Btrfs and XFS. 
This is very similar to `clonefile`'s copy-on-write system in that it also creates new file references that share the same disk data. Unlike hardlinks, these are separate files; they just happen to share storage until modified.\n\n#### 3. `copy_file_range`\n\nIf copy-on-write isn't available, Bun tries to at least keep the copying in kernel space and falls back to `copy_file_range`.\n\nIn a traditional copy, the kernel reads from disk into a kernel buffer, then copies that data to your program's buffer in user space. Later when you call `write()`, it copies it back to a kernel buffer before writing to disk. That's four memory operations and multiple context switches!\n\nWith `copy_file_range`, the kernel reads from disk into a kernel buffer and writes directly to disk. That's just two operations and zero context switches for the data movement.\n\n#### 4. `sendfile`\n\nIf that's unavailable, Bun uses `sendfile`. This is a system call that was originally designed for network transfers, but it's also effective for copying data directly between two files on disk.\n\nThis system call also keeps data in kernel space: the kernel reads data from a source (a reference to an open file on disk, e.g. a source file in `~/.bun/install/cache/`) and writes it to a destination (like a destination file in `node_modules`), all within the kernel's memory space.\n\nThis process is called disk-to-disk copying, as it moves data between files stored on the same or different disks without touching your program's memory. It's an older API but more widely supported, making it a reliable fallback when newer system calls aren't available while still reducing the number of memory copies.\n\n#### 5. `copyfile`\n\nAs a last resort, Bun uses traditional file copying; the same approach most package managers use. 
This creates entirely separate copies of each file by reading data from the cache and writing it to the destination using a `read()`/`write()` loop. This uses multiple system calls, which is exactly what Bun is trying to minimize. It's the least efficient option, but it's universally compatible.\n\n```\nBenchmark 1: bun install --backend=copyfile\n  Time (mean ± σ):     325.0 ms ±   7.7 ms    [User: 38.4 ms, System: 295.0 ms]\n  Range (min … max):   314.2 ms … 340.0 ms    10 runs\n\nBenchmark 2: bun install --backend=hardlink\n  Time (mean ± σ):     109.4 ms ±   5.1 ms    [User: 32.0 ms, System: 86.8 ms]\n  Range (min … max):   102.8 ms … 119.0 ms    19 runs\n\nSummary\n  bun install --backend=hardlink ran\n    2.97 ± 0.16 times faster than bun install --backend=copyfile\n```\n\nThese file copying optimizations address the primary bottleneck: **system call overhead**. Instead of using a one-size-fits-all approach, Bun chooses the most efficient file copying strategy that your operating system and filesystem support.\n\n[Multi-Core Parallelism](#multi-core-parallelism)\n\nAll the above-mentioned optimizations are great, but they aim to reduce the workload for a single CPU core. However, modern laptops have 8, 16, even 24 CPU cores!\n\nNode.js has a thread pool, but all the actual work (e.g. figuring out which version of React works with which version of webpack, building the dependency graph, deciding what to install) happens on one thread and one CPU core. When npm runs on your M3 Max, one core works really hard while the other 15 are idle.\n\nA CPU *core* can independently execute instructions. Early computers had one core, so they could only do one thing at a time, but modern CPUs pack multiple cores onto a single chip. 
A 16-core CPU can execute 16 different instruction streams simultaneously, rather than just switching between them really fast.\n\nThis is yet another fundamental bottleneck for traditional package managers: no matter how many cores you have, the package manager can only use one CPU core.\n\nBun takes a different approach with a lock-free, work-stealing thread pool architecture.\n\nWork-stealing means that idle threads can \"steal\" pending tasks from busy threads' queues. When a thread finishes its work, it checks its local queue, then the global queue, then steals from other threads. No thread sits idle when there's still work to do.\n\nInstead of being limited to JavaScript's event loop, Bun spawns native threads that can fully utilize every CPU core. The thread pool **automatically scales to match your device's CPU core count**, allowing Bun to parallelize the I/O-heavy parts of the installation process as much as possible. One thread can be extracting `next`'s tarball, another resolving `postcss` dependencies, a third applying patches to `webpack`, and so on.\n\nBut multi-threading often comes with synchronization overhead. Those hundreds of thousands of `futex` calls npm made were just threads constantly waiting for each other. Each time a thread wants to add a task to a shared queue, it has to lock it first, blocking all other threads.\n\n```\n// Traditional approach: Locks\nmutex.lock();                   // Thread 1 gets exclusive access\nqueue.push(task);               // Only Thread 1 can work\nmutex.unlock();                 // Finally releases lock\n// Problem: Threads 2-8 blocked, waiting in line\n```\n\nBun uses lock-free data structures instead. 
These use special CPU instructions called atomic operations that allow threads to safely modify shared data without locks:\n\n```\npub fn push(self: *Queue, batch: Batch) void {\n  // Simplified excerpt: how `state` and `new_state` are derived is elided here\n  // Atomic compare-and-swap: updates the shared state without taking a lock\n  _ = @cmpxchgStrong(usize, &self.state, state, new_state, .seq_cst, .seq_cst);\n}\n```\n\nIn an earlier benchmark we saw that Bun was able to process 146,057 `package.json` files/second versus Node.js's 66,576. That's the impact of using *all* cores instead of one.\n\nBun also runs network operations differently. Traditional package managers often block: while a package downloads, the CPU sits idle waiting for the network.\n\nBun maintains a pool of 64(!) concurrent HTTP connections (configurable via `BUN_CONFIG_MAX_HTTP_REQUESTS`) on dedicated network threads. The network thread runs independently with its own event loop, handling all downloads while CPU threads handle the extraction and processing. Neither waits for the other.\n\nBun also gives each thread **its own memory pool**. An issue with \"traditional\" multi-threading is that all threads compete for the same memory allocator. This creates contention: if 16 threads all need memory at once, they have to wait for each other.\n\n```\n// Traditional: all threads share one allocator\nThread 1: \"I need 1KB for package data\"    // Lock allocator\nThread 2: \"I need 2KB for JSON parsing\"    // Wait...\nThread 3: \"I need 512B for file paths\"     // Wait...\nThread 4: \"I need 4KB for extraction\"      // Wait...\n```\n\nBun instead gives each thread its own large chunk of pre-allocated memory that the thread manages independently. 
There's no sharing or waiting: each thread works with its own data whenever possible.\n\n```\n// Bun: each thread has its own allocator\nThread 1: Allocates from pool 1    // Instant\nThread 2: Allocates from pool 2    // Instant\nThread 3: Allocates from pool 3    // Instant\nThread 4: Allocates from pool 4    // Instant\n```\n\n[Conclusion](#conclusion)\n\nThe package managers we benchmarked weren't built wrong; they were solutions designed for the constraints of their time.\n\nnpm gave us a foundation to build on, yarn made managing workspaces less painful, and pnpm came up with a clever way to save space and speed things up with hardlinks. Each worked hard to solve the problems developers were actually hitting at the time.\n\nBut that world no longer exists. SSDs are 70× faster, CPUs have dozens of cores, and memory is cheap. The real bottleneck shifted from hardware speed to software abstractions.\n\nBun's approach wasn't revolutionary; it was just willing to look at what actually slows things down today. When SSDs can handle a million operations per second, why accept thread pool overhead? When you're reading the same package manifest for the hundredth time, why parse JSON again? When the filesystem supports copy-on-write, why duplicate gigabytes of data?\n\nThe tools that will define the next decade of developer productivity are being written right now, by teams who understand that performance bottlenecks shifted when storage got fast and memory got cheap. 
They're not just incrementally improving what exists; they're rethinking what's possible.\n\nInstalling packages 25x faster isn't \"magic\": it's what happens when tools are built for the hardware we actually have.\n\n→ Experience software built for 2025 at [bun.com](https://bun.com/)", "url": "https://wpnews.pro/news/behind-the-scenes-of-bun-install", "canonical_source": "https://bun.com/blog/behind-the-scenes-of-bun-install", "published_at": "2025-09-10 15:00:00+00:00", "updated_at": "2026-05-22 20:45:25.292996+00:00", "lang": "en", "topics": ["developer-tools", "open-source", "products"], "entities": ["Bun", "npm", "pnpm", "yarn", "Node.js", "Ryan Dahl", "jQuery", "iPhone 3GS"], "alternates": {"html": "https://wpnews.pro/news/behind-the-scenes-of-bun-install", "markdown": "https://wpnews.pro/news/behind-the-scenes-of-bun-install.md", "text": "https://wpnews.pro/news/behind-the-scenes-of-bun-install.txt", "jsonld": "https://wpnews.pro/news/behind-the-scenes-of-bun-install.jsonld"}}