{"slug": "starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c", "title": "Starving the Garbage Collector: A Pragmatic Guide to Zero-Allocation C#", "summary": "Developer Ian Cowley open-sourced a suite of high-performance, zero-dependency C# engines, including a native DataFrame library, a fast text searcher, and a semantic Markdown parser. The engines bypass the .NET Garbage Collector by using structs over classes and Span<T> for zero-allocation memory management, achieving speeds rivaling native C/C++.", "body_md": "Over the last few weeks, I’ve open-sourced a suite of high-performance, zero-dependency C# engines. This includes a native DataFrame library ([Glacier.Polaris](https://github.com/ian-cowley/Glacier.Polaris)), a blistering fast text searcher ([Glacier.Grep](https://github.com/ian-cowley/Glacier.Grep)), and a semantic Markdown parser for RAG contexts ([Glacier.DocTree](https://github.com/ian-cowley/Glacier.DocTree)). You can find the source code for all of these on my [GitHub](https://github.com/ian-cowley?tab=repositories).\n\nA recurring question I’m getting from other devs looking at these repositories is simple: *How exactly are you bypassing the Garbage Collector to get these speeds?*\n\nI’ve never hidden my distaste for heavy, magic-filled frameworks. Whether it's an unwieldy data access library or a bloated client-side framework, they all share a common flaw: they wrap your code in layers of hidden allocations that murder your CPU caches and force the .NET Garbage Collector (GC) into overdrive.\n\nWhen you want to build systems that process millions of rows a second or rival native C/C++ in raw compute speed, you have to take control of your memory. 
To give you a fighting chance at writing your own high-performance engines, let's break down how memory allocation actually works in C#, using the architecture of the `Glacier` repositories as our guide.\n\nIn C#, every time you use the `new` keyword on a `class` (a reference type), you are asking the runtime to find a contiguous block of free memory on the Managed Heap.\n\nThe heap is a messy place. When it gets full, the GC steps in. It pauses your application threads, traverses the object graph to see what you are still using, compacts the memory, and cleans up the garbage. For standard CRUD apps, this pause is negligible. For a DataFrame engine like `Glacier.Polaris` processing millions of rows, a GC pause is a catastrophic event. It's a heavy tax on your CPU cycles.\n\nThe alternative is the Stack. The stack is a tightly managed, incredibly fast area of memory assigned exclusively to the current thread. When you declare a `struct` (a value type) as a local variable, it typically lives on the stack or in a register. When the method finishes, the stack unwinds, and the memory is instantly freed. No GC involved. Zero tax. (A struct stored as a field of a class or as an array element lives inline inside that heap object, so it still avoids one allocation per item.)\n\nBut dropping classes for structs isn't just about dodging the GC; it's about mechanical sympathy. Modern CPUs don't read bytes from RAM one at a time; they pull whole \"cache lines,\" typically 64 bytes each. By using `struct` and keeping the field layout predictable via `[StructLayout(LayoutKind.Sequential)]` (already the C# default for structs), you make it far more likely that when the CPU grabs a cache line, it receives relevant, tightly packed data, which cuts down on cache misses.\n\n**The Golden Rule:** If you want to go fast in a tight loop, favor `struct` over `class`.\n\nValue types are great, but what about arrays and strings? Historically, if you wanted a subset of an array or a string, you called `.Substring()` or `.Skip().Take()`. 
These operations allocate new objects on the heap: `.Substring()` copies the characters into a brand-new string, and the LINQ calls allocate iterator objects (plus a full copy if you materialize the result).\n\nIf you look at the source for `Glacier.DocTree` or `Glacier.Grep`, you'll notice we rarely allocate new strings when reading text. Instead, we use `Span<T>` and `ReadOnlySpan<T>`.\n\nA `Span<T>` is a `ref struct`. It is essentially a managed reference to a block of memory plus a length, and as a `ref struct` it *must* live on the stack. You can slice a massive buffer into smaller chunks, and it costs next to nothing. Zero allocations. Zero copying.\n\n```\n// The old, bloated way that triggers the GC\nstring line = \"Error: Connection Timeout\";\nstring message = line.Substring(7); // Allocates a new string on the heap\n\n// The Glacier way (Zero-Allocation)\nReadOnlySpan<char> lineSpan = \"Error: Connection Timeout\".AsSpan();\nReadOnlySpan<char> messageSpan = lineSpan.Slice(7); // Just a view into memory!\n```\n\nBecause `Span<T>` is tied to the stack, the compiler will stop you if you try to use it across an `await` boundary (like asynchronously reading a file stream). State machines generated by `async/await` live on the heap and cannot preserve stack-only references.\n\nTo bridge this gap, we use `Memory<T>`. `Memory<T>` can safely live on the heap and travel through async pipelines. Once the I/O operation completes and you are ready to do synchronous, CPU-bound processing, you simply call `.Span` on your `Memory<T>` and begin slicing at zero cost.\n\nThe stack is fast, but it's small (typically around 1MB). If you try to put a massive DataFrame column there (with `stackalloc`, for example), you will crash your app with a `StackOverflowException`.\n\nIn `Glacier.Polaris`, we map primitive types directly to dense arrays to avoid the overhead of boxing. 
But allocating massive arrays in a tight loop with `new int[100000]` hurts fast: at roughly 400 KB, each array is past the 85,000-byte threshold, so it lands on the Large Object Heap, which is only reclaimed during expensive Gen 2 collections.\n\nInstead of relying on standard arrays, Polaris uses custom allocators and structures like `MemoryOwnerColumn` and `ValidityMask`. This allows us to maintain C-like memory control while remaining safe within the .NET ecosystem. When we need temporary buffers, we rent them from `System.Buffers.ArrayPool<T>.Shared`:\n\n```\nusing System;\nusing System.Buffers;\n\n// Rent an array of AT LEAST the requested size (it may be larger, and it is not cleared)\nint[] buffer = ArrayPool<int>.Shared.Rent(100000);\ntry\n{\n    // Wrap only the part we asked for in a span for safe, fast access\n    Span<int> workSpan = buffer.AsSpan(0, 100000);\n    // Do heavy data processing...\n}\nfinally\n{\n    // Always return it! Once the pool is warm, the GC sees no new allocation.\n    ArrayPool<int>.Shared.Return(buffer);\n}\n```\n\nOnce your memory is flat, contiguous, and not bothering the Garbage Collector, you can unleash the CPU.\n\nIn `Glacier.Polaris`, the math isn't done row-by-row in a simple `for` loop. We process data in chunks using SIMD (Single Instruction, Multiple Data) CPU vector instructions.\n\nIn older .NET versions, this meant hardcoding explicit intrinsics and pinning arrays with `fixed`, which added slight overhead. Modern .NET abstracts this beautifully. 
We use `MemoryMarshal.GetReference` to grab a lightweight ref to our data without pinning it, and feed it into cross-platform `Vector256` logic. The same source compiles everywhere: it takes the 256-bit path where the hardware accelerates it (AVX2 on x64) and drops to the scalar loop where it doesn't, which is typically the case on ARM64 chips with 128-bit NEON registers.\n\n```\nusing System;\nusing System.Runtime.CompilerServices;\nusing System.Runtime.InteropServices;\nusing System.Runtime.Intrinsics;\n\npublic static int SimdSum(ReadOnlySpan<int> data)\n{\n    int sum = 0;\n    int i = 0;\n\n    // Grab a fast, unpinned reference to the underlying data\n    ref int current = ref MemoryMarshal.GetReference(data);\n\n    // Process 8 integers at a time (if hardware supports 256-bit vectors)\n    if (Vector256.IsHardwareAccelerated && data.Length >= Vector256<int>.Count)\n    {\n        Vector256<int> vSum = Vector256<int>.Zero;\n\n        for (; i <= data.Length - Vector256<int>.Count; i += Vector256<int>.Count)\n        {\n            // Load 8 contiguous integers directly into the CPU register\n            Vector256<int> vData = Vector256.LoadUnsafe(ref current, (nuint)i);\n\n            // Add them in parallel (lanes wrap on overflow, like unchecked int math)\n            vSum += vData;\n        }\n\n        // Horizontal add to collapse the vector lanes into a final scalar sum\n        sum += Vector256.Sum(vSum);\n    }\n\n    // Process any remaining elements normally\n    for (; i < data.Length; i++)\n    {\n        sum += Unsafe.Add(ref current, i);\n    }\n\n    return sum;\n}\n```\n\nThis is where the magic happens. We've bypassed the GC to keep our memory clean, extracted an unpinned reference, and fed it directly into the CPU's vector lanes.\n\nIt’s easy to talk about zero-allocation theory, but advanced developers deal in metrics. 
When you strip away the frameworks and embrace the mechanics outlined above, the results in `BenchmarkDotNet` look like this:\n\n| Method | Mean | Allocated |\n|---|---|---|\n| Standard Substring | 18.45 ns | 32 B |\n| Glacier Span Slice | 0.02 ns | 0 B |\n\nSeeing that `0 B` under the Allocated column is the entire point.\n\nGetting away from the Garbage Collector isn't about rewriting every line of business logic you have. It's about surgical precision. Identify the hot paths, the tight loops where data flows by the gigabyte, and strip away the abstractions.\n\nDrop the heavy frameworks. Stop calling `new` in a loop. Embrace `struct`, slice with `Span<T>`, use the `ArrayPool`, and build custom column allocators when the scale demands it. Take a look through the *Glacier* repositories to see these patterns in action. This is how you build engines that don't just participate in the .NET ecosystem, but actually push it to its absolute limits.", "url": "https://wpnews.pro/news/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c", "canonical_source": "https://dev.to/iancowley/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c-oj5", "published_at": "2026-06-14 10:06:29+00:00", "updated_at": "2026-06-14 10:40:50.522568+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning", "natural-language-processing"], "entities": ["Ian Cowley", "Glacier.Polaris", "Glacier.Grep", "Glacier.DocTree", "GitHub", ".NET", "C#"], "alternates": {"html": "https://wpnews.pro/news/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c", "markdown": "https://wpnews.pro/news/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c.md", "text": "https://wpnews.pro/news/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c.txt", "jsonld": "https://wpnews.pro/news/starving-the-garbage-collector-a-pragmatic-guide-to-zero-allocation-c.jsonld"}}