{"slug": "chicken-soup-for-the-grug-soul", "title": "Chicken Soup for the Grug Soul", "summary": "The article argues that C is a fundamentally fixable language, with its main flaws stemming from its outdated standard library rather than the language itself. The author proposes replacing null-terminated strings with a custom `sp_str_t` struct (containing a data pointer and length) and using a string builder API to modernize C programming. This approach enables safer and more efficient string operations like substring extraction, trimming, and comparison without the bugs and limitations of traditional C strings.", "body_md": "c is my daily driver\nAlmost every day, I write C. I am not an embedded developer. I exclusively use regular computers (a Linux desktop and a MacBook Air) and write mostly regular code (small tools, throwaway scripts, GPU experiments, video games).\nUnless I have a good reason otherwise, I do it exclusively in C. I legitimately find it to be the best tool for most things I do.\nOut of the box, C is a (mostly) well designed language with perhaps the worst standard library and tooling of any commonly used language. It’s truly bad. But those things are not C; they’re just the environment. And even though some of C’s warts are baked in too deeply, the vast majority of them are not.\nIn other words, C is fixable. It is a beautiful, modern language resting primordial and soupish in an extremely thick cocoon. I’m going to show you how to replace things which we may not think of as replaceable to make C a good language for any program, and we’re going to start with strings.\nstring.h\nis a special kind of hell\nDespite all of C’s shortcomings and sins, the worst among them is the standard library. Fundamentally, computers haven’t changed that much since 1970, but programming languages have changed a great deal, and in this sense the C standard library is truly irrelevant. 
And within this pasture of footguns and insecure-by-default and hidden mutable state, the most insidious beast of them all also roams: string.h\n.\nFundamentally, the issue with C strings is that they are null terminated. I won’t pretend to know why this decision was made; I’ll guarantee that whoever made it was highly intelligent and operating under constraints which would seem alien to us today. But at some point since then, the pendulum crossed some threshold, and then kept swinging for thirty or so more years, and has left us now with a Very Bad Thing.\nNull terminated strings mean you cannot:\n- Take a substring without allocating and copying\n- Know the length of a string without traversing it\n- Write lexers and parsers which return string views that work with your regular string API\n- Build strings that are valid at every step\nPlus, of course, the unfathomable number of bugs and security issues that arise from a missing null terminator. Step one to modernizing C is to completely ditch null terminated strings in favor of the humble sp_str_t\n.\nsp_str_t\nwill set you free\ntypedef struct {\nconst c8* data;\nu32 len;\n} sp_str_t;\nThe basics of sp_str_t\n:\n- They’re immutable1\n- You can trivially copy them or pass them around. In fact, you should almost never use\nsp_str_t*\n. 
- They can trivially wrap C strings with\n.data = cstr\nand .len = strlen(cstr)\nLook at how easy it is to manipulate strings once you have this building block:\nsp_str_sub()\nsp_str_t sp_str_sub(sp_str_t str, u32 index, u32 len) {\nsp_str_t substr = {\n.len = len,\n.data = str.data + index\n};\nSP_ASSERT(index + len <= str.len);\nreturn substr;\n}\nsp_str_trim()\nsp_str_t sp_str_trim_right(sp_str_t str) {\nwhile (str.len) {\nc8 c = sp_str_back(str);\nswitch (c) {\ncase ' ':\ncase '\\t':\ncase '\\r':\ncase '\\n': {\nstr.len--;\nbreak;\n}\ndefault: {\nreturn str;\n}\n}\n}\nreturn str;\n}\nsp_str_equal()\nbool sp_str_equal(sp_str_t a, sp_str_t b) {\nif (a.len != b.len) return false;\nreturn sp_os_is_memory_equal(a.data, b.data, a.len);\n}\nsp_str_builder_t\nis the secret sauce\nFor more complicated string operations, you should use a string builder API. Any time you need to join strings, pad strings, serialize data to strings, you should use the string builder. A string builder is nothing more than a buffer with a size and a capacity. 
Here’s the basic API:\nSP_API void sp_str_builder_append(sp_str_builder_t* builder, sp_str_t str);\nSP_API void sp_str_builder_append_c8(sp_str_builder_t* builder, c8 c);\nSP_API void sp_str_builder_append_fmt(sp_str_builder_t* builder, sp_str_t fmt, ...);\nSP_API void sp_str_builder_new_line(sp_str_builder_t* builder);\nSP_API sp_str_t sp_str_builder_write(sp_str_builder_t* builder);\nThen, you can implement lots of useful stuff very easily:\nsp_str_concat()\nsp_str_t sp_str_concat(sp_str_t a, sp_str_t b) {\nreturn sp_format(\"{}{}\", SP_FMT_STR(a), SP_FMT_STR(b));\n}\nsp_str_join()\nsp_str_t sp_str_join(sp_str_t a, sp_str_t b, sp_str_t join) {\nreturn sp_format(\"{}{}{}\", SP_FMT_STR(a), SP_FMT_STR(join), SP_FMT_STR(b));\n}\nsp_str_replace_c8()\nsp_str_t sp_str_replace_c8(sp_str_t str, c8 from, c8 to) {\nsp_str_builder_t builder = SP_ZERO_INITIALIZE();\nfor (u32 i = 0; i < str.len; i++) {\nc8 c = str.data[i];\nif (c == from) {\nsp_str_builder_append_c8(&builder, to);\n} else {\nsp_str_builder_append_c8(&builder, c);\n}\n}\nreturn sp_str_builder_write(&builder);\n}\nThe backbone of complex string building is format strings. I prefer to use my own style of format strings, sp_format\n2, which trades the compiler support of printf\n-style format strings for better syntax, support for colors, and easy custom formatters. But sprintf\nworks just fine.\nstrings should be immutable\nsp_str_builder_t\nis the only entry point in the library for mutable strings 2. Although that’s not quite the right way to think about it, because a string builder doesn’t keep a string; it keeps a mutable buffer which you manipulate and from which you produce an immutable string. But that’s bullshit semantics.\nImmutability is something I took from functional programming. It solves a ton of problems. When I pass a string into a function, I never have to worry about the state of that string when it returns. 
For a datatype whose main purpose in programs is, roughly, to be tweaked, trimmed, stripped, joined, and generally fucked with, that’s very useful.\nwhat about allocations?\nThe problem with immutability is that it forces you to copy; to allocate memory. It’s pretty easy to write a C library with good ergonomics, but I often find the wheels fall apart when you try to have a sane, consistent strategy for memory allocation.\nUnfortunately, pretty much everything that we want to do with string processing needs an allocation. Temporary buffers for intermediate results, or for final results, or for copies to maintain immutability. But nearly all of this memory is transient. You don’t care where it comes from, and you don’t care where it goes.\nThis is where RAII looks really nice. These problems just go away with automatic destruction of resources. But this, too, is a falsehood. A programming language is like a river. It defines a gradient of how difficult it is to do so-and-so; a gradient of friction. And people by and large tend to follow that gradient. You tend to do things that are easy to do, and avoid things that are hard to do.\nIn C++, what’s easy to do is to use a standard heap allocator for everything and make lots and lots of small, one-off heap allocations and have RAII free them automatically. But this is nothing more than the gradient at work.\nRAII is not the best way to make frequent small allocations with short lifetimes. But it is the least frictive, at least by default, in C++. We’re in C though, subject to no such gradient, and we can do better regardless. We’re missing one more piece for our string library.\nuse a global allocator\nWe’d prefer not to need a separate heap allocation for all those things. It’s slow, and it’s wasteful, and if we could avoid it then we could end up with a string library that is faster despite being immutable.\nIn my standard library, there is a thread-local context. 
It is just a global; calling it a context does not make it fancier than what it is. The context holds an allocator, and anything inside sp.h\nwill use this allocator via sp_alloc()\nwhen it needs memory. An allocator can be as simple as a trivial wrapper around malloc()\nor VirtualAlloc()\nor as complex as a fully custom built allocator.\nThe problem of these small, frequent allocations then becomes very simple. All we have to do is use a cheap allocator for temporary memory; I use a bump allocator (one of many names for the same thing) which is just a pointer and an offset. It grabs a large block of memory on initialization, and when you allocate, it just increments the offset. Freeing just decrements the offset.\nThis isn’t a new idea, or an original idea, or even my idea. I ripped this straight from several of the programming folks that I like to follow. And it’s not a panacea, of course. But for this very common use case, it’s incredible.\nmiscellanea\noperator overloading\nOperator overloading is pretty nice, but the difference between writing a + b\nand sp_str_concat(a, b)\nis meaningless. For more complex operations, sp_str_builder_t\nand sp_format\nare superior APIs anyway.\nliterals\nYou have to wrap literals in a macro:\n#define SP_STR_LIT(cstr) (sp_str_t) { .data = (cstr), .len = strlen(cstr) }\nThis is a little annoying, but also unimportant. For literal-heavy APIs (e.g. sp_format()\n– you almost always want to use a literal for the format string), just provide a function that takes a C string (e.g. sp_format_cstr()\n).\nconversion\nYes, you have to copy sp_str_t\ninto a null terminated buffer if you want to call most C libraries. No, it’s not a big deal. The bump allocator fixes this, too.\nYou could keep a u32 capacity\nin sp_str_t\n, and when allocating tack on an extra byte. When converting to a C string, check the size against the capacity and use that byte as a null terminator if you have it. If not (which is rare; just substrings, pretty much), then you copy. 
But this is unnecessary.\nintrusive pointers\nSome libraries (e.g. stb_ds.h\n) implement data structures by returning a pointer that has a header allocated before it in memory. Then, API functions can accept a regular pointer while still keeping metadata. This seems like a natural choice for strings, since it lets you unify APIs that take a sp_str_t\nand those that take a const char*\n.\nUltimately, I decided to say fuck it and skip this. It’s admittedly nice not to have to call sp_str_to_cstr()\nat API boundaries. But the problem becomes that you no longer know whether a given const char*\nis actually null terminated; is it a plain C string, or is it one of our strings? The only way to avoid this extreme footgun is to null terminate all strings. And we’re back where we started.\nAnd we bid you good night!\nThat’s it. My C string code is pretty much exactly as ergonomic as, say, my Python string code, minus a few nice-to-have operators.\nI technically cheat this sometimes by adjusting the lengths of strings ↩︎\nThis is a printf replacement that’s in the style of std::format; because\nsp.h\nis a single header library and thus compiled with your program, we can do some funny business to give some compile time guarantees (e.g. making sure what is passed to SP_FMT_* is of the correct type). Unfortunately, compile time checking of printf\nstyle format strings is baked into the compiler, so we can’t get around that. Even still, I find it very useful and a great ergonomic improvement. 
↩︎ ↩︎", "url": "https://wpnews.pro/news/chicken-soup-for-the-grug-soul", "canonical_source": "https://spader.zone/sp-001/", "published_at": "2025-08-29 00:00:00+00:00", "updated_at": "2026-05-23 05:40:55.004032+00:00", "lang": "en", "topics": ["developer-tools"], "entities": ["C", "Linux", "MacBook Air"], "alternates": {"html": "https://wpnews.pro/news/chicken-soup-for-the-grug-soul", "markdown": "https://wpnews.pro/news/chicken-soup-for-the-grug-soul.md", "text": "https://wpnews.pro/news/chicken-soup-for-the-grug-soul.txt", "jsonld": "https://wpnews.pro/news/chicken-soup-for-the-grug-soul.jsonld"}}