# Data structures your CS degree kind of glossed over

> Source: <https://dev.to/lovestaco/data-structures-your-cs-degree-kind-of-glossed-over-29dh>
> Published: 2026-07-04 12:34:37+00:00

*Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on Github. Star git-lrc to help devs discover the project. Do give it a try and share your feedback.*

Every CS program hammers the same seven into you.

Arrays, linked lists, hash tables, stacks, queues, graphs, trees. You could probably recite their Big O complexities in your sleep at this point, and honestly, for 90% of the code you'll ever write, that's plenty.

But every now and then a system hits a wall that none of the seven basics can handle gracefully, and someone had to invent a weirder tool to patch the gap.

I went down a rabbit hole recently looking at a handful of these, and I liked them enough that I wanted to write them up properly instead of just leaving forty open tabs to rot.

Fair warning, there is some depth here. Get a drink.

Normal hash tables are great until you need to ask "have I possibly seen this before" across a dataset way too big to store in memory.

Think a crawler checking billions of URLs, or a database deciding whether it's even worth going to disk to look for a row.

A Bloom filter solves this by giving up on certainty in one direction.

It's a fixed array of bits, plus a small handful of independent hash functions.

Adding an item flips a handful of bits on.

Checking for an item hashes it the same way and checks whether those same bits are on.

If any single bit is off, that item was never added, full stop, no ambiguity.

If they're all on, the item was probably added, but two unrelated items can accidentally light up the same bits, so you might get a false alarm.

The asymmetry is the entire design.

Zero false negatives, occasional false positives.

It's the data structure equivalent of a metal detector at a stadium gate.

It'll never wave through someone with a knife, but it might beep at your belt buckle and make you empty your pockets for nothing.

That tradeoff is exactly what you want for a cheap first filter before doing the expensive real check.

Databases like Cassandra and Postgres use this trick internally, and browsers have used it to screen URLs against malicious site lists without hitting the network for every single page load.

Regular hash tables deal with collisions by either chaining items into a list at the same slot, or probing forward to the next open slot.

Both work, but both can degrade badly if your table gets crowded or your hash function isn't playing nice with your data.

Cuckoo hashing takes a different approach.

Every key gets two candidate slots from two separate hash functions instead of one.

Insert a key, and if its first slot is free, done.

If it's taken, you don't chain or probe, you just boot the current occupant out and drop it into its own second candidate slot.

If that slot happens to be full too, that displaced key gets bumped along to its alternate, and so on, like a slow-motion game of musical chairs where the last person standing just keeps shoving someone else off a seat until the music stops.

The payoff is a genuine worst case O(1) lookup.

Not "usually fast," not amortized, actually guaranteed constant time, because any key can only ever live in one of two known spots.

You check both, and you're done.

The tradeoff is that insertion can occasionally spiral into a cycle of evictions that forces a full rehash, so it's a structure you pick when read speed matters more than insert simplicity.

Binary search trees cap every node at two children, which is fine on paper but turns ugly once your dataset is large enough that the tree lives partly on disk instead of in memory.

More children per level would mean fewer levels total, but a binary tree can't do that by definition, so as your data grows, the tree grows tall, and every search means walking down that height one disk read at a time.

Disk reads are slow enough that shaving off even a few levels makes a real difference.

The fix, generalized decades ago into what's called a B-tree, is to let each node hold many sorted keys and point to many children instead of just two.

Picture a library where instead of one long single-file shelf you'd have to walk end to end, you've got wide shelving units, each holding dozens of sorted spines, with a sign at the end of the aisle telling you which unit to check next.

Fewer aisles to walk, fewer trips, same sorted order underneath.

That structure is why B-trees and their close cousin, the B+ tree, are the backbone of most file systems and relational databases today.

Short, wide trees beat tall, skinny ones the moment "one more level" costs you real time instead of a rounding error.

Now take a tree built specifically for prefix lookups, like the kind used to route internet traffic.

Every device on the internet has an IP address, and routers need to match incoming packets against routing tables with potentially huge numbers of entries, fast enough that you never notice the delay.

A plain prefix tree wastes a lot of space on long, single-child chains, nodes that only exist to lead to exactly one other node.

A radix tree cleans that up by collapsing any run of single-child nodes into one combined edge.

The tree ends up shorter and denser wherever your data shares long common prefixes, which is exactly the shape IP address ranges tend to have.

Where this falls apart is on data without shared structure.

Feed a radix tree a pile of random, high-entropy strings, and the compression trick barely helps, because there's nothing common to collapse.

Try inserting a single character at the very start of a half-million-word string, stored as one contiguous block of memory, and watch everything after it get copied over one slot to make room.

That's fine for a tweet, brutal for a novel-length document, which is exactly the problem text editors run into.

A rope sidesteps this by never storing the text as one giant block in the first place.

Instead it's broken into smaller chunks, strung together as leaves of a tree, and every internal node in that tree keeps a running count of how many characters live in everything beneath it.

Want character 40,000? You don't scan from the beginning, you walk down the tree using those counts like a table of contents, skipping whole branches you don't need.

Editing near the start no longer means rewriting everything after it, just re-linking a small part of the tree.

This is a big part of why editors like VS Code stay responsive on genuinely massive files, and it's the same underlying idea used in collaborative editing tools that need to merge edits from multiple people without re-copying the whole document every keystroke.

If you want to go deeper on the split and concatenate mechanics, [Wikipedia's rope article](https://en.wikipedia.org/wiki/Rope_(data_structure)) is a solid next stop.

None of these structures are trying to dethrone arrays or hash tables.

Every one of them exists because a specific, real constraint (a slow disk, a shared address prefix, an unwieldy string, a need for certainty in one direction only, a demand for guaranteed constant time) made the basic version buckle, and someone patched it with one deliberate structural twist.

If you want the original source for the Bloom filter idea, [Burton Bloom's 1970 paper](https://dl.acm.org/doi/10.1145/362686.362692) is short and surprisingly readable for something that's still load-bearing infrastructure over fifty years later.

What's the weirdest structure you've actually had to reach for outside a classroom? I'd genuinely like to hear the story behind it.

Disclaimer: This article was written by me; AI was used to fix grammar and improve readability.

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub:

| [🇩🇰 Dansk](https://github.com/HexmosTech/git-lrc/readme/README.da.md) | [🇪🇸 Español](https://github.com/HexmosTech/git-lrc/readme/README.es.md) | [🇮🇷 Farsi](https://github.com/HexmosTech/git-lrc/readme/README.fa.md) | [🇫🇮 Suomi](https://github.com/HexmosTech/git-lrc/readme/README.fi.md) | [🇯🇵 日本語](https://github.com/HexmosTech/git-lrc/readme/README.ja.md) | [🇳🇴 Norsk](https://github.com/HexmosTech/git-lrc/readme/README.nn.md) | [🇵🇹 Português](https://github.com/HexmosTech/git-lrc/readme/README.pt.md) | [🇷🇺 Русский](https://github.com/HexmosTech/git-lrc/readme/README.ru.md) | [🇦🇱 Shqip](https://github.com/HexmosTech/git-lrc/readme/README.sq.md) | [🇨🇳 中文](https://github.com/HexmosTech/git-lrc/readme/README.zh.md) | [🇮🇳 हिन्दी](https://github.com/HexmosTech/git-lrc/readme/README.hi.md) |

GenAI today is a **race car without brakes**. It accelerates fast -- you describe something, and large blocks of code appear instantly. But AI agents *silently break things*: they remove logic, relax constraints, introduce expensive cloud calls, leak credentials, and change behavior -- without telling you. You often find out in production.

** git-lrc is your braking system.** It hooks into

`git commit`

and runs an AI review on every diff In short, git-lrc helps **Prevent Outages, Breaches, and Technical Debt Before They Happen**

**At a glance:** [10 risk categories](https://github.com/HexmosTech/git-lrc#what-git-lrc-checks-for) · [100+ failure patterns tracked](https://github.com/HexmosTech/git-lrc#what-git-lrc-checks-for) · every commit…