Someone dumped 20 zero-days on open source tools with no warning. The fuzzing was run by AI.

wpnews.pro

Last week an anonymous GitHub account called bikini pushed a repository named exploitarium and, in the space of a few days, dropped more than twenty proof-of-concept exploits against popular open source software. nmap, Ghidra, FFmpeg, VLC, Firefox, libssh2, c-ares, OpenVPN, Docker, PHP, ImageMagick. None of the bugs had been reported to the maintainers beforehand. None were patched. The README said so plainly: at the time of posting, none had been reported, and you were free to file them yourself and "take credit for the CVE."

It hit the top of Hacker News and the thread filled up fast. The argument that broke out underneath is more interesting than the exploits themselves.

This is worth paying attention to if you build, ship, or rely on open source infrastructure. Which is most of us.

The repository is a single consolidated archive. Twenty-three folders, each one a self-contained proof of concept against a different target. Some are network plumbing you probably run without thinking about it: c-ares (the DNS resolver library behind curl and a long list of other tools), libssh2, nghttp2, OpenVPN. Some are tools developers use directly: Ghidra and objdump for reverse engineering, and nmap for scanning, which is close kin to Wireshark's dissectors even though Wireshark itself is not in this repo. Several are media decoders, which is a category that has been quietly dangerous for two decades: FFmpeg, VLC, ImageMagick, 7-Zip's RAR5 handling.

A few of the entries already carry CVE numbers. libssh2-cve-2026-55200 is in there by name. Others read like serious findings: use-after-free bugs, RCE candidates, a privilege escalation in MyBB, a local privilege escalation in SystemInformer. The nmap one targets IPv6 extension-length parsing, and several people in the thread flagged it as potentially high severity because it sits in parser code, which is exactly where remote code execution tends to hide.

So this is not one researcher responsibly nudging one vendor. It is someone emptying a notebook onto the public internet, all at once.

Here is where the story gets complicated, and why the Hacker News thread turned into a forensic argument rather than a victory lap.

People who actually know these codebases went digging, and the verdict was mixed. A Ghidra user pointed out that one of the three Ghidra findings requires you to already have the ability to overwrite binaries in the Swift tool directory. If you can overwrite the binaries a program runs, yes, you get code execution. That is not a vulnerability. That is how computers work. Someone else called the Docker entry "just a weird bug" rather than an exploitable flaw, and the VLC proof of concept landed as a straight crash with no path to running arbitrary code.

But not everything was dismissed. The same person who brushed off Ghidra checked c-ares, libssh2, and FFmpeg and said the bugs there "seem to all work as of the latest upstream commit." The nmap parser bug drew serious concern. The Firefox one, which involves the private-data and untrusted-input flags, sounded plausible to people who don't work on Firefox internals, but they couldn't fully rule it in or out.

This is the messy reality of a drop like this. No single person is an expert on every codebase. A maintainer of FFmpeg is not also a maintainer of OpenVPN. So when twenty bugs land at once, the entire security community has to scramble to triage them in parallel, and a lot of that triage happens in public, badly, on a forum, while the clock runs.

This is the part I keep coming back to.

After the initial wave of criticism, the account posted a statement, and it is unusually honest about method. The fuzzing workflow was automated. All of it. They used GPT-5.5-3-Codex-Spark to run the fuzzing against a strict harness, and the AI flagged candidate bugs that a human then investigated and confirmed.
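If you have never written one, a fuzzing harness is usually just a small function that hands the fuzzer's bytes to the code under test. The statement does not describe the author's harness, so what follows is only a minimal sketch in the libFuzzer style. `parse_record` and its record layout are hypothetical, invented here for illustration; only the `LLVMFuzzerTestOneInput` entry point is libFuzzer's real interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy parser (not from any project named in the article).
 * Assumed record layout: 1-byte tag, 1-byte payload length, then payload.
 * Returns the payload length, or -1 if the input is truncated. */
int parse_record(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return -1;                  /* the two header bytes do not fit */
    size_t payload_len = buf[1];
    if (payload_len > len - 2)
        return -1;                  /* declared length exceeds the input */
    return (int)payload_len;
}

/* libFuzzer entry point: the fuzzer calls this once per generated input.
 * The harness only passes the bytes along; a sanitizer such as
 * AddressSanitizer is what turns a memory error into a visible crash. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)parse_record(data, size);
    return 0;
}
```

With clang, a harness like this is typically built with `-fsanitize=fuzzer,address`. Whatever model sits on top, it is reading the crashes a loop like this produces, which is presumably why the author treats the harness as mattering more than the model.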

Two things in that statement stood out. First, the author pushes back hard on the idea that you need a frontier model to find these. "You do not need a SOTA model to help you identify these issues," they wrote. Their data, they claim, shows that the model choice is marginal once you pair decent human oversight with a good harness. Second, they are explicit about what was and was not AI-generated: the proofs of concept were hand-typed, except RustDesk, where they leaned on AI because they are less fluent in the language. The README writeups, they admit, are "very clearly entirely AI," because an LLM formats Markdown well and they reviewed the output for accuracy.

Read that again. The discovery was AI-assisted and scaled. The writeups were AI-generated. The actual exploit code was human. That division of labor is probably the template for the next few years of security research, and it is a little unsettling how productive it is.

We are used to thinking of AI as either a coding autocomplete or a threat that writes malware. This is neither. It is an analyst that never sleeps, pointing a human at the exact lines of C where an integer wraps or a buffer gets reused after free. The human still has to write the exploit and confirm it lands. But the finding, the part that used to take weeks of manual auditing, got compressed into an automated loop.

The ethical question underneath all of this is older than the internet, and it never gets resolved. It just gets re-litigated every time someone drops a batch of bugs.

There are two camps, and both have a real point.

Coordinated disclosure says you tell the maintainer first, give them a reasonable window, usually sixty to ninety days, to ship a fix, and only then publish. The argument is simple: publishing a working exploit for an unpatched bug puts every user at risk the moment it goes public, and the people who get hurt first are not the maintainers. They are the admins and end users who had no say in the matter.

Full disclosure says the opposite. Publish immediately. The argument has a few forms, but the core one is that secret bugs are not safe. They sit in the code whether or not they are public, and the only people who benefit from them staying quiet are attackers who already know. Public pressure, the full-disclosure crowd argues, is the only reliable way to get things fixed. Vendors drag their feet. Ninety-day windows stretch into years. Silence protects nobody.

The bikini account took the most extreme version of full disclosure. No private report. No window. No warning. The bugs went straight to GitHub, working exploits attached, and the author invited the community to do the responsible reporting themselves.

In the Hacker News thread the split was visible in real time. One commenter said flatly that they prefer these being public over sitting in "some government or corporate toolbox." Another countered that disclosure still enables better software to exist, even if nobody follows through. A third, the one I found most honest, said the whole situation just sucks: we are apparently going to need every open source project in the world to stop and audit this one repository on the off chance that their codebase is in it.

I do not have a clean answer here. Coordinated disclosure is the grown-up choice and I lean toward it, but I also cannot pretend the full-disclosure argument is wrong. What I am certain of is that this model, one person emptying twenty bugs onto GitHub with no warning, is going to get copied. The cost of doing it just dropped, and AI-assisted fuzzing is the reason.

Step back from this specific drop and a pattern is obvious, and it is not new.

Almost every target in the repo parses untrusted input in C. Network protocol parsers. Container formats. Image and video decoders. DNS messages. SSH handshakes. The reason Wireshark dissectors came up repeatedly in the discussion, even though Wireshark is not directly in this repo, is that dissectors are protocol decoders written almost entirely in C, and anyone who can send packets can pick which decoder runs. The same is true of media players: hand VLC a malformed VP9 stream and you are exercising C code that was never formally verified, written under deadline pressure, handling a format specification that runs hundreds of pages.

C is not the enemy. A lot of the most important software in the world is written in it. But C does not check your bounds, and it does not refuse to use memory after you free it, and it does not tell you when you have wrapped an integer past its maximum. Every parser written in C is a standing invitation to exactly the class of bug that fills this repository: use-after-free, out-of-bounds write, integer overflow that becomes a memory corruption.
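To make the integer-wrap case concrete, here is a generic sketch of a bounds check on an attacker-supplied offset and length. It is not taken from any project in the repository, and both function names are hypothetical. It assumes the usual platforms where `uint32_t` arithmetic wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsafe pattern: offset + len is computed in 32 bits and can wrap
 * around to a small value, so the check passes for a huge len. */
bool in_bounds_wrapping(uint32_t offset, uint32_t len, uint32_t buf_len)
{
    return offset + len <= buf_len;   /* the sum wraps modulo 2^32 */
}

/* Safer pattern: compare without adding, so nothing can wrap. */
bool in_bounds_checked(uint32_t offset, uint32_t len, uint32_t buf_len)
{
    if (offset > buf_len)
        return false;                 /* offset is already past the end */
    return len <= buf_len - offset;   /* cannot underflow after the check above */
}
```

With a 64-byte buffer, an offset of 16, and a length of 0xFFFFFFF8, the sum wraps to 8, so the first check says yes and the copy that follows writes far past the buffer. The second check says no. The compiler accepts both versions without complaint, which is the whole problem.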

This is why the Rust rewrite movement exists. It is why curl now ships an optional Rust-based DNS resolver. It is why people keep trying to move parsers into memory-safe languages. The bugs in exploitarium are not exotic. They are the same bugs we have been shipping for forty years, found faster, by a loop that runs while the human sleeps.

If you maintain or operate any of the affected software, the practical moves are unglamorous. Patch as fixes land. The maintainers are going to be triaging in public over the coming weeks, and the responsible thing for everyone else is to follow their advisories and update promptly. Do not pull proof-of-concept code from a random GitHub repository onto a production machine to "test" it. Several commenters joked about the irony of security hobbyists rushing to download exploits and compromising themselves in the process. It is not really a joke.

If you run services that accept untrusted input, and you probably do, sandbox the parser. Run your media decoders and protocol parsers in containers or seccomp-restricted processes. Network parsers like nmap and protocol dissectors belong behind a trust boundary, not on a laptop that also holds your SSH keys. And if you are a maintainer of C-based parsing code, this is a good week to look at property-based testing and continuous fuzzing of your own. The author of this drop built an AI-assisted harness in their spare time. Organizations with actual budgets can do the same thing before someone else does it for them and publishes the results without warning.
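The simplest form of that trust boundary is a separate process. The sketch below, assuming a POSIX system, runs a parser in a forked child with a CPU-time cap, so a crash or a runaway loop kills the child and not the service. `risky_parse` is a hypothetical stand-in that aborts on one "malformed" input to simulate a memory-safety crash. This is process isolation only, not a seccomp filter or a container, both of which a real deployment would layer on top.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a risky parser: aborts when the first byte
 * is 0xFF, simulating a crash on malformed input. */
int risky_parse(const uint8_t *buf, size_t len)
{
    if (len > 0 && buf[0] == 0xFF)
        abort();
    return 0;
}

/* Run the parser in a child process with resource limits.
 * Returns 0 if the parser finished cleanly, -1 if it crashed,
 * was killed by a limit, or the child could not be created. */
int parse_isolated(const uint8_t *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: cap CPU time at 2 seconds and disable core dumps. */
        struct rlimit cpu = { .rlim_cur = 2, .rlim_max = 2 };
        struct rlimit core = { .rlim_cur = 0, .rlim_max = 0 };
        setrlimit(RLIMIT_CPU, &cpu);
        setrlimit(RLIMIT_CORE, &core);
        _exit(risky_parse(buf, len) == 0 ? 0 : 1);
    }
    /* Parent: a signal death or nonzero exit both count as failure. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent sees only a yes or a no. It never touches memory the parser may have corrupted, and that property is what seccomp and containers then harden further.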

I keep thinking about the asymmetry. One anonymous person, an AI fuzzing loop, a GitHub account, and suddenly twenty open source projects have to drop everything and figure out whether they are on fire. The maintainer of a small library does not have a security team. They have evenings and weekends and a day job. Multiply that across twenty-three codebases and you get a meaningful slice of the open source world burning cycles it does not have.

The bugs are real, or some of them are. The disclosure ethics are debatable, and reasonable people land on different sides. But the part that feels new is the speed and the scale, and the fact that the loop behind it is going to get cheaper, not more expensive, from here. The next drop will be bigger. It will probably also be messier.

We have been treating memory bugs in parsers as a background fact of computing for decades. That background hum is about to get louder.

source & further reading

dev.to (original article)