May 29, 2026
a public Dirty Frag PoC failed, so the server looked safe. a cheap DeepSeek-V4-Flash feedback loop found the missed path -> fcrypt mismatch, nscd cache, and root in ~90 minutes.
the short version #
i got root on my university's shared login server. not because the sysadmins were asleep or because the box was some ancient forgotten machine. they were actually fast. they had read the CVE writeups, blocked the recommended kernel crypto interface, disabled unprivileged user namespaces, tested the public PoC, watched it fail, and moved on.
reasonable response tbh. the problem is that public PoCs are not truth oracles. they only tell you one thing: this exact code path, as written, did not work on this exact run. that is not the same as "the system is safe".
so i put DeepSeek-V4-Flash in a boring shell feedback loop on a Lightsail replica:
compile -> run -> read error -> patch -> repeat
about 90 minutes later, the exploit worked on the real server. i reported it, and they patched it properly within an hour. this post is mostly a field-trip log of how i got root, and of how cheap intelligence has gotten when harnessed correctly.
the target #
this was `ssh1.iitd.ac.in`, a shared login box for the electrical engineering department at IIT Delhi. my friend had gotten root on it last year when it was running Linux kernel 4 on Ubuntu 16. ancient era. this time it was not ancient anymore.
initial state:
| thing | value |
|---|---|
| kernel | Linux 6.8.0-111-generic , built April 11 2026 |
| distro | |
| unprivileged user namespaces | disabled (`kernel.unprivileged_userns_clone = 0`) |
| AF_ALG | blocked via modprobe rules |
important bit: `AF_ALG` was blocked, but `pcbc(fcrypt)` was still registered in `/proc/crypto`. foreshadowing...
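if you want to check this on your own box, here is a minimal sketch that parses `/proc/crypto` and reports whether a given algorithm name is registered. the function names and the `SAMPLE` text are mine (made up for illustration, not output from the real server); only the `name : value` block layout comes from the kernel.

```python
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Turn /proc/crypto text into one dict per registered algorithm."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current algorithm block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def registered(text: str, name: str) -> bool:
    """True if any block has exactly this algorithm name."""
    return any(entry.get("name") == name for entry in parse_proc_crypto(text))


# Illustrative sample in the /proc/crypto layout (values are made up).
SAMPLE = """\
name         : pcbc(fcrypt)
driver       : pcbc(fcrypt-generic)
module       : pcbc
type         : skcipher

name         : fcrypt
driver       : fcrypt-generic
module       : fcrypt
type         : cipher
"""

if __name__ == "__main__":
    path = Path("/proc/crypto")
    text = path.read_text() if path.exists() else SAMPLE
    print("pcbc(fcrypt) registered:", registered(text, "pcbc(fcrypt)"))
```

the point of the check: a blocked socket family says nothing about which algorithms the kernel itself can still reach.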
background, quickly #
Dirty Frag is a Linux kernel privilege escalation from May 2026. it has two main paths: CVE-2026-43284 -> xfrm ESP path, and CVE-2026-43500 -> RxRPC path.
both get to a page-cache corruption primitive during kernel crypto work. the public exploit uses that to temporarily modify `/etc/passwd`, blank root's password, then `su` to root. very funny family of exploits.
`AF_ALG` is the kernel's userspace crypto socket API. most writeups said: block `AF_ALG`, block the algif modules, you're good until you patch. this is a really good stopgap, as it stops the public PoC from working.
first attempt #
i first tried Copy Fail, mostly because it had dropped around the same time and looked like the obvious thing to test. it died immediately:
socket(AF_ALG, ...) = -1 EAFNOSUPPORT
fair enough. mitigation worked.
then i tried Dirty Frag.
ESP path -> blocked because unprivileged user namespaces were disabled.
RxRPC path -> got further, because creating an `AF_RXRPC` socket caused `rxrpc`, `fcrypt`, and `pcbc` to auto-load. but it still failed at the checksum step because the public PoC expected to use `AF_ALG`.
at this point the obvious conclusion was:
both cves blocked -> public poc failed -> server safe
most people will probably stop there. but `pcbc(fcrypt)` was still in `/proc/crypto`, so the question became: if the kernel can still use the algorithm internally, why is userspace `AF_ALG` the final blocker?
the loop #
this is where DeepSeek-V4-Flash, the cheapest model known to mankind, did the useful work.
the loop was boring:
compile -> run -> read stderr + dmesg + return code -> patch code -> repeat
i love boring loops coz they work and i can track them.
also slightly cursed realization: this is just control engineering. this semester i was studying ELL 225: Control Engineering. i did not expect any of that to make my prompting better, but it did.
LLM in a harness feels like a controller:
- goal prompt -> setpoint
- shell/tool environment -> plant
- stderr/dmesg/tests -> sensor
- code edits -> control input
- repeated runs -> feedback loop
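the loop itself can be sketched in a few lines. this is not my actual harness, just a minimal version under assumptions: `propose_patch` is a hypothetical callback standing in for the model call, the "plant" is any command built by `make_cmd`, and the only sensors are stdout, stderr, and the return code (no dmesg here). the demo target is a harmless toy script.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Observation:
    """What the 'sensor' reports back after one run."""
    returncode: int
    stdout: str
    stderr: str


def run_once(cmd: list[str], timeout: float = 60.0) -> Observation:
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return Observation(proc.returncode, proc.stdout, proc.stderr)


def feedback_loop(
    source: Path,
    make_cmd: Callable[[Path], list[str]],
    propose_patch: Callable[[str, Observation], str],  # hypothetical model call
    max_iters: int = 10,
) -> list[Observation]:
    """Run, observe, patch, repeat until the command exits 0 or we give up."""
    history: list[Observation] = []
    for _ in range(max_iters):
        obs = run_once(make_cmd(source))
        history.append(obs)
        if obs.returncode == 0:
            break
        # Control input: rewrite the source based on the last observation.
        source.write_text(propose_patch(source.read_text(), obs))
    return history


def demo() -> list[Observation]:
    """Toy run: a script that exits 3 gets 'patched' to exit 0."""
    def toy_patch(code: str, obs: Observation) -> str:
        return code.replace("sys.exit(3)", "sys.exit(0)")

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target.py"
        target.write_text("import sys\nprint('boom', file=sys.stderr)\nsys.exit(3)\n")
        return feedback_loop(target, lambda p: [sys.executable, str(p)], toy_patch)


if __name__ == "__main__":
    for i, obs in enumerate(demo(), start=1):
        print(f"iteration {i}: rc={obs.returncode} stderr={obs.stderr.strip()!r}")
```

keeping the whole history around is what makes the loop trackable: every iteration has a return code and an stderr you can read back later.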
what the agent found #
1. public PoC fcrypt != kernel fcrypt
this was the actual turn. within about 9 minutes, the agent noticed that the `fcrypt` implementation in the public PoC did not match the kernel's `pcbc(fcrypt-generic)` implementation. different round structure. different key mixing. byte order weirdness.
i will be honest here: i had no idea about this, and i was already far from my home waters. this was my first time dipping my toes into cybersecurity.
2. byte order in fcrypt_user_setkey
on the Lightsail replica, the agent fixed the byte order in `fcrypt_user_setkey` so the userspace fallback matched the kernel schedule. small and boring but important fix.
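why a byte order slip matters: the same key bytes turn into different integers depending on how you load them, so everything derived from those integers diverges. the sketch below is a generic illustration only. it is not the fcrypt key schedule and not the actual fix; `load_words` and the example key are made up.

```python
def load_words(key: bytes, byteorder: str) -> list[int]:
    """Read a key as consecutive 32-bit words in the given byte order."""
    return [int.from_bytes(key[i:i + 4], byteorder) for i in range(0, len(key), 4)]


key = bytes.fromhex("0102030405060708")  # arbitrary example bytes
big = load_words(key, "big")
little = load_words(key, "little")

if __name__ == "__main__":
    print("big-endian   :", [hex(w) for w in big])
    print("little-endian:", [hex(w) for w in little])
```

two implementations that disagree on this step will produce different ciphertext for the same key, even if every later round is identical.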
3. POC_NO_UNSHARE
there was also a `POC_NO_UNSHARE` path already sitting in the codebase. using it got past the user namespace setup and moved the exploit from immediate failure to `rc=3`.
i feel stupid for not checking this myself first.
4. nscd
final blocker was nscd. on Ubuntu 24.04, `nscd` can cache passwd lookups. so even after the page-cache corruption modified `/etc/passwd`, PAM could still see the old cached `root:x:0:0:...` entry and reject the blank password. the exploit looked like it failed even when the file had already changed.
fix:
systemctl is-active --quiet nscd && nscd --invalidate passwd
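the general lesson is that the file on disk and the answer a lookup returns can disagree. here is a small diagnostic sketch for spotting that kind of divergence. assumptions: on glibc systems `pwd.getpwnam` goes through NSS (and through nscd when it is running), and `lookup_is_stale` plus the injectable `lookup` argument are names i made up so the check can be tested without touching a real system.

```python
import pwd
from types import SimpleNamespace
from typing import Callable, Optional


def passwd_field_from_file(passwd_text: str, user: str) -> Optional[str]:
    """Return the second (password) field for `user` from passwd-format text."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == user:
            return fields[1]
    return None


def lookup_is_stale(
    passwd_text: str,
    user: str,
    lookup: Callable[[str], object] = pwd.getpwnam,
) -> bool:
    """True if the lookup's view of the password field differs from the file's."""
    on_disk = passwd_field_from_file(passwd_text, user)
    via_lookup = lookup(user).pw_passwd
    return on_disk is not None and on_disk != via_lookup


if __name__ == "__main__":
    # Fake lookup that imitates a cache still holding the old entry.
    cached = lambda user: SimpleNamespace(pw_passwd="x")
    print(lookup_is_stale("demo::1000:1000::/home/demo:/bin/sh\n", "demo", cached))
```

the same check is useful in the other direction: a monitoring job that sees the file and the lookup disagree has found either a stale cache or a file that changed when it should not have.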
root and cleanup #
final run on the real server was around 05:24 IST on May 22. first shot failed because of nscd. patched that. second shot worked. root shell.
cleanup was straightforward: restored original `/etc/passwd`, unloaded `rxrpc`, `fcrypt`, and `pcbc`, dropped caches, deleted the exploit binary, and checked syslog for the RxRPC unregister messages.
reported to sysadmins within the hour. they patched properly within another hour by upgrading the kernel to `Linux 6.8.0-117-generic`.
the actual failure #
this is not a "lol sysadmins bad" post. they did the obvious mitigation and verified it against the obvious public exploit. problem is that the obvious exploit was not the full search space.
they treated `public PoC failed` the same as `systems are safe`.
those are not the same statement.
three things slipped through:
- the RxRPC path auto-loaded modules the AF_ALG mitigation did not cover
- the public PoC had an fcrypt mismatch, so its failure was partially a PoC bug
- nobody iterated after the first failure
why this matters for agents #
deepseek and other chinese models like glm 5.1 are CHEAP. like, stupid cheap compared to their american counterparts.
for this problem, raw model intelligence was not the bottleneck. it did not need to be a god model like mythos. it needed to compile code, run code, read errors, compare source, and try again. that's it.
this is why i think cheap subagents + feedback loops are underrated for most tasks. the useful question is often not:
can this model solve the whole thing in one shot?
it is:
can this model keep trying sane variants without getting bored?
mythos and other god-tier models are good at seeing the bigger chain. cheap subagents are good at grinding through the local search space once you know where to look.
mitigation #
real fix: patch your kernel.
upstream commits:
- `f4c50a4034e6` -> ESP path
- `aa54b1d27fe0` -> RxRPC path
if you cannot patch immediately, don't only block `AF_ALG`. block protocol modules too:
printf 'install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false
install af_alg /bin/false
' > /etc/modprobe.d/dirtyfrag.conf
rmmod esp4 esp6 rxrpc af_alg 2>/dev/null
sync && echo 3 > /proc/sys/vm/drop_caches
and if you're testing a mitigation, don't stop at `public PoC failed`. run it in a staging environment through a loop. make the agent explain why it failed, then make it try the branch it thinks should still work.
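one cheap check to add to that loop: confirm the blocklist actually covers every module you meant to block. a minimal sketch, assuming the `install <module> /bin/false` form from the config above; `blocked_modules`, `missing_blocks`, and `REQUIRED` are names i made up, and this only reads config text, it does not ask the kernel what is loaded.

```python
REQUIRED = ("esp4", "esp6", "rxrpc", "af_alg")


def blocked_modules(conf_text: str) -> set[str]:
    """Collect modules disabled with 'install <module> /bin/false' lines."""
    blocked: set[str] = set()
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop trailing comments
        if len(tokens) >= 3 and tokens[0] == "install" and tokens[2] == "/bin/false":
            blocked.add(tokens[1])
    return blocked


def missing_blocks(conf_text: str, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return required modules that the config does not block, in order."""
    blocked = blocked_modules(conf_text)
    return [module for module in required if module not in blocked]


if __name__ == "__main__":
    af_alg_only = "install af_alg /bin/false\n"
    print("still loadable:", missing_blocks(af_alg_only))
```

run against a config that only blocks `af_alg`, it lists the three protocol modules as still loadable, which is exactly the gap described in this post.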
timeline #
| phase | start (IST) | end | duration | active |
|---|---|---|---|---|
| Copy Fail attempt + AF_ALG discovery | May 21 23:38 | 23:45 | ~7 min | yes |
| Dirty Frag pivot + first RxRPC test | 23:46 | 23:55 | ~9 min | yes |
| fcrypt mismatch discovery | 23:56 | 00:05 | ~9 min | yes |
| Lightsail testing + byte order fix | May 22 00:05 | 00:20 | ~15 min | yes |
| gap / sleep / other things | 00:20 | 04:05 | ~3h 45m | no |
| final test on real server | 04:05 | 05:24 | ~1h 19m | yes |
| root + cleanup | 05:24 | 05:46 | ~22 min | yes |
| damage assessment + report | 05:46 | 05:50 | ~4 min | yes |
total: about 95 minutes of active agent work, about 10 minutes of my active time, about 90 minutes wall-clock from first prompt to root shell.
closing #
god-tier models like mythos and cheap subagents are two entirely different beasts.
one is good at chaining different small pieces into a bigger exploit path. the other is good at exploring subspaces around a known exploit and checking whether the system is actually secure.
cheap subagents may have a really good use case in defending systems against fresh CVEs:
cve reported -> agents check replica(s) of prod -> findings go to sysadmin -> mitigations get implemented if available
not glamorous. probably useful.