# Show HN: ESP32 512kB – Tailscale, English to Python LLM and 8 containers local

> Source: <https://punnerud.github.io/pyspell/>
> Published: 2026-06-17 20:58:08+00:00

## What it is

A PySpell program is a single expression (Python) or some `let` bindings followed by a
trailing expression (Rust). It evaluates to a value: a number, a boolean, a string, or a list. Free
identifiers are resolved at evaluation time against a host-supplied *environment*: CLI variables
on a laptop, or live device readings on a microcontroller. The only I/O is a host-granted, allowlisted
`fetch_json`; there are no loops, functions, or imports. That is the point: small, fast, and
safe to accept from elsewhere.

**"Micro-containers": the direction, honestly stated.** The aim is lightweight, pushable units of code on tiny devices. Today it's a *sandboxed evaluator*, not OS containers: the sandbox is at the *language* level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust *subset*, not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.

**Two ways to compile.** On the host, full-fidelity front-ends use `syn` (Rust) and
`rustpython-parser` (Python). For "type code in a browser and run it on the
chip", a tiny hand-written parser (a few kB, `no_std`) builds the same AST on the device.
Either way: source → AST → evaluate.

## An offline AI coding agent, served off the chip

Open `http://<dongle>/` over the tunnel and you get a Cursor-like agent. Type
*"flash the light"*, *"show the text "hello""*, *"what is 7 plus 5"*,
or *"reverse the word robot"* — a **~0.45 M-parameter language model (< 500 kB, int8)**
turns it into PySpell code, **runs it live on the chip**, and shows the result, or the
physical action (the screen lights up, the RGB LED blinks). Runtime, model, tokenizer and dictionary
are all served **from the dongle, offline** — no cloud, no key (OpenAI is optional, behind
the ⚙).

A model that small is only useful because of a chain of tricks; the full write-up is in
[tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md). The headlines:

### The model points, the browser copies

A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked
to. It emits tiny *semantic* directives; the browser copies the literal content verbatim.
For example, `calculate 3 + 2` → `print(`**3 + 2**`)`, and `change add to subtract` →
`@@ + ==> -`. Quoted text is literal content: copied byte-for-byte and
excluded from vocab checks.

### The device serves; the browser computes

Inference runs in WebAssembly, client-side. The 0.5 MB model image streams **off flash a
TCP segment at a time** (HTTP Range) and is never resident in the chip's ~60 kB heap. Inverted
edge inference: the constrained device serves and grades, the browser runs the model.
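
To make the Range-based streaming concrete, here is a minimal sketch of how a client could split a download into fixed-size requests. The helper name `range_headers` and the sizes used are illustrative assumptions; the page does not state the chunk size the browser actually requests.

```python
def range_headers(total_size: int, chunk_size: int):
    """Yield one HTTP 'Range' header value per chunk, covering total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    for start in range(0, total_size, chunk_size):
        # Byte ranges are inclusive on both ends.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"


headers = list(range_headers(total_size=1200, chunk_size=512))
# ['bytes=0-511', 'bytes=512-1023', 'bytes=1024-1199']
```

Because each request names an explicit byte window, the server only ever needs to hold one window's worth of the image in RAM.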

### Frozen embeddings, distilled

The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a
part-of-speech vector, and **frozen** — the tiny model starts with meaningful word
geometry instead of spending its tiny budget learning it.

### The vocabulary is the dictionary

Those same 512 tokens + embeddings are served back to the browser for input validation ("outside the model's vocabulary…") and related-word RAG over the model's own vocabulary.
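
A small sketch of the two uses named above, vocabulary validation and related-word lookup by cosine similarity. The four-word vocabulary and its 3-dimensional vectors are made up for illustration (the real table is described as 512 tokens at 128 dims), and `unknown_words` and `related` are hypothetical names, not the project's API.

```python
import math

# Hypothetical toy embeddings, standing in for the served vocabulary table.
VOCAB = {
    "light": [0.9, 0.1, 0.0],
    "led":   [0.8, 0.2, 0.1],
    "text":  [0.0, 0.9, 0.2],
    "word":  [0.1, 0.8, 0.3],
}


def unknown_words(sentence):
    """Return the words that have no token in the vocabulary, in input order."""
    return [w for w in sentence.lower().split() if w not in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def related(word, k=1):
    """Return the k vocabulary words closest to `word` by cosine similarity."""
    others = [w for w in VOCAB if w != word]
    return sorted(others, key=lambda w: cosine(VOCAB[word], VOCAB[w]), reverse=True)[:k]
```

With these toy vectors, `unknown_words("flash the light")` flags `flash` and `the`, and `related("light")` suggests `led`.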

**Retrain it for your language.** The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and train, then flash. Full guide in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md).

## Syntax at a glance

### Python

```
free_heap > 100000 and uptime_s < 60
250 if distance > 1000 else 0
0 < temp < 60          # chained
20 not in peers
sum([1, 2, 3])
readings[-1]           # negative index
max(a, b)
```

### Rust

```rust
free_heap > 100000 && uptime_s < 60
if distance > 1000 { 250 } else { 0 }
let used = total - free; used * 100 / total
!peers.contains(20)
sum([1, 2, 3])
readings[readings.len() - 1]
max(a, b)
```

## Language reference

### Literals & values

| Kind | Examples | Notes |
|---|---|---|
| Integer | `0`, `42`, `-7` | 64-bit signed |
| Float | `1.5`, `3.14` | 64-bit |
| Boolean | `true`/`True`, `false`/`False` | both spellings accepted |
| String | `"hello"`, `'oslo'` | `+` concatenates; `==`/`<` compare; `len()` counts chars |
| List | `[1, 2, 3]` | elements are values |

### Operators

| Group | Python | Rust | Notes |
|---|---|---|---|
| Arithmetic | `+ - * / %` (and `//`) | `+ - * / %` | on integers, `/` and `//` both truncate toward zero; a float operand promotes to float division. There is no separate float floor-div. |
| Comparison | `== != < <= > >=` | `== != < <= > >=` | Python allows chaining (`a < b < c`) |
| Boolean | `and`, `or`, `not` | `&&`, `\|\|`, `!` | short-circuiting |
| Unary | `-x`, `not x` | `-x`, `!x` | |
| Membership | `x in list`, `x not in list` | `list.contains(x)` | numeric equality |
| Index | `list[i]` | `list[i]` | negative indexing supported |

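The integer-division note in the table is worth a concrete look, because it differs from stock Python, where `//` floors. A minimal sketch of the truncating behavior the table describes; `trunc_div` is a hypothetical helper name, and how PySpell reports division by zero is not covered here.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (Python's own // floors)."""
    q = abs(a) // abs(b)  # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q


truncated = trunc_div(-7, 2)   # -3: rounds toward zero
floored = -7 // 2              # -4: stock Python rounds toward negative infinity
```

The two only disagree when the operands have opposite signs and the division is inexact.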
### Control flow & bindings

| Feature | Python | Rust |
|---|---|---|
| Conditional | `a if cond else b` | `if cond { a } else { b }` (else required) |
| Local bindings | (single expression only) | `let x = e; let y = e2; final_expr` |
| Free variables | any bare name is read from the host environment | any bare name not bound by `let` is read from the host environment |

### Built-in functions

| Function | Result | Description |
|---|---|---|
| `len(list)` | int | number of elements |
| `abs(x)` | number | absolute value |
| `min(list)` / `min(a, b, …)` | number | minimum |
| `max(list)` / `max(a, b, …)` | number | maximum |
| `sum(list)` | number | sum of a numeric list |
| `any(list)` | bool | true if any element is truthy |
| `all(list)` | bool | true if all elements are truthy |
| `round(x)` | int | round to nearest integer |
| `int(x)` | int | truncate toward zero |
| `float(x)` | float | convert to float |
| `bool(x)` | bool | truthiness |
| `index(list, x)` | int | position of first `x`, or `-1` |
| `before(list, a, b)` | bool | true if `a` occurs before `b` |
| `first(list)` | value | first element, or `-1` if empty |
| `last(list)` | value | last element, or `-1` if empty |
| `str(x)` | string | string representation of a value |
| `json_get(text, "a.b.0.c")` | scalar | extract the scalar at a dotted/indexed JSON path (no full parse; only the matched value is materialized) |
| `fetch(url)` | string | HTTP(S) GET body. Gated by a host allowlist; errors if the host isn't allowed or no network capability is present |
| `fetch_json(url, "a.b.0.c")` | scalar | stream the response and extract just the scalar at the path, stopping as soon as it's found; never buffers the whole body. Preferred on the device. |
| `show(x)` | x | render `x` to text and display it (the ESP32 screen; stdout on host), returning `x` so it composes. Device gates it via config (allow on/off, auto-revert seconds). |

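The list helpers `index`, `before`, `first`, and `last` have no direct Python counterparts, so here is a reference sketch of the behavior the table describes. It is written in plain Python for illustration and is not the evaluator's implementation. One point is an assumption: the table does not say what `before` returns when either value is missing, and this sketch returns false.

```python
def index(items, x):
    """Position of the first x, or -1 when x is absent."""
    for i, item in enumerate(items):
        if item == x:
            return i
    return -1


def before(items, a, b):
    """True if a occurs before b; assumed False when either one is missing."""
    ia, ib = index(items, a), index(items, b)
    return ia != -1 and ib != -1 and ia < ib


def first(items):
    """First element, or -1 for an empty list."""
    return items[0] if items else -1


def last(items):
    """Last element, or -1 for an empty list."""
    return items[-1] if items else -1
```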
Classic one-liner — fetch a value and show it on the dongle's screen:

```
show("Oslo: " + fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature") + " C")
# screen shows:  Oslo: 14.9 C   (and the call returns that string)
```

## Network & JSON

`fetch(url)` + `json_get(text, path)` let a program pull live data and read one
field out of it. `fetch` is a mediated capability: the host/device decides which hosts are
reachable (an allowlist), so a program can't reach arbitrary URLs.

```
# Host CLI (allow the host explicitly):
pyspell run oslo_temp.py --allow-host api.met.no
# where oslo_temp.py is:
json_get(
  fetch("https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75"),
  "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9
```
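
The dotted path syntax can be illustrated with a short Python sketch. It deliberately takes the easy route of a full `json.loads` parse, which the real `json_get` is described as avoiding, so treat it as a model of the path semantics only. `json_get_sketch` is a hypothetical name and the sample document is invented to match the path shape used above.

```python
import json


def json_get_sketch(text, path):
    """Walk a dotted path: segments index lists by position and objects by key."""
    node = json.loads(text)  # full parse, unlike the path-directed built-in
    for segment in path.split("."):
        if isinstance(node, list):
            node = node[int(segment)]
        else:
            node = node[segment]
    return node


# Invented sample, shaped like the forecast path used in the examples.
sample = (
    '{"properties": {"timeseries": [{"data": {"instant": '
    '{"details": {"air_temperature": 14.9}}}}]}}'
)
temperature = json_get_sketch(
    sample, "properties.timeseries.0.data.instant.details.air_temperature"
)
```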

**Memory note (device):** `json_get` is path-directed so it never builds the
whole document in RAM; it materializes only the matched value. On the ESP32 (≈60 kB free, no PSRAM)
reading a field out of a large response is feasible because `fetch_json` *streams* the
HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early), so a ~50 kB yr.no
response never has to fit in RAM at once.
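
The stop-early idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: it scans for a single key name instead of following a full dotted path, it only handles numeric values, and it would be fooled by the same key appearing earlier inside a string. `scan_number_after_key` and `in_chunks` are hypothetical names; the device code is Rust and will differ.

```python
def in_chunks(data, size):
    """Yield `data` in pieces of at most `size` bytes, like a streamed body."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def scan_number_after_key(chunks, key):
    """Return (value, chunks_read) for the number that follows `"key":`."""
    needle = b'"' + key.encode("utf-8") + b'":'
    window = b""
    chunks_read = 0
    for chunk in chunks:
        chunks_read += 1
        window += chunk
        pos = window.find(needle)
        if pos == -1:
            # Keep only a tail that could still hold the start of a split needle.
            window = window[-(len(needle) - 1):]
            continue
        window = window[pos:]  # drop everything before the key
        rest = window[len(needle):].lstrip()
        for i, byte in enumerate(rest):
            if byte in b",}] \r\n\t":  # first byte after the number
                return float(rest[:i].decode("ascii")), chunks_read
        # Number not terminated yet: fall through and read one more chunk.
    raise KeyError(key)


# A body with a large trailing field that never needs to be read.
body = b'{"a": 1, "air_temperature": 14.9, "pad": "' + b"x" * 5000 + b'"}'
value, chunks_read = scan_number_after_key(in_chunks(body, 16), "air_temperature")
```

The scanner holds at most the key, the partial number, and one chunk, and it stops pulling from the generator as soon as the value is terminated, which is what lets the caller close the connection early.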

```
# On the ESP32, over Tailscale (single process; ≈60 kB free; verified live):
fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9   (the dongle fetched yr.no itself)
```

## Running on the host

```
# Evaluate, binding free variables:
cargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000
# → true

# Compile to a portable IR blob:
cargo run -p pyspell-cli -- compile examples/health.py    # → examples/health.py.psb

# Push live to a device over USB-serial, or an interactive REPL:
cargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python
```

## Running on the ESP32

The portable evaluator (`pyspell-core`, `no_std + alloc`) runs unchanged on the
ESP32-S3. Programs read live device variables from the environment:

| Variable | Meaning |
|---|---|
| `free_heap` | free heap, bytes |
| `min_free_heap` | lowest free heap seen since boot, bytes |
| `uptime_ms` | milliseconds since boot |
| `uptime_s` | seconds since boot |

### Demo: PySpell over Tailscale

The `demo/esp32-tailscale-pyspell` firmware adds a web text window and a `/run`
API *inside a Tailscale tunnel*: open the device's Tailscale IP in a browser, type an
expression, set a timeout, and run it on the chip. PySpell adds only ~62 kB on top of the networking
firmware.

```
# Web window:
open http://100.x.y.z/

# POST (preferred): program in the body, lang/timeout in the query.
# More room for code than a URL, and no percent-encoding.
curl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000'   # → true
curl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000'       # → 22

# GET (also supported): code is URL-encoded in the query.
curl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000'   # → true
```

`timeout` is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on
the device. The whole request must fit in one TCP segment (≈1.2 kB); POST leaves more of that for code.

### Response format

The reply is `text/plain` (no JSON wrapper):

| Outcome | Body |
|---|---|
| Success | the raw value: `true`/`false`, an integer, a float, or a list like `[1, 2, 3]` |
| Failure | a line starting with `error:`, e.g. `error: parse error: unexpected end of input`, `` error: unknown name `foo` ``, or `error: program exceeded its time limit` |

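A client for this API needs little more than URL building and reply parsing. The sketch below covers both without touching the network. `run_url` and `parse_reply` are hypothetical helpers; mirroring the 1 to 60 second clamp on the client is a convenience, and falling back to the raw text for anything that is not a boolean, number, or list is an assumption about how string results arrive.

```python
import ast
from urllib.parse import urlencode


def run_url(base, lang, timeout_s):
    """Build the /run URL, clamping the timeout to the documented 1-60 s range."""
    timeout_s = max(1, min(60, int(timeout_s)))
    return f"{base}/run?" + urlencode({"lang": lang, "timeout": timeout_s})


def parse_reply(body):
    """Turn a text/plain reply into a Python value, raising on 'error:' lines."""
    text = body.strip()
    if text.startswith("error:"):
        raise RuntimeError(text[len("error:"):].strip())
    if text in ("true", "false"):
        return text == "true"
    try:
        return ast.literal_eval(text)  # integers, floats, lists like [1, 2, 3]
    except (ValueError, SyntaxError, TypeError):
        return text  # assumption: anything else is a bare string


url = run_url("http://100.x.y.z", "py", 600)
# 'http://100.x.y.z/run?lang=py&timeout=60'
```

To send a program, the URL can be passed to `urllib.request.urlopen(url, data=code.encode())`, which issues a POST with the program as the body, and the decoded response handed to `parse_reply`.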
## How it fits in 512 kB

The ESP32-S3 has **512 kB of SRAM and no PSRAM**, yet it runs a full Tailscale node
(control plane *and* DERP), the PySpell evaluator, a browser agent IDE served off the chip, a
native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.

**Honest headline.** The "~260 kB free" you see between requests is a calm-moment reading. The number that matters is the **worst-case peak free heap: ≈60 kB**, measured during a TLS fetch with the Tailscale control session live. Every trick below keeps transient spikes under that ceiling, and the blunt consequence is that an 8-way parallel pool and full Tailscale *don't* coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.

### Crypto & TLS

**SPKI leaf-key pinning** instead of CA-chain validation: one RSA-PSS verify, no 6 kB
chain buffer (a TLS fetch drops ~45→30 kB). A **heap admission gate** bounds concurrency
so peak heap is `K × per-fetch`, never `N × per-fetch`.
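
The `K × per-fetch` bound can be demonstrated with a counting semaphore. This is a sketch of the general technique in Python threads, with hypothetical names (`AdmissionGate`, `run_fetches`); the firmware's gate may well admit on measured free heap instead of a fixed count.

```python
import threading
import time


class AdmissionGate:
    """Let at most k fetches run at once, so peak heap is about k * per_fetch."""

    def __init__(self, k):
        self._slots = threading.BoundedSemaphore(k)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        self._slots.acquire()  # blocks while k fetches are already in flight
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        self._slots.release()
        return False


def run_fetches(n, k, fetch):
    """Start n workers at once and return the highest concurrency observed."""
    gate = AdmissionGate(k)

    def worker(i):
        with gate:
            fetch(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak


# Eight requests arrive together, but never more than two hold buffers at once.
peak = run_fetches(n=8, k=2, fetch=lambda i: time.sleep(0.01))
```

All `n` requests still complete; the gate trades latency for a hard cap on how many per-fetch buffers exist at the same moment.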

### Stream, don't buffer

The netmap is read with `serde_json::from_reader` over the HTTP/2 frames, so serde
**skips the huge DERPMap field** instead of buffering it (~60 kB → one 4 kB chunk).
`fetch_json` stops the moment the value is found, and raw **byte-scans**
replace JSON DOM trees.

### Pages from flash

Static content lives in flash as `&'static str` (zero heap) and is streamed out as
**512-byte TCP segments**: only the current segment is ever in RAM, so the 4.3 kB agent
IDE serves without a full-page buffer.

### Allocator & sockets

Heap and stack share one DRAM pool (**+16 kB heap = −16 kB stack**), tuned by hand.
`SO_LINGER=0` frees lwIP sockets immediately (no TIME_WAIT pile-up), and a
**cooperative shared stack** on the lean build makes parallelism cheap where per-thread
stacks can't.

The full catalog, with the exact file and symbol for every trick, is in
[docs/memory-512kb.md](https://github.com/punnerud/pyspell/blob/main/docs/memory-512kb.md).

## Sandbox & limits

- **Deny-by-default grammar.** Only the whitelisted expression nodes and the built-ins above exist: no loops, function definitions, recursion, attribute access, imports, or I/O beyond the host-gated built-ins.
- **Instruction budget.** Every evaluation has a step limit (runaway guard).
- **Wall-clock timeout.** A caller can supply a deadline (e.g. 10 s); on the device the ESP timer enforces it.
- **Parser stays small.** The on-device parser accepts only the safe subset, so the device's attack surface is just a bounded decoder + evaluator.
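
The first two guards can be sketched together on a host using Python's standard `ast` module. This illustrates the technique and is not the `pyspell-core` implementation: the allowlist covers only a handful of node types, the budget counts evaluator calls, and `ALLOWED`, `BudgetExceeded`, and `evaluate` are hypothetical names.

```python
import ast

# Deny by default: any node type not listed here is rejected before evaluation.
ALLOWED = (ast.Expression, ast.BoolOp, ast.BinOp, ast.UnaryOp, ast.Compare,
           ast.IfExp, ast.Name, ast.Constant, ast.Load,
           ast.And, ast.Or, ast.Not, ast.USub,
           ast.Add, ast.Sub, ast.Mult, ast.Gt, ast.Lt)


class BudgetExceeded(Exception):
    pass


def evaluate(source, env, budget=1000):
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"node not allowed: {type(node).__name__}")
    steps = 0

    def ev(node):
        nonlocal steps
        steps += 1  # one step per evaluated node
        if steps > budget:
            raise BudgetExceeded("program exceeded its step budget")
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in env:  # free identifiers come from the host
                raise NameError(f"unknown name `{node.id}`")
            return env[node.id]
        if isinstance(node, ast.UnaryOp):
            value = ev(node.operand)
            return (not value) if isinstance(node.op, ast.Not) else -value
        if isinstance(node, ast.BinOp):
            a, b = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Add):
                return a + b
            if isinstance(node.op, ast.Sub):
                return a - b
            return a * b
        if isinstance(node, ast.BoolOp):
            is_and = isinstance(node.op, ast.And)
            result = is_and
            for value in node.values:
                result = ev(value)
                if bool(result) != is_and:  # short-circuit
                    break
            return result
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                right = ev(comparator)
                ok = left > right if isinstance(op, ast.Gt) else left < right
                if not ok:
                    return False
                left = right
            return True
        # Only ast.IfExp remains after the allowlist check.
        return ev(node.body) if ev(node.test) else ev(node.orelse)

    return ev(tree.body)


healthy = evaluate("free_heap > 100000 and uptime_s < 60",
                   {"free_heap": 120000, "uptime_s": 45})
```

Anything outside the allowlist, such as a call or an attribute lookup, fails the walk before a single node is evaluated, and a long expression trips the budget instead of running unbounded.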
