# Show HN: ESP32 512kB – Tailscale, English to Python LLM and 8 containers local

> Source:
> Published: 2026-06-17 20:58:08+00:00

## What it is

A PySpell program is a single expression (Python) or some `let` bindings followed by a trailing expression (Rust). It evaluates to a value: a number, a boolean, a string, or a list. Free identifiers are resolved at evaluation time against a host-supplied *environment*: CLI variables on a laptop, or live device readings on a microcontroller. The only I/O is a host-granted, allowlisted `fetch_json`; there are no loops, functions, or imports. That is the point: small, fast, and safe to accept from elsewhere.

**"Micro-containers": the direction, honestly stated.** The aim is lightweight, pushable units of code on tiny devices. Today it's a *sandboxed evaluator*, not OS containers: the sandbox is at the *language* level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust *subset*, not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.

**Two ways to compile.** On the host, full-fidelity front-ends use `syn` (Rust) and `rustpython-parser` (Python). For "type code in a browser and run it on the chip", a tiny hand-written parser (a few kB, `no_std`) builds the same AST on the device. Either way: source → AST → evaluate.

## An offline AI coding agent, served off the chip

Open `http:///` over the tunnel and you get a Cursor-like agent. Type *"flash the light"*, *"show the text "hello""*, *"what is 7 plus 5"*, or *"reverse the word robot"*. A **~0.45 M-parameter language model (< 500 kB, int8)** turns it into PySpell code, **runs it live on the chip**, and shows the result or the physical action (the screen lights up, the RGB LED blinks).
Runtime, model, tokenizer and dictionary are all served **from the dongle, offline**: no cloud, no key (OpenAI is optional, behind the ⚙). A model that small is only useful because of a chain of tricks; the full write-up is in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md). The headlines:

### The model points, the browser copies

A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked to. It emits tiny *semantic* directives; the browser copies the literal content verbatim. `calculate 3 + 2` → `print(`**3 + 2**`)`; `change add to subtract` → `@@ + ==> -`. Quoted text is literal content: copied byte-for-byte and excluded from vocab checks.

### The device serves; the browser computes

Inference runs in WebAssembly, client-side. The 0.5 MB model image streams **off flash a TCP segment at a time** (HTTP Range) and is never resident in the chip's ~60 kB heap. This is inverted edge inference: the constrained device serves and grades, the browser runs the model.

### Frozen embeddings, distilled

The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a part-of-speech vector, and **frozen**. The tiny model starts with meaningful word geometry instead of spending its small parameter budget learning it.

### The vocabulary is the dictionary

Those same 512 tokens + embeddings are served back to the browser for input validation ("outside the model's vocabulary…") and related-word RAG over the model's own vocabulary.

**Retrain it for your language.** The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and train, then flash. Full guide in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md).
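To make the vocabulary check concrete, here is a minimal sketch of input validation that skips quoted literals, as described above. The vocabulary contents, the tokenisation rule (lowercase alphabetic words, numbers ignored), the double-quote-only handling, and the function name `out_of_vocab` are all assumptions for illustration; the real browser code may differ.

```python
import re

# Hypothetical stand-in for the 512-token vocabulary served by the device.
VOCAB = {"show", "the", "text", "flash", "light",
         "reverse", "word", "what", "is", "plus"}

# Matches a double-quoted span; quoted text is treated as literal content.
QUOTED = re.compile(r'"[^"]*"')

def out_of_vocab(instruction, vocab=VOCAB):
    """Return the words not in the vocabulary, ignoring quoted literals."""
    # Drop quoted spans first: they are copied verbatim, not interpreted.
    unquoted = QUOTED.sub(" ", instruction)
    # Assumed tokenisation: lowercase alphabetic words; numbers are skipped.
    words = re.findall(r"[a-z]+", unquoted.lower())
    return [w for w in words if w not in vocab]

print(out_of_vocab('show the text "xylophone"'))  # prints []
print(out_of_vocab("flash the lamp"))             # prints ['lamp']
```

A non-empty result is where a UI could show the "outside the model's vocabulary…" message and suggest related words.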
## Syntax at a glance

### Python

```
free_heap > 100000 and uptime_s < 60
250 if distance > 1000 else 0
0 < temp < 60          # chained
20 not in peers
sum([1, 2, 3])
readings[-1]           # negative index
max(a, b)
```

### Rust

```
free_heap > 100000 && uptime_s < 60
if distance > 1000 { 250 } else { 0 }
let used = total - free; used * 100 / total
!peers.contains(20)
sum([1, 2, 3])
readings[readings.len() - 1]
max(a, b)
```

## Language reference

### Literals & values

| Kind | Examples | Notes |
|---|---|---|
| Integer | `0`, `42`, `-7` | 64-bit signed |
| Float | `1.5`, `3.14` | 64-bit |
| Boolean | `true`/`True`, `false`/`False` | both spellings accepted |
| String | `"hello"`, `'oslo'` | `+` concatenates; `==`/`<` compare; `len()` counts chars |
| List | `[1, 2, 3]` | elements are values |

### Operators

| Group | Python | Rust | Notes |
|---|---|---|---|
| Arithmetic | `+ - * / %` (and `//`) | `+ - * / %` | on integers, `/` and `//` both truncate toward zero; a float operand promotes to float division. There is no separate float floor-div. |
| Comparison | `== != < <= > >=` | `== != < <= > >=` | Python allows chaining (`a < b < c`) |
| Boolean | `and`, `or`, `not` | `&&`, `\|\|`, `!` | short-circuiting |
| Unary | `-x`, `not x` | `-x`, `!x` | |
| Membership | `x in list`, `x not in list` | `list.contains(x)` | numeric equality |
| Index | `list[i]` | `list[i]` | negative indexing supported |

### Control flow & bindings

| Feature | Python | Rust |
|---|---|---|
| Conditional | `a if cond else b` | `if cond { a } else { b }` (else required) |
| Local bindings | (single expression only) | `let x = e; let y = e2; final_expr` |
| Free variables | any bare name is read from the host environment | any bare name not bound by `let` is read from the host environment |

### Built-in functions

| Function | Result | Description |
|---|---|---|
| `len(list)` | int | number of elements |
| `abs(x)` | number | absolute value |
| `min(list)` / `min(a, b, …)` | number | minimum |
| `max(list)` / `max(a, b, …)` | number | maximum |
| `sum(list)` | number | sum of a numeric list |
| `any(list)` | bool | true if any element is truthy |
| `all(list)` | bool | true if all elements are truthy |
| `round(x)` | int | round to nearest integer |
| `int(x)` | int | truncate toward zero |
| `float(x)` | float | convert to float |
| `bool(x)` | bool | truthiness |
| `index(list, x)` | int | position of first `x`, or `-1` |
| `before(list, a, b)` | bool | true if `a` occurs before `b` |
| `first(list)` | value | first element, or `-1` if empty |
| `last(list)` | value | last element, or `-1` if empty |
| `str(x)` | string | string representation of a value |
| `json_get(text, "a.b.0.c")` | scalar | extract the scalar at a dotted/indexed JSON path (no full parse; only the matched value is materialized) |
| `fetch(url)` | string | HTTP(S) GET body. Gated by a host allowlist; errors if the host isn't allowed or no network capability is present |
| `fetch_json(url, "a.b.0.c")` | scalar | stream the response and extract just the scalar at the path, stopping as soon as it's found; never buffers the whole body. Preferred on the device. |
| `show(x)` | x | render `x` to text and display it (the ESP32 screen; stdout on host), returning `x` so it composes. Device gates it via config (allow on/off, auto-revert seconds). |

Classic one-liner: fetch a value and show it on the dongle's screen.

```
show("Oslo: " + fetch_json(
    "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
    "properties.timeseries.0.data.instant.details.air_temperature") + " C")
# screen shows: Oslo: 14.9 C   (and the call returns that string)
```

## Network & JSON

`fetch(url)` + `json_get(text, path)` let a program pull live data and read one field out of it. `fetch` is a mediated capability: the host/device decides which hosts are reachable (an allowlist), so a program can't reach arbitrary URLs.

```
# Host CLI (allow the host explicitly):
pyspell run oslo_temp.py --allow-host api.met.no

# where oslo_temp.py is:
json_get(
    fetch("https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75"),
    "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9
```

**Memory note (device):** `json_get` is path-directed, so it never builds the whole document in RAM; it materializes only the matched value. On the ESP32 (≈60 kB free, no PSRAM) reading a field out of a large response is feasible because `fetch_json` *streams* the HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early), so a ~50 kB yr.no response never has to fit in RAM at once.
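To make the path syntax concrete, the sketch below shows how a dotted/indexed path such as `a.b.0.c` could resolve. It is only an illustration of the path semantics: it parses the whole document with `json.loads`, which is exactly what the device avoids, and the name `json_get_sketch` and the sample document are hypothetical.

```python
import json

def json_get_sketch(text, path):
    """Resolve a dotted/indexed path like "a.b.0.c" against a JSON document."""
    node = json.loads(text)  # full parse: fine on a laptop, not what the device does
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]  # a segment under an array is a numeric index
        else:
            node = node[key]       # a segment under an object is a key
    return node

# Made-up document shaped like the forecast response used above.
sample = ('{"properties": {"timeseries": [{"data": {"instant": '
          '{"details": {"air_temperature": 14.9}}}}]}}')

print(json_get_sketch(
    sample,
    "properties.timeseries.0.data.instant.details.air_temperature"))  # prints 14.9
```

The streaming variant on the device would follow the same path segment by segment while scanning bytes, and stop reading once the last segment's value is complete.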
```
# On the ESP32, over Tailscale (single process; ≈60 kB free; verified live):
fetch_json(
    "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
    "properties.timeseries.0.data.instant.details.air_temperature")
# → 14.9   (the dongle fetched yr.no itself)
```

## Running on the host

```
# Evaluate, binding free variables:
cargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000
# → true

# Compile to a portable IR blob:
cargo run -p pyspell-cli -- compile examples/health.py
# → examples/health.py.psb

# Push live to a device over USB-serial, or an interactive REPL:
cargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python
```

## Running on the ESP32

The portable evaluator (`pyspell-core`, `no_std + alloc`) runs unchanged on the ESP32-S3. Programs read live device variables from the environment:

| Variable | Meaning |
|---|---|
| `free_heap` | free heap, bytes |
| `min_free_heap` | lowest free heap seen since boot, bytes |
| `uptime_ms` | milliseconds since boot |
| `uptime_s` | seconds since boot |

### Demo: PySpell over Tailscale

The `demo/esp32-tailscale-pyspell` firmware adds a web text window and a `/run` API *inside a Tailscale tunnel*: open the device's Tailscale IP in a browser, type an expression, set a timeout, and run it on the chip. PySpell adds only ~62 kB on top of the networking firmware.

```
# Web window: open http://100.x.y.z/

# POST (preferred): program in the body, lang/timeout in the query.
# More room for code than a URL, and no percent-encoding.
curl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000'
# → true
curl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000'
# → 22

# GET (also supported): code is URL-encoded in the query.
curl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000'
# → true
```

`timeout` is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on the device.
The whole request must fit in one TCP segment (≈1.2 kB); POST leaves more of that for code.

### Response format

The reply is `text/plain` (no JSON wrapper):

| Outcome | Body |
|---|---|
| Success | the raw value: `true`/`false`, an integer, a float, or a list like `[1, 2, 3]` |
| Failure | a line starting with `error:`, e.g. `error: parse error: unexpected end of input`, ``error: unknown name `foo` ``, or `error: program exceeded its time limit` |

## How it fits in 512 kB

The ESP32-S3 has **512 kB of SRAM and no PSRAM**, yet it runs a full Tailscale node (control plane *and* DERP), the PySpell evaluator, a browser agent IDE served off the chip, a native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.

**Honest headline.** The "~260 kB free" you see between requests is a calm-moment reading. The number that matters is the **worst-case free heap at peak load: ≈60 kB**, measured during a TLS fetch with the Tailscale control session live. Every trick below keeps transient spikes within that headroom. The blunt consequence is that an 8-way parallel pool and full Tailscale *don't* coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.

### Crypto & TLS

**SPKI leaf-key pinning** instead of CA-chain validation: one RSA-PSS verify, no 6 kB chain buffer (a TLS fetch drops ~45→30 kB). A **heap admission gate** bounds concurrency so peak heap is `K × per-fetch`, never `N × per-fetch`.

### Stream, don't buffer

The netmap is read with `serde_json::from_reader` over the HTTP/2 frames, so serde **skips the huge DERPMap field** instead of buffering it (~60 kB → one 4 kB chunk). `fetch_json` stops the moment the value is found, and raw **byte-scans** replace JSON DOM trees.

### Pages from flash

Static content lives in flash as `&'static str` (zero heap) and is streamed out as **512-byte TCP segments**. Only the current segment is ever in RAM, so the 4.3 kB agent IDE serves without a full-page buffer.
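The segment-at-a-time idea can be sketched in a few lines. This is a host-side illustration only, not the firmware's code: the page contents are a made-up stand-in, and `segments` is a hypothetical helper. It uses `memoryview` so each slice refers to the original bytes rather than copying them.

```python
SEGMENT_SIZE = 512  # bytes per segment, as stated above

def segments(page, size=SEGMENT_SIZE):
    """Yield successive fixed-size slices of a static page without copying it."""
    view = memoryview(page)
    for start in range(0, len(view), size):
        # Each slice is a view into the original buffer, not a new allocation
        # of the page data.
        yield view[start:start + size]

page = b"x" * 4300  # stand-in for a page of roughly 4.3 kB
chunks = list(segments(page))
print(len(chunks), len(chunks[-1]))  # prints: 9 204
```

A real sender would write each slice to the socket and move on instead of collecting them in a list; the list here is only for inspecting the result.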
### Allocator & sockets

Heap and stack share one DRAM pool (**+16 kB heap = −16 kB stack**), tuned by hand. `SO_LINGER=0` frees lwIP sockets immediately (no TIME_WAIT pile-up), and a **cooperative shared stack** on the lean build makes parallelism cheap where per-thread stacks can't.

The full catalog, with every trick and its exact file and symbol, is in [docs/memory-512kb.md](https://github.com/punnerud/pyspell/blob/main/docs/memory-512kb.md).

## Sandbox & limits

**Deny-by-default grammar.** Only the whitelisted expression nodes and the built-ins above exist: no loops, function definitions, recursion, attribute access, or imports, and no I/O beyond the host-gated built-ins.

**Instruction budget.** Every evaluation has a step limit (runaway guard).

**Wall-clock timeout.** A caller can supply a deadline (e.g. 10 s); on the device the ESP timer enforces it.

**Parser stays small.** The on-device parser accepts only the safe subset, so the device's attack surface is just a bounded decoder + evaluator.
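The first two guards can be illustrated with a short host-side sketch built on Python's standard `ast` module. This is not PySpell's implementation (which is Rust, with its own parser): the allowlist below is a deliberately tiny, hypothetical subset, and `evaluate` and `BudgetExceeded` are names invented for this example.

```python
import ast

# Hypothetical allowlist: anything not listed here is rejected before evaluation.
ALLOWED = (ast.Expression, ast.Constant, ast.Name, ast.Load,
           ast.BinOp, ast.Add, ast.Sub, ast.Mult,
           ast.Compare, ast.Gt, ast.Lt,
           ast.BoolOp, ast.And, ast.Or,
           ast.UnaryOp, ast.Not, ast.USub, ast.IfExp)

class BudgetExceeded(Exception):
    """Raised when an evaluation uses more steps than it was granted."""

def evaluate(source, env, budget=1000):
    tree = ast.parse(source, mode="eval")
    # Deny by default: reject the program if any node is off the allowlist.
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"node not allowed: {type(node).__name__}")
    steps = 0

    def ev(node):
        nonlocal steps
        steps += 1  # instruction budget: one step per evaluated node
        if steps > budget:
            raise BudgetExceeded("program exceeded its step limit")
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in env:
                raise NameError(f"unknown name `{node.id}`")
            return env[node.id]  # free variables come from the host environment
        if isinstance(node, ast.UnaryOp):
            value = ev(node.operand)
            return (not value) if isinstance(node.op, ast.Not) else -value
        if isinstance(node, ast.BinOp):
            left, right = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Sub):
                return left - right
            return left * right
        if isinstance(node, ast.BoolOp):
            # all()/any() over a generator short-circuit like and/or.
            if isinstance(node.op, ast.And):
                return all(ev(v) for v in node.values)
            return any(ev(v) for v in node.values)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                right = ev(comparator)
                ok = left > right if isinstance(op, ast.Gt) else left < right
                if not ok:
                    return False
                left = right  # supports chained comparisons such as 0 < t < 60
            return True
        # Only IfExp remains on the allowlist at this point.
        return ev(node.body) if ev(node.test) else ev(node.orelse)

    return ev(tree.body)

print(evaluate("free_heap > 100000 and uptime_s < 60",
               {"free_heap": 120000, "uptime_s": 45}))  # prints True
```

With this sketch, `evaluate("__import__('os')", {})` raises `ValueError` because a call node is not on the allowlist, and a small `budget` makes even a short expression raise `BudgetExceeded`.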