{"slug": "show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local", "title": "Show HN: ESP32 512kB – Tailscale, English to Python LLM and 8 containers local", "summary": "A developer released PySpell, a sandboxed evaluator for ESP32 microcontrollers that runs a safe subset of Python and Rust, enabling local execution of code generated by a tiny 0.45M-parameter language model served offline from the device. The project demonstrates inverted edge inference, where the constrained device serves the model and the browser runs it, aiming toward lightweight micro-containers for tiny devices.", "body_md": "## What it is\n\nA PySpell program is a single expression (Python) or some `let` bindings followed by a trailing expression (Rust). It evaluates to a value: a number, a boolean, a string, or a list. Free identifiers are resolved at evaluation time against a host-supplied *environment*: CLI variables on a laptop, or live device readings on a microcontroller. The only I/O is host-granted: allowlisted fetches (`fetch`, `fetch_json`) and a config-gated `show`. There are no loops, user-defined functions, or imports. That is the point: small, fast, and safe to accept from elsewhere.\n\n**\"Micro-containers\": the direction, honestly stated.** The aim is lightweight, pushable units of code on tiny devices. Today it is a *sandboxed evaluator*, not OS containers: the sandbox is at the *language* level (a deny-by-default grammar plus an instruction budget), jobs share one device, and it runs a safe Python/Rust *subset*, not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (it has no PSRAM). So the first step toward the micro-container vision is a small, safe evaluator.\n\n**Two ways to compile.** On the host, full-fidelity front-ends use `syn` (Rust) and `rustpython-parser` (Python). 
For \"type code in a browser and run it on the chip\", a tiny hand-written parser (a few kB, `no_std`) builds the same AST on the device. Either way: source → AST → evaluate.\n\n## An offline AI coding agent, served off the chip\n\nOpen `http://<dongle>/` over the tunnel and you get a Cursor-like agent. Type *\"flash the light\"*, *\"show the text \"hello\"\"*, *\"what is 7 plus 5\"*, or *\"reverse the word robot\"*: a **~0.45 M-parameter language model (< 500 kB, int8)** turns it into PySpell code, **runs it live on the chip**, and shows the result or the physical action (the screen lights up, the RGB LED blinks). Runtime, model, tokenizer, and dictionary are all served **from the dongle, offline**: no cloud, no API key (OpenAI is optional, behind the ⚙ settings).\n\nA model that small is only useful because of a chain of tricks; the full write-up is in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md). The headlines:\n\n### The model points, the browser copies\n\nA 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked to. It emits tiny *semantic* directives, and the browser copies the literal content verbatim: `calculate 3 + 2` → `print(`**3 + 2**`)`; `change add to subtract` → `@@ + ==> -`. Quoted text is literal content, copied byte-for-byte and excluded from vocabulary checks.\n\n### The device serves; the browser computes\n\nInference runs client-side in WebAssembly. The 0.5 MB model image streams **off flash one TCP segment at a time** (via HTTP Range requests) and is never resident in the chip's ~60 kB heap. 
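\n\nA minimal host-side sketch of that range serving follows, written in Python rather than the firmware's Rust. It is an illustration under assumptions, not PySpell's code: `parse_range`, `content_range`, and `iter_segments` are hypothetical names, `image` stands in for the model file in flash, the 512-byte `SEGMENT` size is borrowed from the page server described under *Pages from flash*, and only a single `bytes=start-end` range is handled. However large the requested range, the server only ever hands one segment-sized slice to the socket.\n\n```python\nSEGMENT = 512  # assumed bytes per send, mirroring the 512-byte page segments\n\n\ndef parse_range(header, total):\n    # Parse one HTTP Range value such as 'bytes=1000-' into an inclusive\n    # (start, end) pair clamped to the image size; None means unsatisfiable.\n    unit, _, spec = header.partition('=')\n    if unit.strip() != 'bytes' or ',' in spec:\n        return None  # multi-range requests are out of scope here\n    start_s, _, end_s = spec.strip().partition('-')\n    try:\n        if not start_s:\n            n = int(end_s)  # suffix form 'bytes=-N': the last N bytes\n            return (max(total - n, 0), total - 1) if n > 0 else None\n        start = int(start_s)\n        end = int(end_s) if end_s else total - 1\n    except ValueError:\n        return None\n    if start >= total or end < start:\n        return None\n    return start, min(end, total - 1)\n\n\ndef content_range(start, end, total):\n    # Value of the Content-Range header in a 206 Partial Content reply.\n    return f'bytes {start}-{end}/{total}'\n\n\ndef iter_segments(image, start, end, segment=SEGMENT):\n    # Yield bytes start..end (inclusive) as zero-copy slices of at most\n    # `segment` bytes, so only one slice is handed to the socket at a time.\n    view = memoryview(image)\n    pos = start\n    while pos <= end:\n        stop = min(pos + segment, end + 1)\n        yield view[pos:stop]\n        pos = stop\n```\n\nFor a 2,048-byte image, `parse_range('bytes=1000-', 2048)` returns `(1000, 2047)`, and `iter_segments` then yields slices of 512, 512, and 24 bytes.\n\n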
Inverted edge inference: the constrained device serves and grades; the browser runs the model.\n\n### Frozen embeddings, distilled\n\nThe 512-token vocabulary is embedded with all-MiniLM (22 M params), reduced with PCA to 128 dimensions, folded with a part-of-speech vector, and **frozen**, so the tiny model starts with meaningful word geometry instead of spending its small parameter budget learning it.\n\n### The vocabulary is the dictionary\n\nThe same 512 tokens and their embeddings are served back to the browser for input validation (\"outside the model's vocabulary…\") and for related-word RAG over the model's own vocabulary.\n\n**Retrain it for your language.** The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and retrain, then flash. The full guide is in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md).\n\n## Syntax at a glance\n\n### Python\n\n```python\nfree_heap > 100000 and uptime_s < 60\n250 if distance > 1000 else 0\n0 < temp < 60          # chained\n20 not in peers\nsum([1, 2, 3])\nreadings[-1]           # negative index\nmax(a, b)\n```\n\n### Rust\n\n```rust\nfree_heap > 100000 && uptime_s < 60\nif distance > 1000 { 250 } else { 0 }\nlet used = total - free; used * 100 / total\n!peers.contains(20)\nsum([1, 2, 3])\nreadings[readings.len() - 1]\nmax(a, b)\n```\n\n## Language reference\n\n### Literals & values\n\n| Kind | Examples | Notes |\n|---|---|---|\n| Integer | `0`, `42`, `-7` | 64-bit signed |\n| Float | `1.5`, `3.14` | 64-bit |\n| Boolean | `true`/`True`, `false`/`False` | both spellings accepted |\n| String | `\"hello\"`, `'oslo'` | `+` concatenates; `==`/`<` compare; `len()` counts chars |\n| List | `[1, 2, 3]` | elements are values |\n\n### Operators\n\n| Group | Python | Rust | Notes |\n|---|---|---|---|\n| Arithmetic | `+ - * / %`, `//` | `+ - * / %` | On integers, `/` and `//` both truncate toward zero; a float operand promotes to float division. There is no separate float floor division. |\n| Comparison | `== != < <= > >=` | `== != < <= > >=` | Python allows chaining (`a < b < c`) |\n| Boolean | `and`, `or`, `not` | `&&`, `\\|\\|`, `!` | short-circuiting |\n| Unary | `-x`, `not x` | `-x`, `!x` | |\n| Membership | `x in list`, `x not in list` | `list.contains(x)` | numeric equality |\n| Index | `list[i]` | `list[i]` | negative indexing supported |\n\n### Control flow & bindings\n\n| Feature | Python | Rust |\n|---|---|---|\n| Conditional | `a if cond else b` | `if cond { a } else { b }` (else required) |\n| Local bindings | (single expression only) | `let x = e; let y = e2; final_expr` |\n| Free variables | any bare name is read from the host environment | any bare name not bound by `let` is read from the host environment |\n\n### Built-in functions\n\n| Function | Result | Description |\n|---|---|---|\n| `len(list)` | int | number of elements (characters, for a string) |\n| `abs(x)` | number | absolute value |\n| `min(list)` / `min(a, b, …)` | number | minimum |\n| `max(list)` / `max(a, b, …)` | number | maximum |\n| `sum(list)` | number | sum of a numeric list |\n| `any(list)` | bool | true if any element is truthy |\n| `all(list)` | bool | true if all elements are truthy |\n| `round(x)` | int | round to nearest integer |\n| `int(x)` | int | truncate toward zero |\n| `float(x)` | float | convert to float |\n| `bool(x)` | bool | truthiness |\n| `index(list, x)` | int | position of the first `x`, or `-1` if absent |\n| `before(list, a, b)` | bool | true if `a` occurs before `b` |\n| `first(list)` | value | first element, or `-1` if empty |\n| `last(list)` | value | last element, or `-1` if empty |\n| `str(x)` | string | string representation of a value |\n| `json_get(text, \"a.b.0.c\")` | scalar | extract the scalar at a dotted/indexed JSON path (no full parse; only the matched value is materialized) |\n| `fetch(url)` | string | HTTP(S) GET body. 
Gated by a host allowlist; errors if the host isn't allowed or no network capability is present. |\n| `fetch_json(url, \"a.b.0.c\")` | scalar | stream the response and extract just the scalar at the path, stopping as soon as it is found; never buffers the whole body. Preferred on the device. |\n| `show(x)` | x | render `x` to text and display it (the ESP32 screen; stdout on the host), returning `x` so it composes. The device gates it via config (allow on/off, auto-revert seconds). |\n\nClassic one-liner, which fetches a value and shows it on the dongle's screen:\n\n```python\nshow(\"Oslo: \" + str(fetch_json(\n  \"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75\",\n  \"properties.timeseries.0.data.instant.details.air_temperature\")) + \" C\")\n# screen shows:  Oslo: 14.9 C   (and the call returns that string)\n```\n\n## Network & JSON\n\n`fetch(url)` plus `json_get(text, path)` let a program pull live data and read one field out of it. `fetch` is a mediated capability: the host or device decides which hosts are reachable (an allowlist), so a program cannot reach arbitrary URLs.\n\n```shell\n# Host CLI (allow the host explicitly):\npyspell run oslo_temp.py --allow-host api.met.no\n# → 14.9\n```\n\nwhere `oslo_temp.py` is:\n\n```python\njson_get(\n  fetch(\"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75\"),\n  \"properties.timeseries.0.data.instant.details.air_temperature\")\n```\n\n**Memory note (device):** `json_get` is path-directed, so it never builds the whole document in RAM; it materializes only the matched value. 
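\n\nThe path-directed lookup can be sketched in plain Python. This is an illustration under stated assumptions, not PySpell's implementation: it works on an in-memory string rather than a stream, assumes well-formed JSON, does not decode escapes inside keys, and the helper names are hypothetical. It walks the text once, skipping every sibling value by bracket counting, and only the final matched token is turned into a Python value (the standard `json.loads` decodes it when it is a string).\n\n```python\nimport json\n\nQUOTE, BSLASH = chr(34), chr(92)  # JSON's quote and backslash characters\n\n\ndef _skip_ws(s, i):\n    while i < len(s) and s[i].isspace():\n        i += 1\n    return i\n\n\ndef _end_of_string(s, i):\n    # s[i] is an opening quote; return the index just past the closing quote.\n    i += 1\n    while s[i] != QUOTE:\n        i += 2 if s[i] == BSLASH else 1  # jump over escaped characters\n    return i + 1\n\n\ndef _skip_value(s, i):\n    # Return the index just past the value at s[i] without building it.\n    i = _skip_ws(s, i)\n    if s[i] == QUOTE:\n        return _end_of_string(s, i)\n    if s[i] in '[{':\n        depth = 0\n        while True:\n            if s[i] == QUOTE:\n                i = _end_of_string(s, i)  # brackets inside strings don't count\n                continue\n            if s[i] in '[{':\n                depth += 1\n            elif s[i] in ']}':\n                depth -= 1\n                if depth == 0:\n                    return i + 1\n            i += 1\n    while i < len(s) and s[i] not in ',]}' and not s[i].isspace():\n        i += 1  # number, true, false or null\n    return i\n\n\ndef _find_member(s, i, key):\n    # s[i] is '{'; return the index where the value of `key` starts, or None.\n    i += 1\n    while True:\n        i = _skip_ws(s, i)\n        if s[i] != QUOTE:\n            return None  # reached '}' without a match\n        end = _end_of_string(s, i)\n        name = s[i + 1:end - 1]\n        i = _skip_ws(s, end) + 1  # step past the ':'\n        if name == key:\n            return _skip_ws(s, i)\n        i = _skip_ws(s, _skip_value(s, i))\n        if s[i] != ',':\n            return None\n        i += 1\n\n\ndef _find_element(s, i, n):\n    # s[i] is '['; return the index where element n starts, or None.\n    i += 1\n    for _ in range(n):\n        i = _skip_ws(s, i)\n        if s[i] == ']':\n            return None\n        i = _skip_ws(s, _skip_value(s, i))\n        if s[i] != ',':\n            return None\n        i += 1\n    i = _skip_ws(s, i)\n    return None if s[i] == ']' else i\n\n\ndef json_get(s, path):\n    # Follow a dotted path such as 'a.b.0.c'; numeric segments index arrays.\n    i = 0\n    for key in path.split('.'):\n        i = _skip_ws(s, i)\n        if s[i] == '{':\n            i = _find_member(s, i, key)\n        elif s[i] == '[' and key.isdigit():\n            i = _find_element(s, i, int(key))\n        else:\n            return None  # path goes deeper than the document\n        if i is None:\n            return None\n    tok = s[i:_skip_value(s, i)]  # the only slice that is ever materialized\n    if tok.startswith(QUOTE):\n        return json.loads(tok)\n    if tok in ('true', 'false', 'null'):\n        return {'true': True, 'false': False, 'null': None}[tok]\n    if tok[:1] in '[{':\n        return None  # path ends at a container, not a scalar\n    return float(tok) if any(c in tok for c in '.eE') else int(tok)\n```\n\nBecause unvisited siblings are skipped rather than parsed, the working memory does not grow with the size of the parts of the document the path passes over.\n\n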
On the ESP32 (≈60 kB free, no PSRAM), reading a field out of a large response is feasible because `fetch_json` *streams* the HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early), so a ~50 kB yr.no response never has to fit in RAM at once.\n\n```python\n# On the ESP32, over Tailscale (single process; ≈60 kB free; verified live):\nfetch_json(\n  \"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75\",\n  \"properties.timeseries.0.data.instant.details.air_temperature\")\n# → 14.9   (the dongle fetched yr.no itself)\n```\n\n## Running on the host\n\n```shell\n# Evaluate, binding free variables:\ncargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000\n# → true\n\n# Compile to a portable IR blob:\ncargo run -p pyspell-cli -- compile examples/health.py    # → examples/health.py.psb\n\n# Push programs live to a device over USB serial via an interactive REPL:\ncargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python\n```\n\n## Running on the ESP32\n\nThe portable evaluator (`pyspell-core`, `no_std` + `alloc`) runs unchanged on the ESP32-S3. Programs read live device variables from the environment:\n\n| Variable | Meaning |\n|---|---|\n| `free_heap` | free heap, bytes |\n| `min_free_heap` | lowest free heap seen since boot, bytes |\n| `uptime_ms` | milliseconds since boot |\n| `uptime_s` | seconds since boot |\n\n### Demo: PySpell over Tailscale\n\nThe `demo/esp32-tailscale-pyspell` firmware adds a web text window and a `/run` API *inside a Tailscale tunnel*: open the device's Tailscale IP in a browser, type an expression, set a timeout, and run it on the chip. 
PySpell adds only ~62 kB on top of the networking firmware.\n\n```shell\n# Web window:\nopen http://100.x.y.z/\n\n# POST (preferred): program in the body, lang/timeout in the query.\n# More room for code than a URL, and no percent-encoding.\ncurl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000'   # → true\ncurl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000'       # → 22\n\n# GET (also supported): code is URL-encoded in the query.\ncurl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000'   # → true\n```\n\n`timeout` is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on the device. The whole request must fit in one TCP segment (≈1.2 kB); POST leaves more of that room for code.\n\n### Response format\n\nThe reply is `text/plain` (no JSON wrapper):\n\n| Outcome | Body |\n|---|---|\n| Success | the raw value: `true`/`false`, an integer, a float, or a list like `[1, 2, 3]` |\n| Failure | a line starting with `error:`, e.g. `error: parse error: unexpected end of input`, `` error: unknown name `foo` ``, or `error: program exceeded its time limit` |\n\n## How it fits in 512 kB\n\nThe ESP32-S3 has **512 kB of SRAM and no PSRAM**, yet it runs a full Tailscale node (control plane *and* DERP), the PySpell evaluator, a browser agent IDE served off the chip, a native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.\n\n**Honest headline.** The \"~260 kB free\" you see between requests is a calm-moment reading. The number that matters is the **worst-case peak free heap: ≈60 kB**, measured during a TLS fetch with the Tailscale control session live. 
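\n\nA ceiling like this is why concurrency has to be bounded by memory rather than by request count. The heap admission gate listed under *Crypto & TLS* can be modelled roughly as follows. This is a hypothetical Python sketch, not the firmware's code: the class and method names are illustrative, and the 60 kB budget and 30 kB per-fetch cost used with it are example values echoing this section's figures, not measured firmware parameters.\n\n```python\nimport threading\n\n\nclass HeapAdmissionGate:\n    # Admit transient work (e.g. a TLS fetch) only while its estimated peak\n    # heap fits in a fixed budget, so the peak is bounded by the budget\n    # rather than by how many requests arrive.\n\n    def __init__(self, budget_bytes):\n        self.budget = budget_bytes\n        self.in_use = 0\n        self._lock = threading.Lock()\n\n    def try_admit(self, cost_bytes):\n        # Reserve cost_bytes if it fits; otherwise refuse so the caller can\n        # retry later or report busy.\n        with self._lock:\n            if self.in_use + cost_bytes > self.budget:\n                return False\n            self.in_use += cost_bytes\n            return True\n\n    def release(self, cost_bytes):\n        # Return a reservation once the job's buffers have been freed.\n        with self._lock:\n            self.in_use -= cost_bytes\n\n    def max_concurrent(self, cost_bytes):\n        # K in the K × per-fetch bound.\n        return self.budget // cost_bytes\n```\n\nWith a 60 kB budget and 30 kB per fetch, the first two of eight simultaneous requests are admitted and the other six are refused until a reservation is released, so the peak stays at 2 × 30 kB instead of 8 × 30 kB.\n\n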
Every trick below keeps transient spikes under that ceiling, and the blunt consequence is that an 8-way parallel pool and full Tailscale *don't* coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.\n\n### Crypto & TLS\n\n**SPKI leaf-key pinning** instead of CA-chain validation: one RSA-PSS verify and no 6 kB chain buffer (a TLS fetch drops from ~45 kB to ~30 kB). A **heap admission gate** bounds concurrency, so peak heap is `K × per-fetch`, where K is the admission limit, never `N × per-fetch` for N incoming requests.\n\n### Stream, don't buffer\n\nThe netmap is read with `serde_json::from_reader` over the HTTP/2 frames, so serde **skips the huge DERPMap field** instead of buffering it (~60 kB → one 4 kB chunk). `fetch_json` stops the moment the value is found, and raw **byte-scans** replace JSON DOM trees.\n\n### Pages from flash\n\nStatic content lives in flash as `&'static str` (zero heap) and is streamed out as **512-byte TCP segments**; only the current segment is ever in RAM, so the 4.3 kB agent IDE is served without a full-page buffer.\n\n### Allocator & sockets\n\nHeap and stack share one DRAM pool (**+16 kB heap = −16 kB stack**), tuned by hand. `SO_LINGER=0` frees lwIP sockets immediately (no TIME_WAIT pile-up), and a **cooperative shared stack** on the lean build makes parallelism cheap where per-thread stacks cannot.\n\nThe full catalog, with every trick's exact file and symbol, is in [docs/memory-512kb.md](https://github.com/punnerud/pyspell/blob/main/docs/memory-512kb.md).\n\n## Sandbox & limits\n\n- **Deny-by-default grammar.** Only the allowed expression nodes and the built-ins above exist: no loops, user-defined functions, recursion, attribute access, or imports, and the only I/O is the host-gated `fetch`, `fetch_json`, and `show`.\n- **Instruction budget.** Every evaluation has a step limit (a runaway guard).\n- **Wall-clock timeout.** A caller can supply a deadline (e.g. 10 s); on the device, the ESP timer enforces it.\n- **Parser stays small.** The on-device parser accepts only the safe subset, so the device's attack surface is just a bounded decoder plus evaluator.", "url": "https://wpnews.pro/news/show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local", "canonical_source": "https://punnerud.github.io/pyspell/", "published_at": "2026-06-17 20:58:08+00:00", "updated_at": "2026-06-17 21:24:09.259380+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-tools", "ai-infrastructure", "developer-tools"], "entities": ["ESP32", "Tailscale", "PySpell", "Rust", "Python", "WebAssembly", "all-MiniLM", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local", "markdown": "https://wpnews.pro/news/show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local.md", "text": "https://wpnews.pro/news/show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local.txt", "jsonld": "https://wpnews.pro/news/show-hn-esp32-512kb-tailscale-english-to-python-llm-and-8-containers-local.jsonld"}}