Source: punnerud.github.io

Show HN: ESP32 512kB – Tailscale, English to Python LLM and 8 containers local

A developer released PySpell, a sandboxed evaluator for ESP32 microcontrollers that runs a safe subset of Python and Rust, enabling local execution of code generated by a tiny 0.45M-parameter language model served offline from the device. The project demonstrates inverted edge inference, where the constrained device serves the model and the browser runs it, and aims toward lightweight micro-containers for tiny devices.

Published Jun 17, 2026 · 11 min read

What it is #

A PySpell program is a single expression (Python) or some `let` bindings followed by a trailing expression (Rust). It evaluates to a value: a number, a boolean, a string, or a list. Free identifiers are resolved at evaluation time against a host-supplied environment: CLI variables on a laptop, or live device readings on a microcontroller. The only I/O is a host-granted, allowlisted `fetch_json`; there are no loops, functions, or imports. That is the point: small, fast, and safe to accept from elsewhere.

"Micro-containers": the direction, honestly stated. The aim is lightweight, pushable units of code on tiny devices. Today it's a sandboxed evaluator, not OS containers: the sandbox is at the language level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust subset, not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.

Two ways to compile. On the host, full-fidelity front-ends use `syn` (Rust) and `rustpython-parser` (Python). For "type code in a browser and run it on the chip", a tiny hand-written parser (a few kB, `no_std`) builds the same AST on the device. Either way: source → AST → evaluate.
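The source → AST → evaluate pipeline can be sketched in a few lines. This is an illustration only, not PySpell's implementation: it borrows Python's standard `ast` module as the front-end, and the whitelist `ALLOWED`, the function `evaluate`, and the small operator subset are all assumptions made for the example.

```python
import ast

# Deny-by-default: only these node types are accepted. This is an
# illustrative subset, not PySpell's actual grammar.
ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare, ast.Gt, ast.Lt,
    ast.IfExp, ast.BinOp, ast.Add, ast.Sub, ast.Name, ast.Load, ast.Constant,
)


def evaluate(source, env):
    """Parse one expression, reject anything off the whitelist, evaluate it."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"node not allowed: {type(node).__name__}")
    return _eval(tree.body, env)


def _eval(node, env):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # Free identifiers come from the host-supplied environment.
        if node.id not in env:
            raise NameError(f"unknown name {node.id}")
        return env[node.id]
    if isinstance(node, ast.BinOp):
        left, right = _eval(node.left, env), _eval(node.right, env)
        return left + right if isinstance(node.op, ast.Add) else left - right
    if isinstance(node, ast.Compare):
        # Handles chained comparisons such as 0 < temp < 60.
        left = _eval(node.left, env)
        for op, comparator in zip(node.ops, node.comparators):
            right = _eval(comparator, env)
            ok = left > right if isinstance(op, ast.Gt) else left < right
            if not ok:
                return False
            left = right
        return True
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, env) for v in node.values)  # lazy, so it short-circuits
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.IfExp):
        branch = node.body if _eval(node.test, env) else node.orelse
        return _eval(branch, env)
    raise ValueError(f"node not allowed: {type(node).__name__}")
```

With `env = {"free_heap": 120000, "uptime_s": 45}`, the expression `free_heap > 100000 and uptime_s < 60` evaluates to `True`, while something like `__import__('os')` is rejected before evaluation because `Call` nodes are not on the whitelist.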

An offline AI coding agent, served off the chip #

Open `http://<dongle>/` over the tunnel and you get a Cursor-like agent. Type `flash the light`, `show the text "hello"`, `what is 7 plus 5`, or `reverse the word robot`. A ~0.45 M-parameter language model (< 500 kB, int8) turns the request into PySpell code, which runs live on the chip; the agent then shows the result or the physical action (the screen lights up, the RGB LED blinks). Runtime, model, tokenizer and dictionary are all served from the dongle, offline: no cloud, no key (OpenAI is optional, behind the ⚙).

A model that small is only useful because of a chain of tricks — the full write-up is in tech.md. The headlines:

### The model points, the browser copies

A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked to. It emits tiny semantic directives; the browser copies the literal content verbatim. `calculate 3 + 2` → `print(3 + 2)`; `change add to subtract` → `@@ + ==> -`. Quoted text is literal content: copied byte-for-byte, excluded from vocab checks.
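One plausible reading of the `@@ + ==> -` directive is "replace `+` with `-` in the current code". The sketch below applies a directive of the assumed form `@@ OLD ==> NEW`; the function name `apply_edit_directive` and the exact directive grammar are assumptions for illustration, not taken from the project.

```python
def apply_edit_directive(code, directive):
    """Apply a directive of the assumed form '@@ OLD ==> NEW' to code."""
    if not directive.startswith("@@"):
        raise ValueError("not an edit directive")
    old, sep, new = directive[2:].partition("==>")
    if not sep:
        raise ValueError("missing '==>'")
    old, new = old.strip(), new.strip()
    if old not in code:
        raise ValueError(f"{old!r} not found in code")
    # The model only names what changes; the rest of the code is copied verbatim.
    return code.replace(old, new, 1)
```

Applied to `print(3 + 2)` with the directive `@@ + ==> -`, this returns `print(3 - 2)`: the model never has to reproduce the digits itself.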

### The device serves; the browser computes

Inference runs in WebAssembly, client-side. The 0.5 MB model image streams **off flash a
TCP segment at a time** (HTTP Range) and is never resident in the chip's ~60 kB heap. Inverted
edge inference: the constrained device serves and grades, the browser runs the model.
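As a rough illustration of piecewise delivery over HTTP Range, the helper below generates the `Range` header values a client could send to fetch a file in fixed-size pieces. The function name and the sizes in the usage note are assumptions; the source does not say which piece size the browser actually requests.

```python
def range_headers(total_size, segment_size):
    """Yield HTTP Range header values covering total_size bytes in order."""
    for start in range(0, total_size, segment_size):
        # Byte ranges are inclusive at both ends.
        end = min(start + segment_size, total_size) - 1
        yield f"bytes={start}-{end}"
```

For a hypothetical 3000-byte file and 1200-byte pieces, `list(range_headers(3000, 1200))` gives `["bytes=0-1199", "bytes=1200-2399", "bytes=2400-2999"]`. The server only ever needs the requested slice in memory.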

### Frozen embeddings, distilled

The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a
part-of-speech vector, and **frozen** — the tiny model starts with meaningful word
geometry instead of spending its tiny budget learning it.

### The vocabulary is the dictionary

Those same 512 tokens + embeddings are served back to the browser for input validation ("outside the model's vocabulary…") and related-word RAG over the model's own vocabulary.
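A minimal sketch of both uses, assuming a toy vocabulary with made-up 3-dimensional embeddings (the real one has 512 tokens at 128 dimensions). The names `VOCAB`, `out_of_vocab` and `related` are hypothetical, and double-quoted text is skipped during validation because quoted content is treated as literal.

```python
import math
import re

# Toy vocabulary with invented embeddings, for illustration only.
VOCAB = {
    "flash": [0.9, 0.1, 0.0],
    "blink": [0.8, 0.2, 0.1],
    "light": [0.7, 0.1, 0.3],
    "show": [0.1, 0.9, 0.0],
    "the": [0.0, 0.1, 0.1],
    "text": [0.2, 0.8, 0.2],
}


def out_of_vocab(instruction, vocab=VOCAB):
    """Return words not in the vocabulary, skipping double-quoted literals."""
    unquoted = re.sub(r'"[^"]*"', " ", instruction)
    words = re.findall(r"[a-z]+", unquoted.lower())
    return [w for w in words if w not in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def related(word, k=2, vocab=VOCAB):
    """Return the k vocabulary words closest to word by cosine similarity."""
    target = vocab[word]
    others = [w for w in vocab if w != word]
    return sorted(others, key=lambda w: cosine(target, vocab[w]), reverse=True)[:k]
```

Here `out_of_vocab("flash the lamp")` returns `["lamp"]`, and `related("flash")` returns `["blink", "light"]`, which is the kind of suggestion a UI could offer when a word falls outside the vocabulary.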

**Retrain it for your language.** The pipeline is small and template-driven: translate the instruction phrasings (an LLM does this well), swap the embedding model for a multilingual one, re-curate and train, then flash. Full guide in [tech.md](https://github.com/punnerud/pyspell/blob/main/tech.md).

## Syntax at a glance

### Python

```python
free_heap > 100000 and uptime_s < 60
250 if distance > 1000 else 0
0 < temp < 60            # chained
20 not in peers
sum([1, 2, 3])
readings[-1]             # negative index
max(a, b)
```


### Rust

```rust
free_heap > 100000 && uptime_s < 60
if distance > 1000 { 250 } else { 0 }
let used = total - free; used * 100 / total
!peers.contains(20)
sum([1, 2, 3])
readings[readings.len() - 1]
max(a, b)
```

Language reference #

Literals & values

| Kind | Examples | Notes |
| --- | --- | --- |
| Integer | `0`, `42`, `-7` | 64-bit signed |
| Float | `1.5`, `3.14` | 64-bit |
| Boolean | `true`/`True`, `false`/`False` | both spellings accepted |
| String | `"hello"`, `'oslo'` | `+` concatenates; `==`/`<` compare; `len()` counts chars |
| List | `[1, 2, 3]` | elements are values |

Operators

| Group | Python | Rust | Notes |
| --- | --- | --- | --- |
| Arithmetic | `+ - * / %` (and `//`) | `+ - * / %` | On integers, `/` and `//` both truncate toward zero; a float operand promotes to float division. There is no separate float floor-div. |
| Comparison | `== != < <= > >=` | `== != < <= > >=` | Python allows chaining (`a < b < c`) |
| Boolean | `and`, `or`, `not` | `&&`, `\|\|`, `!` | |
| Unary | `-x`, `not x` | `-x`, `!x` | |
| Membership | `x in list`, `x not in list` | `list.contains(x)` | numeric equality |
| Index | `list[i]` | `list[i]` | negative indexing supported |
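The integer-division rule differs from standard Python, where `//` floors rather than truncates. A small sketch of the documented behaviour, with hypothetical helper names `int_div` and `divide`:

```python
def int_div(a, b):
    """Integer division that truncates toward zero, as the table describes."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def divide(a, b):
    """A float operand promotes to float division; integers truncate."""
    if isinstance(a, float) or isinstance(b, float):
        return a / b
    return int_div(a, b)
```

So `divide(-7, 2)` is `-3` (standard Python's `-7 // 2` is `-4`), and `divide(7, 2.0)` is `3.5`.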

Control flow & bindings

| Feature | Python | Rust |
| --- | --- | --- |
| Conditional | `a if cond else b` | `if cond { a } else { b }` (`else` required) |
| Local bindings | (single expression only) | `let x = e; let y = e2; final_expr` |
| Free variables | any bare name is read from the host environment | any bare name not bound by `let` is read from the host environment |

Built-in functions

| Function | Result | Description |
| --- | --- | --- |
| `len(list)` | int | number of elements |
| `abs(x)` | number | absolute value |
| `min(list)` / `min(a, b, …)` | number | minimum |
| `max(list)` / `max(a, b, …)` | number | maximum |
| `sum(list)` | number | sum of a numeric list |
| `any(list)` | bool | true if any element is truthy |
| `all(list)` | bool | true if all elements are truthy |
| `round(x)` | int | round to nearest integer |
| `int(x)` | int | truncate toward zero |
| `float(x)` | float | convert to float |
| `bool(x)` | bool | truthiness |
| `index(list, x)` | int | position of first `x`, or -1 |
| `before(list, a, b)` | bool | true if `a` occurs before `b` |
| `first(list)` | value | first element, or -1 if empty |
| `last(list)` | value | last element, or -1 if empty |
| `str(x)` | string | string representation of a value |
| `json_get(text, "a.b.0.c")` | scalar | extract the scalar at a dotted/indexed JSON path (no full parse; only the matched value is materialized) |
| `fetch(url)` | string | HTTP(S) GET body. Gated by a host allowlist; errors if the host isn't allowed or no network capability is present |
| `fetch_json(url, "a.b.0.c")` | scalar | stream the response and extract just the scalar at the path, stopping as soon as it's found; never buffers the whole body. Preferred on the device. |
| `show(x)` | x | render `x` to text and display it (the ESP32 screen; stdout on host), returning `x` so it composes. Device gates it via config (allow on/off, auto-revert seconds). |

Classic one-liner — fetch a value and show it on the dongle's screen:

```python
show("Oslo: " + fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature") + " C")
```

Network & JSON #

`fetch(url)` and `json_get(text, path)` let a program pull live data and read one field out of it. `fetch` is a mediated capability: the host/device decides which hosts are reachable (an allowlist), so a program can't reach arbitrary URLs.
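The allowlist check itself is simple to picture. The sketch below is an assumption about how such a gate could look, not the project's code: `check_allowed` is a hypothetical name, and matching on the exact hostname is one possible policy.

```python
from urllib.parse import urlsplit


def check_allowed(url, allowed_hosts):
    """Raise PermissionError unless url is http(s) and its host is allowlisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise PermissionError(f"scheme not allowed: {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        raise PermissionError(f"host not allowed: {host!r}")
    return host
```

With `allowed_hosts = {"api.met.no"}`, a URL on `api.met.no` passes and any other host raises before a connection is attempted.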

```sh
pyspell run oslo_temp.py --allow-host api.met.no
```

```python
json_get(
  fetch("https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75"),
  "properties.timeseries.0.data.instant.details.air_temperature")
```

Memory note (device): `json_get` is path-directed so it never builds the whole document in RAM; it materializes only the matched value. On the ESP32 (≈60 kB free, no PSRAM) reading a field out of a large response is feasible because `fetch_json` streams the HTTP(S) body and stops the moment the field is found (freeing the TLS buffers early), so a ~50 kB yr.no response never has to fit in RAM at once.
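To make the path syntax concrete, here is a sketch of what a dotted/indexed path such as `a.b.0.c` selects. It illustrates only the path semantics: unlike the device implementation it parses the whole document with `json.loads`, and the sample temperature in the usage note is invented.

```python
import json


def json_get(text, path):
    """Return the scalar at a dotted path such as 'a.b.0.c'."""
    value = json.loads(text)  # full parse for brevity; the device avoids this
    for key in path.split("."):
        if isinstance(value, list):
            value = value[int(key)]  # numeric segments index into lists
        else:
            value = value[key]
    if isinstance(value, (dict, list)):
        raise ValueError("path does not end at a scalar")
    return value
```

Given `{"properties": {"timeseries": [{"data": {"instant": {"details": {"air_temperature": 12.3}}}}]}}`, the path `properties.timeseries.0.data.instant.details.air_temperature` yields `12.3`.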

```python
fetch_json(
  "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.91&lon=10.75",
  "properties.timeseries.0.data.instant.details.air_temperature")
```

Running on the host #

```sh
cargo run -p pyspell-cli -- run examples/health.py --set free_heap=120000 --set uptime_ms=45000

cargo run -p pyspell-cli -- compile examples/health.py    # → examples/health.py.psb

cargo run -p pyspell-cli -- repl --port /dev/cu.usbmodem2101 --lang python
```

Running on the ESP32 #

The portable evaluator (`pyspell-core`, `no_std + alloc`) runs unchanged on the ESP32-S3. Programs read live device variables from the environment:

| Variable | Meaning |
| --- | --- |
| `free_heap` | free heap, bytes |
| `min_free_heap` | lowest free heap seen since boot, bytes |
| `uptime_ms` | milliseconds since boot |
| `uptime_s` | seconds since boot |

Demo: PySpell over Tailscale

The `demo/esp32-tailscale-pyspell` firmware adds a web text window and a `/run` API inside a Tailscale tunnel: open the device's Tailscale IP in a browser, type an expression, set a timeout, and run it on the chip. PySpell adds only ~62 kB on top of the networking firmware.

```sh
open http://100.x.y.z/

curl -X POST 'http://100.x.y.z/run?lang=py&timeout=10' --data 'free_heap > 100000'   # → true
curl -X POST 'http://100.x.y.z/run?lang=rs&timeout=10' --data 'uptime_ms / 1000'       # → 22

curl 'http://100.x.y.z/run?lang=py&timeout=10&code=free_heap%20%3E%20100000'   # → true
```

`timeout` is in seconds, clamped to 1–60, and enforced as a real wall-clock deadline on the device. The single request must fit one TCP segment (≈1.2 kB); POST leaves more of that for code.
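A client-side sketch of those two constraints, under stated assumptions: `clamp_timeout` mirrors the documented 1–60 clamp, and `build_get_path` percent-encodes the code for the GET form and refuses paths that would clearly not fit. Both names are hypothetical, and the 1200-byte limit only approximates the "≈1.2 kB" figure, since the real budget also has to cover the HTTP headers.

```python
from urllib.parse import quote

MAX_REQUEST_BYTES = 1200  # rough stand-in for the single-segment limit


def clamp_timeout(seconds):
    """Clamp a requested timeout to the 1-60 second range."""
    return max(1, min(60, int(seconds)))


def build_get_path(code, lang="py", timeout=10):
    """Build the /run request path with the code percent-encoded."""
    path = f"/run?lang={lang}&timeout={clamp_timeout(timeout)}&code={quote(code, safe='')}"
    if len(path.encode("ascii")) > MAX_REQUEST_BYTES:
        raise ValueError("request too large for GET; use POST")
    return path
```

`build_get_path("free_heap > 100000")` produces `/run?lang=py&timeout=10&code=free_heap%20%3E%20100000`. Percent-encoding triples the size of every escaped character, which is why POST leaves more room for code.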

Response format

The reply is `text/plain` (no JSON wrapper):

| Outcome | Body |
| --- | --- |
| Success | the raw value: `true`/`false`, an integer, a float, or a list like `[1, 2, 3]` |
| Failure | a line starting with `error:`, e.g. `error: parse error: unexpected end of input`, ``error: unknown name `foo` ``, or `error: program exceeded its time limit` |
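Because there is no wrapper, a client has to tell the two outcomes apart by the `error:` prefix. A minimal sketch, assuming the success values listed above happen to be valid JSON literals; `parse_reply` is a hypothetical helper, and how string results are rendered is not stated in the source, so they are passed through unchanged here.

```python
import json


def parse_reply(body):
    """Split a /run reply into ('error', message) or ('ok', value)."""
    text = body.strip()
    if text.startswith("error:"):
        return ("error", text[len("error:"):].strip())
    try:
        # true/false, integers, floats and lists of those parse as JSON.
        return ("ok", json.loads(text))
    except json.JSONDecodeError:
        return ("ok", text)  # anything else is returned as raw text
```

For example, `parse_reply("[1, 2, 3]")` gives `("ok", [1, 2, 3])` and `parse_reply("error: program exceeded its time limit")` gives `("error", "program exceeded its time limit")`.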

How it fits in 512 kB #

The ESP32-S3 has 512 kB of SRAM and no PSRAM, yet it runs a full Tailscale node (control plane and DERP), the PySpell evaluator, a browser agent IDE served off the chip, a native MCP server, and TLS to api.met.no. That only fits because of a long chain of memory tricks.

Honest headline. The "~260 kB free" you see between requests is a calm-moment reading. The number that matters is the free heap at the worst-case peak: ≈60 kB, measured during a TLS fetch with the Tailscale control session live. Every trick below keeps transient spikes under that ceiling, and the blunt consequence is that an 8-way parallel pool and full Tailscale don't coexist on the esp-idf stack; cheap parallelism waits for the lean pure-Rust stack.

Crypto & TLS

SPKI leaf-key pinning instead of CA-chain validation: one RSA-PSS verify, no 6 kB chain buffer (a TLS fetch drops ~45→30 kB). A heap admission gate bounds concurrency so peak heap is `K × per-fetch`, never `N × per-fetch`.
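The admission-gate idea can be sketched as a counter with K slots: a request is admitted only if a slot is free, so the heap reserved for fetches never exceeds K times the per-fetch cost, however many requests (N) arrive. The class name, the value K = 2 and the 30 kB per-fetch figure in the usage note are illustrative assumptions.

```python
class AdmissionGate:
    """Admit at most max_concurrent fetches at a time."""

    def __init__(self, max_concurrent, per_fetch_bytes):
        self.max_concurrent = max_concurrent
        self.per_fetch_bytes = per_fetch_bytes
        self.active = 0

    def try_admit(self):
        if self.active >= self.max_concurrent:
            return False  # caller waits or fails; no heap is reserved
        self.active += 1
        return True

    def release(self):
        assert self.active > 0, "release without a matching admit"
        self.active -= 1

    def reserved_bytes(self):
        """Worst-case heap held by admitted fetches: active x per-fetch."""
        return self.active * self.per_fetch_bytes
```

With `AdmissionGate(2, 30_000)` and eight simultaneous requests, only two are admitted and `reserved_bytes()` stays at 60 000 rather than 240 000.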

Stream, don't buffer

The netmap is read with `serde_json::from_reader` over the HTTP/2 frames, so serde skips the huge DERPMap field instead of buffering it (~60 kB → one 4 kB chunk). `fetch_json` stops the moment the value is found, and raw byte-scans replace JSON DOM trees.

Pages from flash

Static content lives in flash as `&'static str` (zero heap) and is streamed out as 512-byte TCP segments. Only the current segment is ever in RAM, so the 4.3 kB agent IDE serves without a full-page buffer.
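The segment-at-a-time pattern looks roughly like the generator below. It is a host-side analogy rather than the firmware: `segments` is a hypothetical name, and a `memoryview` stands in for reading directly from flash so that each slice is a view instead of a copy.

```python
def segments(page, size=512):
    """Yield successive slices of page, each at most size bytes."""
    view = memoryview(page)  # slicing a memoryview does not copy the data
    for start in range(0, len(page), size):
        yield view[start:start + size]
```

A 4300-byte page comes out as nine segments: eight of 512 bytes and a final one of 204 bytes. A sender that writes each segment before asking for the next never holds more than one in its buffer.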

Allocator & sockets

Heap and stack share one DRAM pool (+16 kB heap = −16 kB stack), tuned by hand. `SO_LINGER=0` frees lwIP sockets immediately (no TIME_WAIT pile-up), and a cooperative shared stack on the lean build makes parallelism cheap where per-thread stacks can't.

The full catalog — every trick with the exact file and symbol — is in docs/memory-512kb.md.

Sandbox & limits #

- **Deny-by-default grammar.** Only the whitelisted expression nodes and the built-ins above exist: no loops, functions, recursion, attribute access, or imports, and no I/O beyond the host-gated built-ins (`fetch`, `fetch_json`, `show`).
- **Instruction budget.** Every evaluation has a step limit (runaway guard).
- **Wall-clock timeout.** A caller can supply a deadline (e.g. 10 s); on the device the ESP timer enforces it.
- **Parser stays small.** The on-device parser accepts only the safe subset, so the device's attack surface is just a bounded decoder + evaluator.
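The step limit and the deadline can be pictured as one small guard that the evaluator consults on every step. This is a sketch under assumptions: the class `Budget`, the exception `BudgetExceeded` and the "step limit" message are invented for the example, and `time.monotonic()` stands in for the ESP timer.

```python
import time


class BudgetExceeded(Exception):
    pass


class Budget:
    """Count evaluation steps and enforce an optional wall-clock deadline."""

    def __init__(self, max_steps, timeout_s=None):
        self.remaining = max_steps
        self.deadline = None if timeout_s is None else time.monotonic() + timeout_s

    def step(self):
        if self.remaining <= 0:
            raise BudgetExceeded("program exceeded its step limit")
        self.remaining -= 1
        if self.deadline is not None and time.monotonic() > self.deadline:
            raise BudgetExceeded("program exceeded its time limit")


def total(values, budget):
    """Sum a list, charging one step per element.

    The loop belongs to the evaluator's built-in, not to the program,
    so even built-ins over long lists stay bounded.
    """
    acc = 0
    for v in values:
        budget.step()
        acc += v
    return acc
```

`total([1, 2, 3], Budget(3))` returns `6`, while the same call on a four-element list raises `BudgetExceeded` on the fourth element.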
