REPL Is All You Need
A 19-year-old's proposal for an open standard that lets AI agents control hardware directly.
AI agents can write code, search the web, deploy servers, and manage databases. They're incredibly capable, as long as they stay inside a container.
But ask your favorite AI to flip a relay, read a temperature sensor, or scan an I2C bus, and it can't. Not because it doesn't know how. It knows exactly what machine.Pin(5, Pin.OUT) does. It just has nowhere to run that code.
AI agents lack a physical execution terminal.
CLI tools gave LLMs hands in the digital world (df -h, pip install, docker compose up). Embedded systems can give them hands in the real world (GPIO.on(), ADC.read(), I2C.scan()). But nobody has defined a standard for how agents should talk to hardware.
That's what PAL is.
Espressif's "Chat Coding" framework puts a full AI agent on an ESP32-S3: ReAct loop, LLM calls, tool registry, IM channels (Telegram, WeChat), and an Event Router. It's impressive engineering. It's also the wrong architecture.
A lighter ESP32 agent. Same architecture, GPIO bugs confirmed. Toy-grade.
Rock solid. But adding a new operation means: write C → compile → flash → reboot. 10 minutes minimum. Agents can't iterate at that speed.
Python libraries everywhere, but 30-second boot, 5W power draw, $35+ cost, no hard real-time. Overkill for controlling a relay.
Here's what everyone missed: Python's REPL and an AI agent's interaction model are isomorphic.
REPL loop:          Agent loop:
>>> type code       receive task
    execute         reason → generate code
    see output      send code to REPL
>>> type next       observe result → adjust
You don't need a tool registry. You don't need JSON schema. You don't need a Skill Registry. The REPL is the world's simplest IPC.
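To make the parallel concrete, here is a minimal host-side sketch in plain CPython. It is not the PAL reference implementation: `ReplSession`, `scripted_agent`, and `agent_loop` are hypothetical names, and the "agent" is a hard-coded script standing in for an LLM. The point is only that a persistent interpreter plus captured output is enough to close the loop.

```python
import code
import contextlib
import io


class ReplSession:
    """A persistent interpreter: state survives between snippets, like a REPL."""

    def __init__(self):
        self._interp = code.InteractiveInterpreter()

    def run(self, source):
        """Run one single-statement snippet and return everything it printed."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
            self._interp.runsource(source)
        return buf.getvalue()


def scripted_agent(task, observations):
    """Stand-in for an LLM (assumption): returns the next snippet, or None when done."""
    plan = ["x = 2", "x * 21"]
    step = len(observations)
    return plan[step] if step < len(plan) else None


def agent_loop(task, agent, repl):
    """Receive task -> generate code -> send to REPL -> observe result -> repeat."""
    observations = []
    while (snippet := agent(task, observations)) is not None:
        observations.append(repl.run(snippet))
    return observations


history = agent_loop("compute 42", scripted_agent, ReplSession())
print(history)
```

Each observation is just the text the interpreter printed, which is exactly what an agent needs to decide its next step.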
And Python? It's the language LLMs generate best: 25% of GitHub public repos, versus <0.5% for Lua. Claude has seen millions of machine.Pin() calls in training data. It knows how to write this code.
┌────────────────────────────────────────┐
│  Cloud Agent (AstrBot)                 │ ← Reasoning, planning
└──────────────┬─────────────────────────┘
               │ WebSocket JSON
┌──────────────▼─────────────────────────┐
│  ESP32-S3 PAL Terminal                 │
│                                        │
│  Core 0 (C, FreeRTOS, NEVER CHANGES):  │ ← Hard real-time
│   · SPI/I2C/UART drivers               │
│   · Hardware watchdog                  │
│   · WiFi auto-reconnect                │
│   · Pin ownership table                │
│                                        │
│  Core 1 (MicroPython, ANYTHING GOES):  │ ← Agent playground
│   · WebSocket → JSON → Python exec     │
│   · machine module → hardware          │
│   · uasyncio → concurrent tasks        │
│   · Crash? Core 0 restarts you.        │
└────────────────────────────────────────┘
Core 0 is the brake pedal. It never changes. Core 1 is the steering wheel. The agent can grip it however it wants.
If the agent writes an infinite loop? Core 0 detects the heartbeat timeout → restarts the Core 1 VM. If the agent tries to access system pins? machine.Pin() raises OSError. If the agent crashes? Core 0 keeps running. The physical control link never breaks.
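The two guard rails described here can be modeled on a desktop machine before any firmware exists. The sketch below is a Python simulation of that logic, not the Core 0 C code: `PinOwnershipTable`, `HeartbeatWatchdog`, the reserved pin set, and the timeout value are all illustrative assumptions. The real reserved pins depend on the board.

```python
import time


class PinOwnershipTable:
    """Tracks which pins are reserved for the system and which are claimed."""

    def __init__(self, system_pins):
        self._system = frozenset(system_pins)
        self._owners = {}

    def claim(self, pin, owner):
        """Grant the pin to owner, or raise OSError as machine.Pin() would."""
        if pin in self._system:
            raise OSError(f"pin {pin} is reserved for the system")
        current = self._owners.setdefault(pin, owner)
        if current != owner:
            raise OSError(f"pin {pin} is already owned by {current}")
        return pin


class HeartbeatWatchdog:
    """Restarts the VM when no heartbeat arrives within timeout_s seconds."""

    def __init__(self, timeout_s, restart, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._restart = restart      # callback that would restart the Core 1 VM
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Called by the Python side to signal it is still alive."""
        self._last_beat = self._clock()

    def poll(self):
        """Called periodically by the supervisor; True if a restart happened."""
        if self._clock() - self._last_beat > self._timeout_s:
            self._restart()
            self._last_beat = self._clock()
            return True
        return False


table = PinOwnershipTable(system_pins={0})   # pin 0 reserved: illustrative only
table.claim(5, "agent")                      # allowed
```

An agent that sits in an infinite loop never calls `beat()`, so the next `poll()` after the timeout triggers the restart callback. A request for a reserved pin fails immediately with `OSError`, which the agent sees as ordinary Python error output.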
No tool schemas. No pre-registration. Just Python code over WebSocket:
Agent → Terminal:
{
  "version": "1",
  "id": "msg_001",
  "type": "exec",
  "code": "from machine import Pin; Pin(5, Pin.OUT).on()",
  "timeout_ms": 10000
}
Terminal → Agent:
{
  "version": "1",
  "id": "msg_001",
  "type": "result",
  "stdout": "",
  "stderr": "",
  "error": false,
  "exec_time_ms": 12
}
That's it. One round-trip over WiFi: <10ms. Python execution: <1ms.
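For a feel of how little code the terminal side needs, here is a rough sketch of a handler for the `exec` message above. It is written for desktop CPython so it can be run without hardware. `handle_message` is a hypothetical name, a MicroPython port would need a different way to capture output, and this sketch does not enforce `timeout_ms`.

```python
import contextlib
import io
import json
import time
import traceback


def handle_message(raw, namespace):
    """Execute one 'exec' request and return the 'result' reply as a JSON string."""
    msg = json.loads(raw)
    if msg.get("type") != "exec":
        raise ValueError("unsupported message type: %r" % msg.get("type"))

    out, err = io.StringIO(), io.StringIO()
    failed = False
    start = time.monotonic()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            # The namespace persists across calls, so state carries over like a REPL.
            exec(msg["code"], namespace)
        except Exception:
            failed = True
            traceback.print_exc()   # the traceback lands in the captured stderr
    elapsed_ms = int((time.monotonic() - start) * 1000)

    return json.dumps({
        "version": msg["version"],
        "id": msg["id"],            # echo the id so the agent can match replies
        "type": "result",
        "stdout": out.getvalue(),
        "stderr": err.getvalue(),
        "error": failed,
        "exec_time_ms": elapsed_ms,
    })


ns = {}
request = {"version": "1", "id": "msg_001", "type": "exec",
           "code": "print(1 + 1)", "timeout_ms": 10000}
reply = json.loads(handle_message(json.dumps(request), ns))
print(reply["stdout"], reply["error"])
```

A failing snippet comes back with `"error": true` and the traceback in `stderr`, so the agent can read the error and try again without any extra schema.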
Physical execution needs determinism, low latency, 24/7 stability. AI reasoning needs elastic compute, large memory, frequent iteration. Forcing both onto one MCU is asking one chip to do two contradictory things.
A single cloud agent can manage dozens of PAL terminals across a factory floor. Agent-on-Device requires one agent instance per node, with no global perspective.
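As a sketch of what that fan-out might look like on the agent side, the snippet below sends the same `exec` message to many terminals concurrently with `asyncio`. The transport is deliberately abstracted: `send` is a hypothetical coroutine that would wrap a WebSocket connection in practice, and `fake_send` merely simulates a terminal so the example runs on its own. The terminal names are made up.

```python
import asyncio
import json


async def exec_on(terminal_id, send, code, timeout_s=10.0):
    """Send one exec request to one terminal; None marks a terminal that timed out."""
    request = {"version": "1", "id": f"{terminal_id}-1", "type": "exec",
               "code": code, "timeout_ms": int(timeout_s * 1000)}
    try:
        raw = await asyncio.wait_for(send(terminal_id, json.dumps(request)), timeout_s)
    except asyncio.TimeoutError:
        return terminal_id, None
    return terminal_id, json.loads(raw)


async def broadcast(terminal_ids, send, code):
    """Run the same code on every terminal concurrently and collect the replies."""
    pairs = await asyncio.gather(*(exec_on(t, send, code) for t in terminal_ids))
    return dict(pairs)


async def fake_send(terminal_id, payload):
    """Simulated terminal (assumption): echoes the id and reports success."""
    msg = json.loads(payload)
    await asyncio.sleep(0)
    return json.dumps({"version": "1", "id": msg["id"], "type": "result",
                       "stdout": f"{terminal_id} ok\n", "stderr": "",
                       "error": False, "exec_time_ms": 0})


terminals = [f"pal-{i:02d}" for i in range(24)]
replies = asyncio.run(broadcast(terminals, fake_send, "print('ok')"))
print(len(replies), "terminals answered")
```

Because every reply lands in one dictionary, the agent gets a view of the whole floor in a single step, which a per-node agent cannot have.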
Cloud agent misbehaving? Cut the WebSocket. It's over. Agent-on-Device misbehaving on an MCU? You wait for the hardware watchdog to trigger. That's your only recovery mechanism.
Hermes-style skill accumulation, vector databases, MCP tool chains, SQLite persistence: all mature AI infrastructure. None of it needs to be squeezed into 8MB of PSRAM.
PAL IS:
PAL IS NOT:
PAL is a draft specification (v0.1). I'm a freshman at Anhui University of Science and Technology. The reference implementation (ESP32-S3 Core 0/1) is under development.
Note: I'm currently preparing for exams and still learning how to express technical ideas fluently in English, so I used Claude to help draft and polish this post. All ideas, architecture, and the PAL specification itself are my own work.
This spec is open for discussion. I'm looking for:
"The best way to predict the future is to define it."
- PAL v0.1, 2026