PAL: Giving AI Agents Hands in the Physical World

A developer proposed PAL, an open standard that gives AI agents direct control over physical hardware via a Python REPL running on an ESP32-S3. The architecture splits real-time safety-critical operations on Core 0 from agent-driven Python execution on Core 1, enabling sub-10ms round-trip hardware control without tool schemas or pre-registration.

REPL Is All You Need.

A 19-year-old's proposal for an open standard that lets AI agents control hardware directly.

AI agents can write code, search the web, deploy servers, and manage databases. They're incredibly capable, as long as they stay inside a container. But ask your favorite AI to flip a relay, read a temperature sensor, or scan an I2C bus, and it can't. Not because it doesn't know how. It knows exactly what machine.Pin(5, Pin.OUT) does. It just has nowhere to run that code.

AI agents lack a physical execution terminal. CLI tools gave LLMs hands in the digital world (df -h, pip install, docker compose up). Embedded systems can give them hands in the real world (GPIO.on, ADC.read, I2C.scan). But nobody has defined a standard for how agents should talk to hardware. That's what PAL is.

Espressif's "Chat Coding" framework puts a full AI agent on an ESP32-S3: ReAct loop, LLM calls, tool registry, IM channels (Telegram, WeChat), Event Router. It's impressive engineering. It's also the wrong architecture.

A lighter ESP32 agent exists too. Same architecture, GPIO bugs confirmed. Toy-grade.

Firmware written in C is rock solid. But adding a new operation means: write C → compile → flash → reboot. 10 minutes minimum. Agents can't iterate at that speed.

Boards that run full Python have libraries everywhere, but also a 30-second boot, 5W power draw, $35+ cost, and no hard real-time. Overkill for controlling a relay.

Here's what everyone missed: Python's REPL and an AI agent's interaction model are isomorphic.

    REPL loop:        Agent loop:
    type code         receive task
    execute           reason → generate code
    see output        send code to REPL
    type next         observe result → adjust

You don't need a tool registry. You don't need JSON schema. You don't need a Skill Registry. The REPL is the world's simplest IPC.

And Python? It's the language LLMs generate best: 25% of GitHub public repos, versus <0.5% for Lua. Claude has seen millions of machine.Pin calls in training data. It knows how to write this code.
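To make the REPL/agent correspondence concrete, here is a minimal sketch that runs on ordinary desktop Python. It is not part of the PAL spec: `repl_execute`, `scripted_agent`, and `agent_loop` are hypothetical names, the "agent" is a hard-coded stand-in for an LLM, and a made-up sensor reading replaces real hardware calls.

```python
import contextlib
import io


def repl_execute(code, namespace):
    """Run one snippet in a persistent namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)
        except Exception as exc:
            # Report errors as output, the way a REPL shows a traceback.
            print(f"{type(exc).__name__}: {exc}")
    return buffer.getvalue()


def scripted_agent(task, last_output):
    """Hypothetical stand-in for the LLM: choose the next snippet from the last result."""
    if last_output is None:
        # First step: take a (made-up) reading and print it.
        return "reading = 21.5\nprint(reading)"
    if "21.5" in last_output:
        # Adjust: act on the value observed in the previous step.
        return "print('fan on' if reading > 20 else 'fan off')"
    return None  # nothing left to do


def agent_loop(task, max_steps=5):
    namespace = {}  # state persists between steps, as in a REPL session
    transcript = []
    output = None
    for _ in range(max_steps):
        code = scripted_agent(task, output)     # reason -> generate code
        if code is None:
            break
        output = repl_execute(code, namespace)  # send code to REPL
        transcript.append((code, output))       # observe result
    return transcript
```

The second snippet can use `reading` only because the namespace survives between steps. That persistent state is what a REPL provides and what a one-shot tool call does not.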
    ┌──────────────────────────────────────┐
    │  Cloud Agent (AstrBot)               │  ← Reasoning, planning
    └──────────────┬───────────────────────┘
                   │ WebSocket JSON
    ┌──────────────▼───────────────────────┐
    │  ESP32-S3 PAL Terminal               │
    │                                      │
    │  Core 0 (C, FreeRTOS, NEVER CHANGES):│  ← Hard real-time
    │  · SPI/I2C/UART drivers              │
    │  · Hardware watchdog                 │
    │  · WiFi auto-reconnect               │
    │  · Pin ownership table               │
    │                                      │
    │  Core 1 (MicroPython, ANYTHING GOES):│  ← Agent playground
    │  · WebSocket → JSON → Python exec    │
    │  · machine module → hardware         │
    │  · uasyncio → concurrent tasks       │
    │  · Crash? Core 0 restarts you.       │
    └──────────────────────────────────────┘

Core 0 is the brake pedal. It never changes. Core 1 is the steering wheel. The agent can grip it however it wants.

If the agent writes an infinite loop? Core 0 detects heartbeat timeout → restarts Core 1 VM.
If the agent tries to access system pins? machine.Pin raises OSError.
If the agent crashes? Core 0 keeps running. Physical control link never breaks.

No tool schemas. No pre-registration. Just Python code over WebSocket:

    Agent → Terminal:
    {
      "version": "1",
      "id": "msg_001",
      "type": "exec",
      "code": "from machine import Pin; Pin(5, Pin.OUT).on()",
      "timeout_ms": 10000
    }

    Terminal → Agent:
    {
      "version": "1",
      "id": "msg_001",
      "type": "result",
      "stdout": "",
      "stderr": "",
      "error": false,
      "exec_time_ms": 12
    }

That's it. One round-trip over WiFi: <10ms. Python execution: <1ms.

Physical execution needs determinism, low latency, 24/7 stability. AI reasoning needs elastic compute, large memory, frequent iteration. Forcing both onto one MCU is asking one chip to do two contradictory things.

A single cloud agent can manage dozens of PAL terminals across a factory floor. Agent-on-Device requires one agent instance per node, with no global perspective.

Cloud agent misbehaving? Cut the WebSocket. It's over. Agent-on-Device misbehaving on an MCU? You wait for the hardware watchdog to trigger, and that's your only recovery mechanism.
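The exec/result exchange described above could be handled on the terminal side roughly as follows. This is a sketch under stated assumptions, not the reference implementation: it runs on desktop CPython rather than MicroPython, `handle_message` is a hypothetical name, the field names assume underscores (`timeout_ms`, `exec_time_ms`), and the timeout is not enforced here because the post assigns that job to Core 0's heartbeat check.

```python
import contextlib
import io
import json
import time
import traceback


def handle_message(raw, namespace):
    """Execute one 'exec' request and build the matching 'result' reply."""
    request = json.loads(raw)
    stdout, stderr = io.StringIO(), io.StringIO()
    error = False
    started = time.monotonic()
    try:
        with contextlib.redirect_stdout(stdout):
            exec(request["code"], namespace)  # namespace persists across messages
    except Exception:
        error = True
        stderr.write(traceback.format_exc())
    elapsed_ms = int((time.monotonic() - started) * 1000)
    return json.dumps({
        "version": request["version"],
        "id": request["id"],          # echo the id so the agent can match replies
        "type": "result",
        "stdout": stdout.getvalue(),
        "stderr": stderr.getvalue(),
        "error": error,
        "exec_time_ms": elapsed_ms,
    })


def make_exec(msg_id, code, timeout_ms=10000):
    """Build an 'exec' request in the shape shown in the post."""
    return json.dumps({
        "version": "1",
        "id": msg_id,
        "type": "exec",
        "code": code,
        "timeout_ms": timeout_ms,
    })


session = {}
ok_reply = json.loads(handle_message(make_exec("msg_001", "print(2 + 3)"), session))
bad_reply = json.loads(handle_message(make_exec("msg_002", "1 / 0"), session))
```

A failing snippet still produces a well-formed reply with `error` set to true and the traceback in `stderr`, so the agent can read the failure and adjust its next message instead of losing the session.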
Hermes-style skill accumulation, vector databases, MCP tool chains, SQLite persistence: all mature AI infrastructure. None of it needs to be squeezed into 8MB of PSRAM.

PAL IS:

PAL IS NOT:

PAL is a draft specification (v0.1). I'm a freshman at Anhui University of Science and Technology. The reference implementation (ESP32-S3 Core 0/1) is under development.

Note: I'm currently preparing for exams and still learning how to express technical ideas fluently in English, so I used Claude to help draft and polish this post. All ideas, architecture, and the PAL specification itself are my own work.

This spec is open for discussion. I'm looking for:

"The best way to predict the future is to define it."
- PAL v0.1, 2026