How I turned my AI CLI into an autonomous agent with Playwright and Sub-agents 🚀

A developer transformed their AI CLI tool Codey from a simple LLM wrapper into a secure, persistent agent runtime with human-like browsing via Playwright, autonomous sub-agents, persistent terminals, and security hardening. Key upgrades include in-memory screenshot capture, self-healing browser dependencies, sub-agent delegation, persistent terminal sessions, and replacing eval() with AST parsing.

When I first built Codey, it was a simple CLI wrapper around an LLM with a few basic tools. It was great for small tasks, but as I started throwing harder problems at it, the limitations became obvious. It couldn't run dev servers without blocking the thread, it couldn't browse documentation, and honestly, raw eval calls were keeping me up at night. So, I tore down the foundation and did a massive platform rewrite. Today, I'm excited to share how Codey evolved from a simple script into a secure, persistent agent runtime. Here’s a deep dive into the technical upgrades. 🌐 1. Human-Like Browsing Playwright + Vision I wanted Codey to be able to read documentation, check GitHub issues, and visually debug UIs. I integrated a full Playwright-backed web tool. The Vision Bottleneck: Initially, to pass visual context to the model, the pipeline looked like this: Screenshot - Write PNG to disk - Read PNG - Base64 encode. This disk I/O was noticeably slow. I optimized it by capturing the screenshot directly into memory as bytes and encoding it on the fly. We completely removed the .codey screenshots/ temp directory. Self-Healing Dependencies: There's nothing worse than a tool failing because a user doesn't have Chromium installed. Now, if the browser launch fails, Codey catches the error, automatically runs playwright install chromium, and retries the launch in the background. Smart Prompting: If you drop a link like https://... into the terminal, the system dynamically injects the web tool into the prompt and immediately triggers web.navigate instead of asking you to paste the content. 🤖 2. Sub-Agents and Persistent Terminals This is where the architecture really shifted from "chatbot" to "agent runtime". The delegate Tool: Codey can now launch a completely autonomous sub-agent. This second agent gets its own tool loop, its own history, and its own context. It goes off to solve a sub-task and returns a summary to the main agent. Persistent Sessions terminal : Previously, if Codey ran a command, it would lose the process. I added start, send, peek, and stop actions. Now, Codey can start a Next.js dev server, leave it running in the background, peek at the logs, and continue writing code. Human-in-the-Loop ask : Sometimes the AI shouldn't guess. If Codey isn't sure which file to edit, it pauses execution and renders an interactive multiple-choice prompt in your terminal. 🛡️ 3. Security Hardening As Codey got smarter, it got more dangerous. I had to lock it down. Killing eval : Arbitrary code execution is a massive vulnerability. I stripped out raw eval for the calculator tool and replaced it with strict ast.parse validation. We now use a strict whitelist of safe operators, functions, and constants. Fixing Shell Injections: I moved away from raw shell execution and string concatenation. Before: git diff passed directly to the shell. After: Using subprocess.run ... combined with shlex.split for safe argument parsing. Path Traversal & Approval Gates: Added a strict assert within project check to create file, edit file, and read files so the agent can't randomly decide to read ../../../etc/passwd. I also added a CONFIRM SHELL=true environment flag that forces Codey to ask for human permission before running potentially destructive commands. 🧠 4. State Management & Developer Experience Finally, I overhauled how Codey remembers things. Multi-Session Workflow: Codey used to dump everything into one history.jsonl per project. Now, it generates separate session files and greets you with an interactive startup picker showing message counts and previews so you can resume yesterday's work or start fresh. Streaming & Context: Switched to token-by-token streaming for a snappy, ChatGPT-like feel. Added trim history and MAX TOOL ROUNDS to prevent infinite loops and runaway API costs. Wrapping up The patches transformed Codey from CLI + LLM + tools into a Persistent agent runtime + browser automation + subagents + project memory. Building this has been an incredible lesson in agent orchestration and Python CLI development. If you're interested in AI coding assistants, want to build your own, or just want to poke around the source code, check out the repo I'd love your feedback, bug reports, or pull requests we always need more tools . 👉 Check out Codey on GitHub: github.com/varad-13/codey Let me know what you think in the comments What tools should I add next?