Zo Computer gives you an AI agent, a skills registry, a compute pool, browser automation, file hosting, scheduled automations, and persistent memory — all on a personal server. I wanted to understand every seam, so I rebuilt the whole thing in vanilla Python 3 with no web framework and no Docker. The result is ZoClone: 10 modules, 775 lines, 4 SQLite tables, one ThreadPoolExecutor. This is what the architecture actually looks like when you strip out the platform.
The main module is ZoClone.__init__, and that's the entire dependency graph. Each subsystem is an attribute:

    from concurrent.futures import ThreadPoolExecutor

    # init_db, pool, hosting, memory and scheduler are defined elsewhere in the package.
    class ZoClone:
        def __init__(self):
            self.db = init_db()
            self.executor = ThreadPoolExecutor(max_workers=10)
            self.ai_client = None
            self.pool = pool            # ComputePool singleton
            self.hosting = hosting      # HostingService singleton
            self.memory = memory        # SQLite-backed memory
            self.scheduler = scheduler  # cron-like automations

No DI container, no event bus, no message queue. Every tool is a method on the same object. If you're coming from a microservice background, this is going to look like a 2014 Django app — and that's the point. When you can fit the whole mental model on one screen, you stop second-guessing where a bug lives.
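To make "every tool is a method on the same object" concrete, here is a minimal sketch of name-based dispatch. `MiniAgent`, `TOOLS`, and `call_tool` are hypothetical names for illustration; the post does not show how ZoClone routes a tool call, so treat the allow-list plus `getattr` approach as an assumption.

```python
class MiniAgent:
    """Toy stand-in for ZoClone: tools are plain methods on one object."""

    # Hypothetical allow-list: only these method names are callable as tools.
    TOOLS = {"remember", "recall"}

    def __init__(self):
        self._memory = {}

    def remember(self, key: str, value: str) -> str:
        self._memory[key] = value
        return "ok"

    def recall(self, key: str) -> str:
        return self._memory.get(key, "")

    def call_tool(self, name: str, **kwargs):
        # Reject anything outside the allow-list before touching getattr.
        if name not in self.TOOLS:
            raise ValueError(f"unknown tool: {name}")
        return getattr(self, name)(**kwargs)


agent = MiniAgent()
agent.call_tool("remember", key="editor", value="vim")
print(agent.call_tool("recall", key="editor"))  # prints "vim"
```

The allow-list matters because a bare `getattr` would let a model-supplied name reach any attribute on the object, not just the intended tools.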
Four tables. No ORM. No migrations. The schema is in a single executescript block:

    CREATE TABLE IF NOT EXISTS conversations(id TEXT PRIMARY KEY, title TEXT, updated_at INTEGER);
    CREATE TABLE IF NOT EXISTS messages(id TEXT PRIMARY KEY, conv_id TEXT, role TEXT, content TEXT, tools TEXT, created_at INTEGER);
    CREATE TABLE IF NOT EXISTS memory(id TEXT PRIMARY KEY, key TEXT UNIQUE, value TEXT, updated_at INTEGER);
    CREATE TABLE IF NOT EXISTS files(id TEXT PRIMARY KEY, path TEXT UNIQUE, content TEXT, encoding TEXT, updated_at INTEGER);

IDs are SHA-256 hashes of (timestamp, content) truncated to 24 chars. The tools column on messages is a freeform JSON blob. The memory table is a key-value store with UNIQUE on key, so each key holds exactly one value; written through an upsert (INSERT OR REPLACE, or ON CONFLICT ... DO UPDATE), that gives last-write-wins semantics. When your entire data model is four tables, schema design becomes a five-minute conversation instead of a five-day one.
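A minimal sketch of the two ideas above: a truncated SHA-256 ID and a last-write-wins write against the memory table. The way the timestamp and content are joined before hashing, and the helper names `make_id` and `remember`, are assumptions rather than the repo's code. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.

```python
import hashlib
import sqlite3
import time


def make_id(content: str) -> str:
    # Assumed recipe: hash "timestamp:content", keep the first 24 hex chars.
    raw = f"{time.time_ns()}:{content}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:24]


def remember(db: sqlite3.Connection, key: str, value: str) -> None:
    # ON CONFLICT(key) turns the UNIQUE constraint into last-write-wins.
    db.execute(
        "INSERT INTO memory(id, key, value, updated_at) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value, "
        "updated_at = excluded.updated_at",
        (make_id(key + value), key, value, int(time.time())),
    )
    db.commit()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE IF NOT EXISTS memory("
    "id TEXT PRIMARY KEY, key TEXT UNIQUE, value TEXT, updated_at INTEGER)"
)
remember(db, "editor", "vim")
remember(db, "editor", "helix")  # same key: overwrites instead of raising
print(db.execute("SELECT key, value FROM memory").fetchall())
```

Without the conflict clause, the second write would raise `sqlite3.IntegrityError`; the UNIQUE constraint only guarantees one row per key, and the upsert decides which write survives.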
A skill in Zo is a folder with a SKILL.md (frontmatter) and a scripts/<name>.py (handler). I auto-discover them at import time:

    import importlib.util
    from pathlib import Path

    def load_skill(name: str, path: Path) -> Skill:
        # path points at the skill's SKILL.md; Skill is defined elsewhere.
        md_content = path.read_text()
        frontmatter = {}
        if md_content.startswith("---"):
            # Naive parse: "key: value" lines between the two --- markers.
            end = md_content.find("---", 3)
            for line in md_content[3:end].strip().split("\n"):
                if ":" in line:
                    k, v = line.split(":", 1)
                    frontmatter[k.strip()] = v.strip()
        # Import scripts/<name>.py as a module and pick its entry point.
        py_file = path.parent / "scripts" / f"{name}.py"
        spec = importlib.util.spec_from_file_location(name, py_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        handler = getattr(module, "run", getattr(module, "execute", None))
        return Skill(name=name, description=..., triggers=..., handler=handler)

No registry service, no API call to discover skills. The filesystem is the registry. Drop a folder, restart, it's loaded. The triggers field in frontmatter is just a comma-separated string; the LLM gets all skill descriptions in its system prompt and decides which one to call. There's no embedding-based retrieval because, at 30 skills, exact-match triggers work fine.
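As a sketch of "the filesystem is the registry", the snippet below scans a directory for SKILL.md files and renders one line per skill for a system prompt. The `<root>/<skill>/SKILL.md` layout, the function names, and the prompt format are assumptions for illustration; handler loading is left out.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Parse flat "key: value" lines between two --- markers."""
    meta = {}
    if text.startswith("---"):
        end = text.find("---", 3)
        if end != -1:  # ignore files with no closing marker
            for line in text[3:end].strip().splitlines():
                if ":" in line:
                    key, value = line.split(":", 1)
                    meta[key.strip()] = value.strip()
    return meta


def discover(skills_root: Path) -> dict:
    """Map each skill folder name to its parsed SKILL.md frontmatter."""
    return {
        md.parent.name: parse_frontmatter(md.read_text())
        for md in sorted(skills_root.glob("*/SKILL.md"))
    }


def skills_prompt(skills: dict) -> str:
    """Render one line per skill for the system prompt."""
    return "\n".join(
        f"- {name}: {meta.get('description', '')} (triggers: {meta.get('triggers', '')})"
        for name, meta in skills.items()
    )


# Demo against a throwaway directory holding one hypothetical skill.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "weather"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\ndescription: Look up a forecast\ntriggers: weather, forecast\n---\nBody text.\n"
    )
    found = discover(Path(tmp))
    print(skills_prompt(found))
```

This hand-rolled parser only handles flat single-line values; nested or multi-line YAML would need a real YAML library.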
The peer-to-peer compute mesh in ZoClone is a dict of jobs, a dict of nodes, and one threading.Lock:

    from typing import Dict, Optional

    # Method on the compute pool; self.jobs and self.lock are set in its __init__.
    def assign_job(self, node_id: str) -> Optional[Dict]:
        with self.lock:
            pending = [j for j in self.jobs.values() if j["status"] == "pending"]
            if not pending:
                return None
            # Highest priority first.
            pending.sort(key=lambda x: -x["priority"])
            job = pending[0]
            job["status"] = "assigned"
            job["assigned_node"] = node_id
            return job

That's it. When a node polls, the hub picks the highest-priority pending job, marks it assigned to that node, and returns the work. No Redis Streams, no RabbitMQ, no Kafka. The trade-off is obvious: this is a single-process orchestrator, not a horizontally-scalable scheduler. But for a 50-node grid running nightly ML batch jobs, you don't need Kafka. You need a lock and a sort.
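A self-contained sketch of that lock-guarded pool, with two threads polling at once to show that each job is handed out only once. `MiniPool` and `submit` are hypothetical (the post does not show how jobs are enqueued), and this version uses `max()` where the original sorts; both pick the same job.

```python
import threading
from typing import Dict, Optional


class MiniPool:
    """Toy job pool: one dict of jobs guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.jobs: Dict[str, Dict] = {}

    def submit(self, job_id: str, priority: int) -> None:
        # Hypothetical helper; the post does not show how jobs are added.
        with self.lock:
            self.jobs[job_id] = {"id": job_id, "priority": priority,
                                 "status": "pending", "assigned_node": None}

    def assign_job(self, node_id: str) -> Optional[Dict]:
        with self.lock:
            pending = [j for j in self.jobs.values() if j["status"] == "pending"]
            if not pending:
                return None
            job = max(pending, key=lambda j: j["priority"])
            job["status"] = "assigned"
            job["assigned_node"] = node_id
            return job


pool = MiniPool()
pool.submit("nightly-train", priority=5)
pool.submit("thumbnail", priority=1)

# Two nodes poll concurrently; the lock ensures each job is handed out once.
results = []
threads = [threading.Thread(target=lambda n=n: results.append(pool.assign_job(n)))
           for n in ("node-a", "node-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(r["id"] for r in results))
print(pool.assign_job("node-c"))  # nothing left to assign
```

Because the status check and the status update happen under the same lock, two nodes can never both see the same job as pending.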
GPU tier multipliers, regional pricing, and reputation decay are all JSON-style fields on the entries in the nodes dict. When you need to add a new pricing rule, you change one line of assign_job. Compare that to a Kubernetes operator with custom resource definitions, admission webhooks, and reconciler loops.
Zo has a /zo/ask API that spawns child agent invocations. The clone just calls it:

    import asyncio
    import aiohttp

    # Methods on the agent-spawning class; self.api_token and self.model are set elsewhere.
    async def spawn(self, agent_id: str, prompt: str):
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.zo.computer/zo/ask",
                headers={"authorization": self.api_token, "content-type": "application/json"},
                json={"input": prompt, "model_name": self.model},
            ) as resp:
                return {"agent_id": agent_id, "output": (await resp.json()).get("output", "")}

    async def spawn_all(self, agents: list):
        # Start every request concurrently; results come back in input order.
        return await asyncio.gather(*[self.spawn(a["id"], a["prompt"]) for a in agents])

Five agent invocations in parallel is asyncio.gather. No Celery, no RQ, no Dask. The model_name is hardcoded; there's exactly one LLM driver, and it's whatever Zo gives you. If you want a different model, change one string.
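To see the fan-out without hitting the network, here is a sketch that swaps the HTTP call for `asyncio.sleep`. `fake_spawn` is a stand-in, not the Zo API, and `return_exceptions=True` is an addition the original `spawn_all` does not use; it keeps one failed agent from discarding the other results.

```python
import asyncio
import time


async def fake_spawn(agent_id: str, prompt: str, delay: float = 0.2) -> dict:
    # Stand-in for the HTTP call: sleep instead of hitting the network.
    await asyncio.sleep(delay)
    if not prompt:
        raise ValueError(f"{agent_id}: empty prompt")
    return {"agent_id": agent_id, "output": prompt.upper()}


async def spawn_all(agents: list) -> list:
    # return_exceptions=True returns failures as values instead of raising.
    return await asyncio.gather(
        *[fake_spawn(a["id"], a["prompt"]) for a in agents],
        return_exceptions=True,
    )


agents = [{"id": f"agent-{i}", "prompt": f"task {i}"} for i in range(4)]
agents.append({"id": "agent-bad", "prompt": ""})  # this one will fail

start = time.perf_counter()
results = asyncio.run(spawn_all(agents))
elapsed = time.perf_counter() - start

for r in results:
    print(r if isinstance(r, dict) else f"failed: {r}")
print(f"{len(results)} calls in {elapsed:.2f}s")
```

The five calls overlap, so the total wall time stays close to one delay rather than five, and the results list keeps the order of the input list.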
What's missing:
- run_command is subprocess.run(cmd, shell=True). The agent can rm -rf ~ and it will. Production Zo wraps this in gVisor; I don't.
- Memory recall is a LIKE '%query%' scan. Fine at 1k rows, embarrassing at 100k.
- The chat() call is blocking. You see the full response or nothing.
- set_key() writes API keys to a flat JSON file in ~/.zoclone/. Multi-user means multi-disaster.
- The test suite is an if __name__ == "__main__" block that prints the pool status.
What I'd fix next:
- Wrap run_command in a gVisor container, or at minimum a chroot + seccomp.
- Swap the memory table for SQLite-vec0 and do real semantic recall.
- Add an Authorization header check on every API endpoint. Even internal services.
The real lesson wasn't "look how short the code is"; it was "look how much of the platform is just a thin layer over a database, a thread pool, and a few HTTP calls." The parts that are genuinely hard (the LLM orchestration loop, the skill discovery) are maybe 100 lines. The rest is plumbing, and most of the plumbing doesn't need to exist.
Repo: github.com/AmSach/ZoClone
License: MIT
Stack: Python 3.10+, SQLite, requests, aiohttp, no web framework
If you've built a personal-AI clone of your own, drop the repo link in the comments. I want to see how other people split the agent loop from the storage layer.