I kept trying to make my AI assistant smarter by adding more tools to the same loop.
That worked for a while. Then the assistant had to do normal user things: continue a Codex task from chat, answer a status question from DingTalk, remember how a desktop workflow succeeded, wait behind another run that was using the mouse, and still route Claude Code traffic through the same localhost server.
At that point the problem was no longer "how many tools can one agent call?"
The problem was architecture.
The first shape was simple:
user message -> assistant loop -> tools -> answer
That is fine for a demo. It is not fine for a resident assistant.
A resident assistant has to know whether a message is a new task, a follow-up, a status check, a correction, or a cancellation. It has to avoid stealing the desktop from another running task. It has to remember procedures without shoving every old transcript into context. It has to delegate coding work to Codex or Claude Code without pretending it is the executor.
Those are different jobs. When I kept them inside one loop, every fix made the loop more capable and less understandable.
So I stopped thinking about the assistant as one agent and started treating it as a local control plane.
In CliGate, the architecture now looks more like this:
Experience Plane
-> Assistant Control Plane
-> Runtime Execution Plane
-> Proxy / Model Access Plane
Observation Plane + Memory / Policy Plane sit alongside all four as cross-cutting planes.
The names sound formal, but the boundaries are practical.
The experience plane owns where the user is talking from: dashboard chat, assistant tasks, Telegram, Feishu, DingTalk, scheduled jobs.
The assistant control plane decides what kind of work this is. Should it answer from state? Should it start a task? Should it continue an existing one? Should it wait because the desktop is already held by another run?
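Here is a minimal sketch of that decision step. It assumes the message has already been classified upstream into one of the kinds listed earlier (new task, follow-up, status check, correction, cancellation). `Intent`, `Action`, `SystemState`, and `decide` are hypothetical names for illustration, not CliGate's API.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Intent(Enum):
    # The message kinds the control plane has to tell apart.
    NEW_TASK = auto()
    FOLLOW_UP = auto()
    STATUS_CHECK = auto()
    CORRECTION = auto()
    CANCELLATION = auto()


class Action(Enum):
    ANSWER_FROM_STATE = auto()
    START_TASK = auto()
    CONTINUE_TASK = auto()
    CANCEL_TASK = auto()
    WAIT_FOR_RESOURCE = auto()


@dataclass
class SystemState:
    # Hypothetical snapshot the control plane reads instead of chat history.
    active_task_id: Optional[str] = None
    desktop_holder: Optional[str] = None  # run id currently driving the desktop


@dataclass
class Decision:
    action: Action
    task_id: Optional[str] = None
    reason: str = ""


def decide(intent: Intent, needs_desktop: bool, state: SystemState) -> Decision:
    """Decide what kind of work a message is, given an already-classified intent."""
    if intent is Intent.STATUS_CHECK:
        # Status questions are answered from state; no new run is started.
        return Decision(Action.ANSWER_FROM_STATE, state.active_task_id)
    if intent is Intent.CANCELLATION:
        if state.active_task_id is None:
            return Decision(Action.ANSWER_FROM_STATE, reason="nothing to cancel")
        return Decision(Action.CANCEL_TASK, state.active_task_id)
    if intent in (Intent.FOLLOW_UP, Intent.CORRECTION) and state.active_task_id:
        # Follow-ups and corrections continue the existing task.
        return Decision(Action.CONTINUE_TASK, state.active_task_id)
    # Otherwise this is new work; queue it if the desktop is already held.
    if needs_desktop and state.desktop_holder is not None:
        return Decision(Action.WAIT_FOR_RESOURCE,
                        reason=f"desktop is held by run {state.desktop_holder}")
    return Decision(Action.START_TASK)
```

The point of the sketch is that every branch reads structured state, so "wait" and "continue" never depend on re-reading the last few messages.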
The runtime execution plane is where Codex and Claude Code live. They do the actual coding work. The assistant can dispatch, continue, summarize, and coordinate them, but it does not need to become a worse version of them.
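One way to keep that boundary honest is to give the assistant only a narrow interface to each runtime: dispatch, continue, summarize. The sketch below assumes such an interface; `Runtime`, `EchoRuntime`, and `delegate` are hypothetical, and `EchoRuntime` just records messages as a stand-in for Codex or Claude Code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Runtime(Protocol):
    # Hypothetical interface the assistant uses to coordinate a coding runtime.
    # The runtime itself does the coding work.
    def dispatch(self, prompt: str) -> str: ...
    def continue_task(self, task_id: str, message: str) -> None: ...
    def summarize(self, task_id: str) -> str: ...


@dataclass
class EchoRuntime:
    # Stand-in runtime for illustration: it records messages instead of coding.
    name: str
    transcripts: Dict[str, List[str]] = field(default_factory=dict)

    def dispatch(self, prompt: str) -> str:
        task_id = f"{self.name}-{len(self.transcripts) + 1}"
        self.transcripts[task_id] = [prompt]
        return task_id

    def continue_task(self, task_id: str, message: str) -> None:
        self.transcripts[task_id].append(message)

    def summarize(self, task_id: str) -> str:
        msgs = self.transcripts[task_id]
        return f"{task_id}: {len(msgs)} message(s), last: {msgs[-1]!r}"


def delegate(runtime: Runtime, prompt: str, follow_ups: List[str]) -> str:
    # The assistant coordinates; it never edits code itself.
    task_id = runtime.dispatch(prompt)
    for msg in follow_ups:
        runtime.continue_task(task_id, msg)
    return runtime.summarize(task_id)
```

Because the assistant only sees this surface, swapping one runtime for another does not change the control-plane logic.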
The proxy/model access plane handles the boring but necessary provider work: protocol translation, account pools, API keys, routing, model mapping, request logs, and usage.
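A minimal sketch of two of those jobs, model mapping and account-pool selection, with per-account usage counting. Round-robin is an assumed selection policy chosen for simplicity, not necessarily what CliGate does, and the model, provider, and account names are placeholders.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class ProviderRoute:
    provider: str
    upstream_model: str


class ModelRouter:
    """Maps a requested model to a provider and rotates through its accounts."""

    def __init__(self, model_map: Dict[str, ProviderRoute],
                 pools: Dict[str, List[str]]) -> None:
        self.model_map = model_map
        # Assumed policy: simple round-robin over each provider's account pool.
        self._cycles: Dict[str, Iterator[str]] = {
            provider: itertools.cycle(accounts) for provider, accounts in pools.items()
        }
        self.usage: Dict[Tuple[str, str], int] = {}

    def route(self, requested_model: str) -> Tuple[str, str, str]:
        route = self.model_map[requested_model]  # KeyError for unmapped models
        account = next(self._cycles[route.provider])
        key = (route.provider, account)
        self.usage[key] = self.usage.get(key, 0) + 1  # per-account usage count
        return route.provider, route.upstream_model, account
```

Keeping this in its own plane means the assistant loop asks for a model by name and never sees keys, accounts, or protocol differences.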
The side planes, observation and memory / policy, are what keep the assistant sane.
The biggest improvement came from making the assistant consume observations instead of raw logs.
If a Codex run is waiting for approval, the assistant should not read a giant transcript to rediscover that. It should see a compact fact:
"Task X is waiting for approval to run command Y."
If another assistant run is currently driving the desktop, a new run should not guess from chat history. It should see a resource holder:
"desktop is held by run R."
That one change made status questions, cancellation, follow-ups, and concurrent runs much less fragile. The assistant no longer has to infer the system state from the last few messages. The system gives it a state model.
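Both kinds of fact can be sketched in a few dozen lines. The event schema (`started`, `approval_request`, `approval_granted`, `completed`) and the class names are assumptions for illustration. The registry is single-process and not thread-safe; it only shows the shape of "who holds the desktop, who waits behind it."

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, Iterable, Optional


@dataclass(frozen=True)
class Observation:
    task_id: str
    status: str
    detail: str = ""

    def as_fact(self) -> str:
        # The one-line fact the assistant sees instead of the transcript.
        if self.status == "waiting_approval":
            return f"Task {self.task_id} is waiting for approval to run command {self.detail}."
        return f"Task {self.task_id} is {self.status}."


def observe(task_id: str, events: Iterable[Dict[str, Any]]) -> Observation:
    """Fold a raw runtime event stream (hypothetical schema) into the latest state."""
    obs = Observation(task_id, "unknown")
    for event in events:
        kind = event.get("type")
        if kind in ("started", "approval_granted"):
            obs = Observation(task_id, "running")
        elif kind == "approval_request":
            obs = Observation(task_id, "waiting_approval", event["command"])
        elif kind == "completed":
            obs = Observation(task_id, "completed")
        # Output chunks and other noise stay in the log, never in the prompt.
    return obs


class ResourceRegistry:
    """Tracks which run holds a shared resource and which runs wait behind it."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}
        self._waiting: Dict[str, Deque[str]] = {}

    def acquire(self, resource: str, run_id: str) -> bool:
        holder = self._holders.get(resource)
        if holder is None or holder == run_id:
            self._holders[resource] = run_id
            return True
        queue = self._waiting.setdefault(resource, deque())
        if run_id not in queue:
            queue.append(run_id)  # wait in line instead of stealing the desktop
        return False

    def release(self, resource: str, run_id: str) -> Optional[str]:
        """Release and hand the resource to the next queued run, if any."""
        if self._holders.get(resource) != run_id:
            raise ValueError(f"{run_id} does not hold {resource}")
        del self._holders[resource]
        queue = self._waiting.get(resource)
        if queue:
            self._holders[resource] = queue.popleft()
            return self._holders[resource]
        return None

    def describe(self, resource: str) -> str:
        holder = self._holders.get(resource)
        return f"{resource} is held by run {holder}." if holder else f"{resource} is free."
```

A status question then reads `observe(...).as_fact()` and `describe("desktop")`, both one line each, regardless of how long the underlying transcript has grown.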
I also learned that "remembering" is not the same as stuffing more chat history into a prompt.
For this assistant, memory is file-based and scoped. It can store a workflow, a fact, a standing directive, or a reference. On the next similar request, the prompt only gets a small memory index. If the assistant thinks one entry matters, it explicitly recalls the body.
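A minimal sketch of file-based, scoped memory with an index and explicit recall, assuming one JSON file per entry under a directory per scope. The layout, `MemoryStore`, and its method names are hypothetical; the four kinds come from the paragraph above.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple

KINDS = {"workflow", "fact", "directive", "reference"}


class MemoryStore:
    """One JSON file per memory entry, grouped into per-scope directories."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, scope: str, name: str) -> Path:
        return self.root / scope / f"{name}.json"

    def save(self, scope: str, name: str, kind: str, summary: str, body: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown memory kind: {kind}")
        path = self._path(scope, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        entry = {"kind": kind, "summary": summary, "body": body}
        path.write_text(json.dumps(entry), encoding="utf-8")

    def index(self, scope: str) -> List[str]:
        """One line per entry; this small index is all the prompt gets by default."""
        scope_dir = self.root / scope
        if not scope_dir.is_dir():
            return []
        lines = []
        for path in sorted(scope_dir.glob("*.json")):
            entry = json.loads(path.read_text(encoding="utf-8"))
            lines.append(f"{path.stem} [{entry['kind']}]: {entry['summary']}")
        return lines

    def recall(self, scope: str, name: str) -> str:
        """Load the full body only when the assistant decides an entry matters."""
        entry = json.loads(self._path(scope, name).read_text(encoding="utf-8"))
        return entry["body"]


def demo() -> Tuple[List[str], str, List[str]]:
    # Usage example in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        store = MemoryStore(Path(tmp))
        store.save("desktop", "export-report", "workflow",
                   "Export the weekly report", "1. Open the app\n2. Click Export")
        return (store.index("desktop"),
                store.recall("desktop", "export-report"),
                store.index("other"))
```

The design choice is that the summary line is cheap enough to always include, while the body costs context only after an explicit recall.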
That keeps the default context small while still letting the assistant learn from past work, such as how a desktop workflow succeeded.
For procedure memories, the rule is verify-then-trust. Try the remembered steps, but confirm the UI still matches. If it changed, explore again and update the memory after success.
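The verify-then-trust rule can be sketched as a replay loop that checks the UI before every step and only rewrites the memory after a successful re-exploration. The callbacks (`ui_has`, `act`, `explore`, `save`) and the `Step` shape are hypothetical stand-ins for real desktop automation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    action: str
    expect: str  # UI element we expect to see before acting (hypothetical)


def run_procedure(
    steps: List[Step],
    ui_has: Callable[[str], bool],
    act: Callable[[str], None],
    explore: Callable[[], Optional[List[Step]]],
    save: Callable[[List[Step]], None],
) -> str:
    for step in steps:
        if not ui_has(step.expect):
            # Verification failed: the UI changed, so stop trusting the memory.
            new_steps = explore()
            if new_steps is None:
                return "failed"  # leave the old memory untouched
            save(new_steps)  # update memory only after exploration succeeded
            return "re-explored"
        act(step.action)
    return "replayed"
```

A failed exploration leaves the stored procedure as it was, so one bad run cannot overwrite a memory that might still work next time.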
That is closer to how I want a practical assistant to evolve: not by growing a huge transcript, but by distilling successful work into reusable units.
Local AI tooling is messy in a specific way.
The user may have Claude Code, Codex CLI, Gemini CLI, OpenClaw, a browser session, a desktop app, a Telegram channel, and several provider accounts. The hard part is not only making one model call. The hard part is keeping all of those pieces coordinated without turning the assistant into an opaque supervisor that hijacks every message.
That is why CliGate still keeps a direct runtime path. If the user is already talking to a Codex session, the message can go straight there. The assistant control plane is for explicit coordination, background tasks, memory, policy, desktop work, and cross-channel workflows.
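The routing rule can be sketched as a small function. It assumes a channel-to-session binding and an explicit `/assistant` prefix for addressing the assistant; both are hypothetical conventions for illustration, not CliGate's actual behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    target: str  # "runtime" or "assistant"
    session_id: Optional[str] = None


def route_message(channel: str, text: str, bound_sessions: Dict[str, str],
                  explicit_prefix: str = "/assistant") -> Route:
    # Hypothetical convention: an explicit prefix always reaches the control plane.
    if text.startswith(explicit_prefix):
        return Route("assistant")
    # If this channel is already talking to a runtime session, go straight there.
    session = bound_sessions.get(channel)
    if session is not None:
        return Route("runtime", session)
    # Everything else goes through the assistant control plane.
    return Route("assistant")
```

The default for a bound channel is the runtime, so the assistant only sees messages it was explicitly asked to coordinate.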
The split is not glamorous, but it is the difference between an impressive demo and a tool I can leave running.
I used to ask: how do I make the assistant loop smarter?
Now I ask: which plane should own this responsibility?
That question has prevented a lot of accidental complexity. It keeps provider routing out of the assistant loop, execution inside dedicated runtimes, observations out of raw logs, and memory out of unbounded chat history.
The project, CliGate, is open source.
If you are building agents around existing tools, are you putting everything inside one loop, or are you starting to split control, execution, observation, and memory too?