"My AI Assistant Could Code, But It Couldn't Operate My Desktop"

CliGate, an open-source local AI gateway, has evolved into a "local control plane" for agent work that can operate desktop applications through the operating system's accessibility tree. The tool, which started as a local API gateway for AI coding tools like Claude Code and Codex CLI, now includes a desktop agent that uses UI Automation on Windows to find controls, set values, and invoke buttons without relying on fragile screenshot-based automation. The system also introduces a "skills" framework that packages reusable procedures—such as publishing to Dev.to or building spreadsheets—as local, inspectable packages that the assistant can load on demand.

Most AI coding agents are good until the task leaves the terminal. They can edit files. They can run tests. They can explain a diff. Then the work hits a desktop app, an OAuth approval screen, a native settings window, or a web UI that was not designed for API access. Suddenly the agent is not stuck on intelligence. It is stuck on reach.

That was the gap I kept running into while building my local AI setup. I had Claude Code, Codex CLI, Gemini CLI, local models, provider keys, and account pools. The missing piece was not another model. It was an operator.

My old workflow had two separate worlds. In one world, coding agents lived inside terminals and repos. They could reason about code, run commands, and keep a session alive. In the other world, real work still happened through desktop apps, dashboards, browser windows, chat clients, and provider consoles. A human could jump between those worlds without thinking. An agent could not.

That made the assistant feel smaller than it should:

So I changed how I think about CliGate. CliGate is no longer just a local API gateway for AI tools. It is becoming a local control plane for agent work.

CliGate still starts as one localhost service for AI coding tools. You can point Claude Code, Codex CLI, Gemini CLI, and OpenClaw at the same local server, then manage provider keys, account pools, routing, usage, logs, and local runtimes from one dashboard.

But the newer assistant layer sits above that. It has two modes:

That split matters. I do not want every normal message to be intercepted by a clever supervisor. Sometimes I just want to continue the current runtime session. Other times I want an assistant that can see the bigger picture. The assistant is not trying to replace Codex or Claude Code. It coordinates them.

The second piece is skills. A skill is a local package of instructions, scripts, templates, and references. The assistant does not need every detail in context all the time.
It can see a short description first, then read the full SKILL.md only when the task matches. For example:

    skills/
      devto-publisher/
        SKILL.md
        publish.js
        templates/

That turns the assistant from "a general chat box with tools" into something closer to a teammate with reusable procedures. One skill can know how to publish a Dev.to article. Another can know how to build a spreadsheet. Another can know the conventions of a local repo. The key is that these are local, inspectable, and executable through the same permission system as the rest of the agent. It is not magic. It is just a better way to keep operational knowledge out of one giant prompt.

The part I am most excited about is desktop control. The first naive version of desktop automation is usually visual: take a screenshot, ask the model where to click, move the mouse, repeat. That works for demos, but it is fragile. Small buttons, focus changes, DPI scaling, popups, and animations can break it.

CliGate's desktop agent takes a different default path on Windows: UI Automation first, screenshots second. Instead of guessing pixels, the assistant can ask the operating system for the UI tree:

    list windows -> focus app -> find input -> set value -> send Enter -> read text

That means it can find a textbox by control type, set its value through the accessibility API, invoke a button, read visible text, and only fall back to screenshots when the app does not expose useful accessibility metadata.

This is the bridge I wanted: a coding assistant that can work in repos, but also operate the desktop applications that surround the repo.

The current shape is:

That combination changes the product from "proxy for AI tools" into "local operator for developer workflows." I think the desktop-control layer deserves its own post, because "AI can operate any app through the OS accessibility tree" is a deeper topic than I can fit here.
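
To make the "UI Automation first, screenshots second" idea a little more concrete, here is a minimal sketch in Python. It does not call the real Windows UI Automation API and it is not CliGate's code. UiNode, find_first, set_text, and the screenshot_fallback callback are hypothetical names, and the accessibility tree is modeled as plain in-memory objects so the control flow is easy to follow: search the tree by control type, set the value directly when an input is exposed, and take the visual path only when nothing useful is found.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UiNode:
    """Hypothetical stand-in for one element of an accessibility tree."""
    control_type: str  # e.g. "Window", "Edit", "Button"
    name: str = ""
    value: str = ""
    children: list["UiNode"] = field(default_factory=list)


def find_first(root: UiNode, control_type: str,
               name: Optional[str] = None) -> Optional[UiNode]:
    """Depth-first search by control type and, optionally, by name."""
    if root.control_type == control_type and (name is None or root.name == name):
        return root
    for child in root.children:
        hit = find_first(child, control_type, name)
        if hit is not None:
            return hit
    return None


def set_text(window: UiNode, text: str,
             screenshot_fallback: Callable[[str], str]) -> str:
    """Prefer the tree; use the visual path only if no input is exposed."""
    box = find_first(window, "Edit")
    if box is None:
        # Assumption: the caller supplies whatever screenshot-based routine it has.
        return screenshot_fallback(text)
    box.value = text
    return "uia"


def fake_screenshot_path(text: str) -> str:
    """Placeholder for a screenshot-and-click routine; does no real work."""
    return "screenshot"


# An app that exposes its controls, and one that exposes nothing below the window.
notepad = UiNode("Window", "Notes", children=[
    UiNode("Edit", "Body"),
    UiNode("Button", "Save"),
])
canvas_app = UiNode("Window", "Canvas")

print(set_text(notepad, "hello", fake_screenshot_path))     # uia
print(set_text(canvas_app, "hello", fake_screenshot_path))  # screenshot
```

The point of the sketch is the ordering, not the data model: the structured lookup is tried first because it does not depend on pixels, focus, or scaling, and the screenshot routine is only a fallback for apps that expose no usable metadata.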
The project is open source here:

CliGate on GitHub: https://github.com/codeking-ai/cligate

How are you handling the boundary between coding agents and the desktop apps they still need to interact with?