{"slug": "my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop", "title": "\"My AI Assistant Could Code, But It Couldn't Operate My Desktop\"", "summary": "CliGate, an open-source local AI gateway, has evolved into a \"local control plane\" for agent work that can operate desktop applications through the operating system's accessibility tree. The tool, which started as a local API gateway for AI coding tools like Claude Code and Codex CLI, now includes a desktop agent that uses UI Automation on Windows to find controls, set values, and invoke buttons without relying on fragile screenshot-based automation. The system also introduces a \"skills\" framework that packages reusable procedures—such as publishing to Dev.to or building spreadsheets—as local, inspectable packages that the assistant can load on demand.", "body_md": "Most AI coding agents are good until the task leaves the terminal.\n\nThey can edit files. They can run tests. They can explain a diff. Then the work hits a desktop app, an OAuth approval screen, a native settings window, or a web UI that was not designed for API access. Suddenly the agent is not stuck on intelligence. It is stuck on reach.\n\nThat was the gap I kept running into while building my local AI setup. I had Claude Code, Codex CLI, Gemini CLI, local models, provider keys, and account pools. The missing piece was not another model.\n\nIt was an operator.\n\nMy old workflow had two separate worlds.\n\nIn one world, coding agents lived inside terminals and repos. They could reason about code, run commands, and keep a session alive.\n\nIn the other world, real work still happened through desktop apps, dashboards, browser windows, chat clients, and provider consoles. A human could jump between those worlds without thinking. An agent could not.\n\nThat made the assistant feel smaller than it should:\n\nSo I changed how I think about CliGate.\n\nCliGate is no longer just a local API gateway for AI tools. 
It is becoming a local control plane for agent work.\n\nCliGate still starts as one localhost service for AI coding tools.\n\nYou can point Claude Code, Codex CLI, Gemini CLI, and OpenClaw at the same local server, then manage provider keys, account pools, routing, usage, logs, and local runtimes from one dashboard.\n\nBut the newer assistant layer sits above that.\n\nIt has two modes:\n\nThat split matters. I do not want every normal message to be intercepted by a clever supervisor. Sometimes I just want to continue the current runtime session. Other times I want an assistant that can see the bigger picture.\n\nThe assistant is not trying to replace Codex or Claude Code. It coordinates them.\n\nThe second piece is skills.\n\nA skill is a local package of instructions, scripts, templates, and references. The assistant does not need every detail in context all the time. It can see a short description first, then read the full `SKILL.md` only when the task matches.\n\nFor example:\n\n```\nskills/\n  devto-publisher/\n    SKILL.md\n    publish.js\n    templates/\n```\n\nThat turns the assistant from \"a general chat box with tools\" into something closer to a teammate with reusable procedures.\n\nOne skill can know how to publish a Dev.to article. Another can know how to build a spreadsheet. Another can know the conventions of a local repo. The key is that these are local, inspectable, and executable through the same permission system as the rest of the agent.\n\nIt is not magic. It is just a better way to keep operational knowledge out of one giant prompt.\n\nThe part I am most excited about is desktop control.\n\nThe first naive version of desktop automation is usually visual: take a screenshot, ask the model where to click, move the mouse, repeat. That works for demos, but it is fragile. 
Small buttons, focus changes, DPI scaling, popups, and animations can break it.\n\nCliGate's desktop agent takes a different default path on Windows: UI Automation first, screenshots second.\n\nInstead of guessing pixels, the assistant can ask the operating system for the UI tree:\n\n```\nlist windows -> focus app -> find input -> set value -> send Enter -> read text\n```\n\nThat means it can find a textbox by control type, set its value through the accessibility API, invoke a button, read visible text, and only fall back to screenshots when the app does not expose useful accessibility metadata.\n\nThis is the bridge I wanted: a coding assistant that can work in repos, but also operate the desktop applications that surround the repo.\n\nThe current shape is:\n\nThat combination changes the product from \"proxy for AI tools\" into \"local operator for developer workflows.\"\n\nI think the desktop-control layer deserves its own post, because \"AI can operate any app through the OS accessibility tree\" is a deeper topic than I can fit here.\n\nThe project is open source here: [CliGate on GitHub](https://github.com/codeking-ai/cligate)\n\nHow are you handling the boundary between coding agents and the desktop apps they still need to interact with?", "url": "https://wpnews.pro/news/my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop", "canonical_source": "https://dev.to/codekingai/my-ai-assistant-could-code-but-it-couldnt-operate-my-desktop-1log", "published_at": "2026-05-26 09:10:44+00:00", "updated_at": "2026-05-26 09:34:03.850818+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "ai-products", "large-language-models"], "entities": ["Claude Code", "Codex CLI", "Gemini CLI", "CliGate", "OpenClaw"], "alternates": {"html": "https://wpnews.pro/news/my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop", "markdown": "https://wpnews.pro/news/my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop.md", "text": 
"https://wpnews.pro/news/my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop.txt", "jsonld": "https://wpnews.pro/news/my-ai-assistant-could-code-but-it-couldn-t-operate-my-desktop.jsonld"}}