I built a Windows computer-use MCP server in pure Go. One EXE. No Python. No Docker.
It's a single 27 MB executable that gives local LLMs (Claude, Gemini, Cursor, Kiro, OpenCode, Ollama, you name it) the ability to actually use a Windows desktop. Think of it as giving an LLM a mouse, keyboard, eyes, and long-term memory.
(Screenshots and demo clips coming soon — wanted to get the post up first.)
Over 14,000 lines of Go, zero dependencies on OpenCV, go-ole, or any COM binding library. Almost every subsystem was implemented from scratch in pure Go instead of wrapping existing libraries.
I'm not a professional software engineer. I built this by pair-programming with multiple AI models across hundreds of iterations.
And I didn't spend a cent on API tokens. Gemini CLI (free tier), Claude Code (trial credits), Ollama (local, free), GitHub Copilot, OpenCode's Big Pickle (200K ctx, free). The ollama launch claude
trick was a workhorse — point Claude Code at a Nemotron or MiniMax3 locally and get agentic scaffolding on budget hardware.
This started as "I want my AI to click a button." It somehow turned into a Windows automation framework with vision, memory, and a training pipeline.
But the real reason? A friend who's disabled and uses Narrator as their primary computer interface. They asked for a month to test once I was ready to go public. After countless trials and Python's "works on my machine" nonsense, I wanted something that actually ships.
120 MCP tools covering the full desktop stack:
find_image
/find_all_images
with triple cascade: template matching → ONNX YOLO → OCRocr_languages
, middle mouse, horizontal scroll, fullscreen detectionBattle-tested with: Claude Desktop, Claude Code, Kiro, Cursor, Windsurf, Gemini CLI, OpenCode, Ollama, Antigravity IDE, Cline, Android Studio, Zed, Obsidian, and more.
syscall.SyscallN(vtblMethod(obj, N), ...)
with indices hand-verified against Windows SDK headers.If you're building AI agents, this gives them hands.
If you're building desktop automation, this gives you 120 reusable tools.
If you're learning Windows internals, it shows how raw WinRT COM, OCR, ONNX, and UI automation fit together in a real project.
If you're just curious how far one person can get with modern AI tools, this is my answer.
Yes, it can control your computer. So can AutoHotkey, PowerShell, Selenium, and every RPA tool. This is local-first, every action can be logged, privacy controls toggle at runtime. Not spyware. Not a remote admin tool. Just an automation engine for your own machine.
Especially from people into Go, MCP, local AI, computer vision, automation, or Windows internals. Open issues, suggest features, steal patterns — the repo has templates and a security policy now. Tell me what I broke, what to build next, or that I'm insane for hand-writing COM vtables in Go.
AI didn't build this project. AI became my pair programmer.
The architecture, direction, debugging, testing, and endless "why doesn't Windows do what the docs say?" moments were still mine.
It convinced me that one curious computer technician, persistence, and today's AI tools can build things I wouldn't have been able to create on my own just a few years ago.
Links
GitHub: https://github.com/coff33ninja/go-mcp-computer-use
Docs: docs/reference/tools.md
, docs/security.md
, docs/mcp-client-configs.md