Building an outbound-only WebSocket bridge for local AI agents

The article describes the development of CTRL NODE, a browser-based control plane that enables remote management of local AI agents through an outbound-only WebSocket bridge. The Bridge is a lightweight Node.js daemon that connects outward to the cloud, allowing commands to be pushed down the connection without the local machine ever accepting an inbound connection or exposing a public port. Key technical features include heartbeat messages every 20 seconds to prevent cloud load balancers from killing idle connections, exponential backoff for reconnection, and an in-memory queue to prevent data loss during reconnection events.

I work with AI agents every day: Claude Code, Copilot, Gemini CLI, all running locally with access to my filesystem, my repos, and my tools. The results are genuinely good. But there's a wall: the moment you leave your desk, you lose control. There's no real way to kick off an agent task from your phone, monitor a long-running pipeline from a coffee shop, or schedule something to run overnight.

Every solution I found had the same trade-off: you either open a port, install a tunnel daemon, or upload your code to someone's cloud. None of those felt right for infrastructure that has access to your local filesystem.

So I built CTRL NODE (https://ctrlnode.ai), a browser-based control plane for local AI agents. The key piece is a process called the Bridge: a lightweight Node.js daemon that runs on your machine and connects to the cloud without ever accepting an inbound connection.

This article is about how that works, why the design choices matter, and what the actual code looks like.

Why outbound-only?

The naive approach is to expose your local agent runtime on a port and let the cloud reach in. Tools like ngrok do exactly this: they create a reverse proxy to your localhost. It works, but it has real costs:

- Open port = attack surface. Every ngrok tunnel is a publicly reachable endpoint. If auth breaks, someone else can talk to your agent.
- Third-party traffic relay. Your prompts, file paths, and agent responses travel through ngrok's infrastructure.
- Daemon complexity. You're running persistent infrastructure that you didn't write and can't easily audit.

The alternative: flip the connection direction. The Bridge connects out to the cloud, the cloud pushes commands down that connection, and the local machine never listens on a public port.
```
Your machine                             ctrlnode.ai cloud
──────────────────────────────────────────────────────────
Bridge ──── ws:// connect ─────────────▶ WebSocket server
       ◀─── {action: "run_task", ...} ──────────────────
       ───── stdout/stderr events ─────────────────────▶
```

This is the same pattern used by IoT devices, CI agents like the GitHub Actions runner, and remote desktop clients. The cloud doesn't initiate; it waits.

The connection lifecycle

Here's the core of `websocket.ts`:

```ts
import WebSocket from "ws";

// Module-level socket. buildWsUrl, buildAuthHeaders, logger, AUTH_RETRY_MS,
// isAuthError, scheduleReconnect and handleInboundMessage live elsewhere in the module.
let ws: WebSocket | null = null;

export function connect(): void {
  const url = buildWsUrl();
  ws = new WebSocket(url, { headers: buildAuthHeaders() });

  ws.on("open", () => {
    logger.info("Bridge connected to SAAS");
    flushPendingQueue(); // send anything buffered while we were offline
    startHeartbeat();
  });

  ws.on("message", (data: WebSocket.RawData) => {
    let message: InboundMessage;
    try {
      message = JSON.parse(data.toString()) as InboundMessage;
    } catch {
      // A malformed frame must not crash the daemon.
      logger.warn("Ignoring malformed message from server");
      return;
    }
    handleInboundMessage(message);
  });

  ws.on("close", (code: number, reason: Buffer) => {
    stopHeartbeat();
    if (isAuthError(code, reason.toString())) {
      logger.warn(`Auth error ${code}, retrying in ${AUTH_RETRY_MS / 1000}s`);
      setTimeout(connect, AUTH_RETRY_MS);
    } else {
      scheduleReconnect(); // exponential backoff
    }
  });

  ws.on("error", (err: Error) => {
    // ws emits "close" after "error", so reconnection is handled there.
    logger.error(`WebSocket error: ${err.message}`);
  });
}
```

Three things to notice:

- Auth errors get a longer retry delay. If the server closes the connection with 1008 (Policy Violation) or 1002 (Protocol Error), or the close reason contains "401", "403", or "Unauthorized", we wait 30 seconds before retrying. Hammering an endpoint that is rejecting your credentials is pointless and noisy.
- Normal closes trigger exponential backoff. `scheduleReconnect` uses a standard backoff, so a transient network blip doesn't hammer the server or flood the logs.
- On open, we flush the queue. More on this below.

Keeping the connection alive through load balancers

Cloud load balancers close WebSocket connections that sit idle for too long, often after 30–60 seconds.
The fix is a heartbeat:

```ts
const HEARTBEAT_INTERVAL_MS = 20_000;
let heartbeatTimer: NodeJS.Timeout | null = null;

function startHeartbeat(): void {
  stopHeartbeat(); // never run two intervals at once
  heartbeatTimer = setInterval(() => {
    sendToSaas({ type: "heartbeat", timestamp: Date.now() });
  }, HEARTBEAT_INTERVAL_MS);
}

function stopHeartbeat(): void {
  if (heartbeatTimer) {
    clearInterval(heartbeatTimer);
    heartbeatTimer = null;
  }
}
```

Every 20 seconds, a small message goes up. Whether the server acknowledges it doesn't matter; the goal is just to keep traffic flowing so the load balancer's idle timer never fires. This is cheap, and it works reliably with AWS ALB, Cloudflare, and most managed WebSocket proxies.

Buffering outbound messages during disconnection

When the Bridge is reconnecting, agent output still arrives. If we drop those events, the user watching a pipeline in their browser sees a gap in the live log. The solution is a small in-memory queue:

```ts
const PENDING_QUEUE_MAX = 100;
const pendingQueue: OutboundMessage[] = [];

export function sendToSaas(message: OutboundMessage): void {
  if (!ws || ws.readyState !== WebSocket.OPEN) {
    // Offline: buffer the message. Once the cap is reached, new messages are dropped.
    if (pendingQueue.length < PENDING_QUEUE_MAX) {
      pendingQueue.push(message);
    }
    return;
  }
  ws.send(JSON.stringify(message));
}

function flushPendingQueue(): void {
  // Called from the "open" handler, so ws is connected here. Sends in FIFO order.
  while (pendingQueue.length > 0) {
    const msg = pendingQueue.shift()!;
    ws!.send(JSON.stringify(msg));
  }
}
```

Cap at 100 messages, flush on reconnect. It's simple, and it covers the common case of a 2–3 second reconnect window without losing events. The cap keeps memory bounded during a long outage, at the cost of dropping anything past the first 100 buffered messages.

Multi-agent routing via the filesystem

Here's the part that took the most thought: how do you run multiple agents (Claude, Copilot, Gemini) on the same machine and route each task to the right one?

The answer isn't a routing layer in the WebSocket code. It's the filesystem. Each pipeline task gets an isolated directory:

```
workspace/
  tasks/
    task-abc123/
      input/
        TASK.md          ← instructions for the agent
        context-files/   ← any files the user attached
      output/
        TASK.md          ← agent writes progress here
        artifacts/       ← anything the agent produces
```

The Bridge watches these directories.
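Here is a minimal sketch of how a task's paths could be derived from that layout. The `taskPaths` helper, its return shape, and the task-ID allow-list are hypothetical, written for illustration rather than taken from the Bridge's source; only the directory and file names come from the tree above.

```typescript
import * as path from "node:path";

// Hypothetical helper (not from the CTRL NODE source): maps a task ID to the
// per-task directory layout workspace/tasks/<taskId>/{input,output}/...
interface TaskPaths {
  root: string;         // workspace/tasks/<taskId>
  inputTask: string;    // input/TASK.md: instructions for the agent
  contextDir: string;   // input/context-files/: files the user attached
  outputTask: string;   // output/TASK.md: agent progress
  artifactsDir: string; // output/artifacts/: anything the agent produces
}

export function taskPaths(workspaceRoot: string, taskId: string): TaskPaths {
  // Assumed allow-list: task IDs arrive from the cloud, so reject anything
  // containing "/" or "." that could escape the tasks directory.
  if (!/^[A-Za-z0-9_-]+$/.test(taskId)) {
    throw new Error(`Invalid task id: ${taskId}`);
  }
  const root = path.join(workspaceRoot, "tasks", taskId);
  return {
    root,
    inputTask: path.join(root, "input", "TASK.md"),
    contextDir: path.join(root, "input", "context-files"),
    outputTask: path.join(root, "output", "TASK.md"),
    artifactsDir: path.join(root, "output", "artifacts"),
  };
}
```

The ID check matters because the task ID is chosen by the cloud: without it, an ID like `../../etc` would make `path.join` resolve outside `workspace/tasks/`.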
When a `run_task` command arrives:

```ts
case "run_task": {
  const { taskId, agentProvider, workspacePath } = message.payload;
  const provider = getProvider(agentProvider); // Claude | Copilot | Gemini | ...
  await provider.executeTask(taskId, workspacePath);
  break;
}
```

Each provider implementation knows how to invoke its agent CLI with the right arguments and working directory. Claude Code gets `claude --print` with the task directory; Copilot gets its own invocation. They never share context: each runs in its own subprocess, reading from and writing to its own task folder.

This means:

- No prompt pollution. Agent A's context doesn't leak into Agent B.
- Parallel execution. Two agents can run simultaneously without coordination overhead.
- Auditability. Every task leaves a paper trail on disk.
- Privacy. The cloud control plane never sees your file contents, only task metadata and status events.

Provider selection and gating

Some actions only make sense for certain providers. The message handler maintains an explicit set:

```ts
const OPENCLAW_ONLY_ACTIONS = new Set([
  "openclaw_configure",
  "openclaw_stream_chunk",
  "openclaw_reset_context",
]);

function handleInboundMessage(message: InboundMessage): void {
  // Drop OpenClaw-specific actions unless OpenClaw is the active provider.
  if (OPENCLAW_ONLY_ACTIONS.has(message.action) && activeProvider !== "openclaw") {
    logger.warn(`Received ${message.action} but provider is ${activeProvider}, ignoring`);
    return;
  }
  // ... dispatch to handler
}
```

This stops a misconfigured cloud deployment from sending a provider-specific command to an agent that can't handle it. The Bridge is the last line of defense before your filesystem.
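The same gating can also be written as a table-driven dispatcher, where each handler declares which provider it requires. This is a minimal sketch under assumptions: the `Provider` union, the `HandlerEntry` shape, and `makeDispatcher` are hypothetical names, not the Bridge's real API; the action names mirror the ones above.

```typescript
// Hypothetical provider names; the real Bridge may use different identifiers.
type Provider = "claude" | "copilot" | "gemini" | "openclaw";

interface InboundMessage {
  action: string;
  payload?: unknown;
}

// `requires` restricts an action to a single provider; omit it for shared actions.
interface HandlerEntry {
  requires?: Provider;
  run: (msg: InboundMessage) => void;
}

type DispatchResult = "handled" | "wrong-provider" | "unknown-action";

export function makeDispatcher(
  handlers: Record<string, HandlerEntry>,
  getActiveProvider: () => Provider,
): (msg: InboundMessage) => DispatchResult {
  return (msg) => {
    // Own-property lookup so inherited keys like "toString" never count as actions.
    const entry = Object.prototype.hasOwnProperty.call(handlers, msg.action)
      ? handlers[msg.action]
      : undefined;
    if (!entry) return "unknown-action"; // default-deny anything unrecognized
    if (entry.requires && entry.requires !== getActiveProvider()) {
      return "wrong-provider"; // gated: the active agent can't handle this action
    }
    entry.run(msg);
    return "handled";
  };
}
```

Unlike a set of provider-only actions checked before dispatch, this version also rejects any action it doesn't recognize. That stricter default fits a process that sits directly in front of your filesystem.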
The startup sequence

`index.ts` ties it together:

```ts
async function main(): Promise<void> {
  const providers = await createProviders(config);
  const multi = new MultiProvider(providers);

  connect(); // start WebSocket, non-blocking

  const keepaliveInterval = setInterval(() => {}, 1 << 30);
  keepaliveInterval.unref(); // don't prevent process exit

  process.on("SIGINT", gracefulShutdown);
  process.on("SIGTERM", gracefulShutdown);

  await multi.runSyncAgents(); // provider-specific background sync
}
```

The `keepaliveInterval` deserves a caveat. A plain `setInterval` keeps Node's event loop alive, but calling `unref()` on it removes exactly that effect: Node's timer documentation says an unref'd timer will not keep the process running if nothing else is. So this interval neither holds the process open nor blocks a clean `SIGINT`/`SIGTERM` shutdown. What keeps the Bridge running in practice is the open WebSocket handle and the reconnect timer scheduled after each close. If you want an explicit keepalive, leave the interval ref'd and clear it in `gracefulShutdown`.

What this enables

With the Bridge running, the CTRL NODE web app can:

- Launch tasks against any connected agent from any browser, anywhere
- Watch live output streamed back over the same WebSocket
- Schedule routines: the cloud scheduler triggers the task over the Bridge's connection at the configured time, so no cron job is needed on the local machine
- Run multi-step pipelines where each node can use a different agent

None of your files leave your machine. The cloud only sees lifecycle events: "task started", "task output line", "task completed". File contents, prompts, and agent context stay local. The one exception is the agent's own output, which is streamed to your browser line by line, so anything an agent prints does pass through the cloud.

Why open source?

The Bridge is MIT licensed (https://github.com/ctrlnode-ai/ctrlnode). You can read every line of the WebSocket handler, every message type, every auth check. If you don't trust the published package, build it yourself.

The rest of CTRL NODE (the cloud scheduler, the web app, the real-time pipeline view) runs as a hosted service. The Bridge is the trust boundary: it's the piece that runs with access to your local system, so it needs to be auditable.
Try it

If you work with AI agents and want to control them remotely without sacrificing privacy:

- Install the Bridge: `npm install -g @ctrlnode/bridge && ctrlnode bridge start`
- Sign up at ctrlnode.ai (https://ctrlnode.ai); it's free
- Open the web app from anywhere and connect

Questions, issues, or PRs: https://github.com/ctrlnode-ai/ctrlnode, or reply here.

Javier Vil, creator of CTRL NODE