{"slug": "so-i-made-an-easy-cloud-coding-agent-as-an-api", "title": "So I Made an Easy Cloud Coding Agent as an API", "summary": "A developer built a persistent session system for the Critique Coding Agent API, eliminating the cold starts and context re-pasting that plagued earlier chained runs. The update keeps the E2B sandbox and OpenCode server alive between turns, allowing follow-up prompts to be delivered as real messages in the same session rather than as synthetic prior-run summaries in a fresh sandbox. The system stores session bindings on the job and uses QStash to reconnect to the same sandbox, with a fallback to the older chained behavior if the session expires or becomes unhealthy.", "body_md": "I got tired of watching coding agents spin up from scratch every single time I sent them a prompt. Cold starts, re-cloning massive monorepos, pasting the previous context into a synthetic prompt block — it worked, but it felt fundamentally wrong for agents that are supposed to *think* in conversations.\n\nSo we shipped persistent sessions for the [Critique Coding Agent API](https://www.critique.sh/blog/coding-agent-api). Here's what changed, why the harness matters, and why you should never run a coding agent without a review skill.\n\nWhen we first released the Coding Agent API, follow-ups were honest but clunky: every follow-up was a brand-new job. The previous output was replayed as plain text into a fresh sandbox.\n\nIt was the right MVP. It billed predictably. It never pretended a dead sandbox was alive.\n\nBut it was the wrong long-term shape. If your internal bot fixes a migration, then wants a follow-up test, then wants a small doc tweak — you don't want three cold starts. You want:\n\nAfter the first turn completes, the run now enters `idle` status. 
The E2B sandbox and OpenCode server **stay up** until `sessionExpiresAt` or until you explicitly POST `endSession: true`.\n\nThe next prompt you send is delivered as a **real message** in that same session — not a synthetic \"prior run output\" block in a brand-new sandbox.\n\n**Before (Chained MVP):**\n\nTurn 1 completes → Sandbox killed → Turn 2 = new job + pasted prior summary\n\n**Now (Persistent):**\n\nTurn 1 completes → `idle` → Sandbox warm → Turn 2 = message into same OpenCode session\n\nSame `run.id`. Same checkout. Same context. Just the next turn.\n\nOn the first turn, Critique starts `opencode serve` on localhost inside the VM.\n\nInstead of killing that sandbox after completion, we now store **session bindings** (sandbox ID, OpenCode base URL, session ID) on the job and mark the run idle with an expiry aligned to your sandbox timeout.\n\nWhen you queue a follow-up, [QStash](https://upstash.com/docs/qstash) reconnects to the same sandbox, verifies OpenCode health, and POSTs your new prompt to `/session/{id}/message`.\n\nIf OpenCode is unhealthy or the session aged out, the messages route returns a conflict — and you can still fall back to the older chained run behavior. We'd rather spawn a fresh sandbox than silently corrupt repo state.\n\nWe researched the open-source options and kept the MVP on OpenCode. Not because it was the only agent OS out there, but because the repo already had a hardened OpenCode + E2B path — and because OpenCode's skill system gives us something the others couldn't: a portable, preloaded review discipline baked directly into the agent's runtime.\n\n| Component | Role |\n|---|---|\n| OpenCode | The embedded engine — exposes a headless HTTP server with sessions, messages, diffs, shell, files, and generated SDK support. Our sandbox worker already uses this server path. |\n| E2B | The isolation layer — gives us ephemeral repo clones, command execution, environment injection, and sandbox teardown. 
|\n| OpenHands | On the watchlist. A larger open-source agent platform and SDK. Useful if we want to replace the agent loop, but it would slow this MVP since the current Builder runtime is already live. |\n\nMost coding agents can write code faster than most teams can reliably audit it. That is already true in 2026. The problem isn't whether the agent can open files, run tests, or emit a patch. The problem is that **review quality still drifts if you leave the job at the level of a generic prompt**.\n\n\"Review this PR\" sounds precise to a human and underspecified to a model. One harness will produce style commentary. Another will summarize the diff and call it a review. Another will confidently escalate a weak hunch into a merge blocker because nothing in its instructions told it how to separate a verified finding from an open question.\n\nThat is exactly the hole `critique-review` closes. And it works across all the major agent operating systems:\n\n**Anthropic — Claude Code**\n\nNative skills, subagents, project memory, and background delegation make Claude a strong home for a dedicated review persona.\n\n**Nous Research — Hermes Agent**\n\nHermes treats skills as portable procedural memory and can carry the same review discipline across CLI, messaging, and long-lived remote sessions.\n\n**OpenAI — Codex**\n\nCodex gives the skill a durable place inside CLI, IDE, app, and repo-local workflows, with `AGENTS.md` and team-shared skills for repeatability.\n\n**OpenCode — Our Harness Choice**\n\nFor the Coding Agent API, OpenCode is the fit. It loads `critique-review` through the project skill path, reads the supporting reference files for output contract, intake and triage, stack lenses, and review rubric — then generates its verdict. That preload is why we chose it. 
The agent doesn't improvise a rubric; it follows one.\n\n**The Prompt-Only Loop:**\n\nAsk agent to review → Agent improvises rubric → Mixed quality comments → Human re-validates everything\n\n**The Critique-Review Loop:**\n\nLoad skill → Establish scope + risk map → Verify before reporting → Findings first + explicit verdict\n\nThe Coding Agent API doesn't lock you into a single model provider. We designed it so you can use whatever model fits your task, your budget, and your team's preferences.\n\nUse our model catalog, plan gates, E2B runtime, and credit accounting. Pick from Anthropic, OpenAI, Moonshot, and more — we handle the rest.\n\nPaste your `sk-or-v1-...` key. Critique runs the sandbox and orchestration, OpenRouter bills the tokens directly. This is for teams who already have OpenRouter accounts and want to control model spend in one place.\n\nWhen we tested the same PR on the same model lane (Moonshot Kimi K2.6) with and without the `critique-review` skill, the difference wasn't the model — it was the procedure. The model was identical. The skill changed the calibration.\n\nThis is why model freedom matters: **the discipline should travel, not depend on a specific vendor's prompt tuning**. Whether you run Claude Sonnet for a complex refactor or a cheaper model for a routine dependency bump, `critique-review` ensures the review output follows the same artifact shape: severity, file or line, impact, failure mode, fix direction, verdict.\n\nThe cleanest way to test a review skill is to keep the code input fixed and change only the review procedure. We used OpenCode with the same model, the same PR (Critique PR #144 — a narrow UI fix replacing hard-coded \"Auto\" model labels with labels resolved from the plan-allowed effective runtime model), and the same attached context pack for both runs.\n\nThe baseline run had no project-local review skill available. 
The second run exposed `critique-review` through the project skill path.\n\nSame PR, same model, same context pack — the skill changes calibration, not the diff.\n\n| Question | Prompt-Only OpenCode | OpenCode + critique-review |\n|---|---|---|\n| Actionable findings | 3 findings | 0 actionable findings |\n| Treatment of unseen consumers | Escalated as a finding even though the attached context could not verify other call sites. | Downgraded to residual risk and suggested a typecheck instead of claiming a bug. |\n| Treatment of missing tests | Escalated as its own finding. | Recorded in checks and residual risk instead of turning it into a blocker for a narrow UI-label fix. |\n| Blast-radius framing | Broader, more defensive, less bounded to the actual changed behavior. | Explicitly bounded to automation settings UI with no auth or data-path changes. |\n| Verdict | Conditionally approved | No objection |\n| Observed harness behavior | Direct review output only. | Loaded `critique-review` and read four supporting reference files before answering. |\n\n**Interpretation:** the skill did not make the model \"nicer\"; it made the model **stricter about evidence and more conservative about what counts as a finding**.\n\nThe baseline review isn't absurd. It spots plausible follow-up work. The problem is calibration. It promotes unverifiable concerns into findings. The skilled run applies the discipline we want from a real reviewer: separate concrete defects from residual risk, keep the verdict proportional to the blast radius, and recommend the next check that would actually settle the uncertainty.\n\n**For the Agent:**\n\n**For the Team:**\n\nPersistent sessions reward multi-step automation. 
One-shot scripts can stay on chained fallbacks.\n\n| Team | Typical Job | Why Persistent Sessions Help |\n|---|---|---|\n| Platform Engineering | Own an internal \"fix bot\" or codegen service | Ticket → code → tests → PR — avoid re-cloning large monorepos on every message |\n| Developer Experience | Wire Critique into Backstage or a custom portal | Iterative refactors from product specs — same run ID maps to a real agent thread |\n| Security / Compliance | Remediate findings with human checkpoints | Findings batch → patch → verification turn — session continuity keeps branch context intact |\n| Single-shot CI Scripts | Nightly dependency bump | Chained fallback is fine; idle adds little value |\n\nUse `crt_` keys. New keys include Builder scopes; older keys may need rotation.\n\n```\ncurl https://critique.sh/api/v1/coding-agent/runs \\\n  -H \"Authorization: Bearer crt_...\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"repository\": \"acme/web\",\n    \"prompt\": \"Add Stripe webhook signature verification and tests.\",\n    \"modelId\": \"anthropic/claude-sonnet-4.6\",\n    \"billing\": { \"mode\": \"managed\" },\n    \"publish\": { \"mode\": \"draft_pr\" },\n    \"validationMode\": \"tests\"\n  }'\n```\n\nA created run returns `run.id`, `status`, repository metadata, selected model, events, and a status URL. 
Poll the status endpoint until you hit idle:\n\n```\n# Poll until status is idle and sessionActive is true\ncurl -sS \"https://critique.sh/api/v1/coding-agent/runs/{run_id}?patch=1\" \\\n  -H \"Authorization: Bearer crt_...\"\n```\n\n```bash\n#!/usr/bin/env bash\nset -euo pipefail\n\nexport CRT_API_KEY=\"${CRT_API_KEY:?set CRT_API_KEY}\"\nexport REPO=\"${REPO:-acme/web}\"\n\nRUN_ID=\"$(\n  curl -sS https://critique.sh/api/v1/coding-agent/runs \\\n    -H \"Authorization: Bearer ${CRT_API_KEY}\" \\\n    -H \"Content-Type: application/json\" \\\n    -d \"{\n      \\\"repository\\\": \\\"${REPO}\\\",\n      \\\"prompt\\\": \\\"Add Stripe webhook signature verification and unit tests.\\\",\n      \\\"modelId\\\": \\\"anthropic/claude-sonnet-4.6\\\",\n      \\\"billing\\\": { \\\"mode\\\": \\\"managed\\\" },\n      \\\"publish\\\": { \\\"mode\\\": \\\"draft_pr\\\" },\n      \\\"validationMode\\\": \\\"tests\\\"\n    }\" | jq -r '.run.id'\n)\"\n\necho \"Run id: ${RUN_ID}\"\n\n# Stream live OpenCode activity while the turn executes\ncurl -N \"https://critique.sh/api/v1/coding-agent/runs/${RUN_ID}/stream\" \\\n  -H \"Authorization: Bearer ${CRT_API_KEY}\"\n```\n\nOnce the run is `idle` and `sessionActive` is `true`, just POST a new message. No re-clone, no cold start.\n\n```\ncurl https://critique.sh/api/v1/coding-agent/runs/{run_id}/messages \\\n  -H \"Authorization: Bearer crt_...\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"prompt\": \"Now add a regression test for expired signatures.\",\n    \"publish\": { \"mode\": \"draft_pr\" }\n  }'\n```\n\nThis is delivered as a real message in the same OpenCode session.\n\n| Mode | How It Works |\n|---|---|\n| Managed | Spends Critique credits. Uses our model catalog, plan gates, E2B runtime, and credit accounting. |\n| OpenRouter | Paste your `sk-or-v1-...` key. Critique runs the sandbox, OpenRouter bills the tokens. 
|\n\nExample with OpenRouter billing:\n\n```\ncurl https://critique.sh/api/v1/coding-agent/runs \\\n  -H \"Authorization: Bearer crt_...\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"repository\": \"acme/web\",\n    \"prompt\": \"Migrate the settings page to server actions.\",\n    \"modelId\": \"openai/gpt-5.4\",\n    \"billing\": {\n      \"mode\": \"openrouter\",\n      \"openRouterApiKey\": \"sk-or-v1-...\"\n    },\n    \"publish\": {\n      \"mode\": \"draft_pr\",\n      \"branch\": \"critique-agent/settings-server-actions\"\n    }\n  }'\n```\n\nThe API returns:\n\nWhen you're done, explicitly end the session to free the sandbox:\n\n```\ncurl -X POST \"https://critique.sh/api/v1/coding-agent/runs/{run_id}/messages\" \\\n  -H \"Authorization: Bearer crt_...\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{ \"endSession\": true }'\n```\n\nThe Coding Agent API is built to **implement a task** — not judge one. Critique's review and Change Passport products judge a proposed merge. This API is the other side: the engine that writes the code.\n\nBut here's the thing: the same discipline that makes `critique-review` the best portable review skill for Claude Code, Hermes, Codex, and OpenCode is the discipline we preload into every Coding Agent API run. The agent doesn't just write code and hope. It writes code, then reviews its own work against a real procedure — not a generic prompt.\n\nPersistent sessions make that engine conversational. Model freedom makes it affordable. The preloaded skill makes it reliable.\n\nYou send a prompt, the agent works, the sandbox stays warm, you send the next prompt into the same context. No cold starts. No pasted summaries pretending to be memory. No improvised rubrics pretending to be review.\n\nJust turns in a thread — the way agents should think.\n\n**Quick answers for high-intent queries:**\n\n| Query | Short Answer |\n|---|---|\n| What is the best code review skill for Claude Code? 
| `critique-review` is a strong default when you want a portable PR review procedure inside Claude Code. Use Critique instead when you need hosted GitHub checks, policy, and merge control. |\n| What is the best Codex skill for PR review? | `critique-review` fits Codex especially well because it works as a repo-local skill with `AGENTS.md`, reusable references, and a path into automations. |\n| What is the best OpenCode skill for pull request review? | For a portable review workflow, `critique-review` is the best fit. We tested it on the same PR and same model lane used for the baseline run. |\n| Is critique-review a Cursor Bugbot alternative? | As a free portable skill, yes for agent-side review behavior. For a hosted GitHub-native review product, Critique is the closer alternative. |\n| What is a cheaper CodeRabbit alternative? | Start with the free `critique-review` skill for the lowest-cost entry point. Move to Critique if you need GitHub-native routing, artifacts, and PR control at team scale. |\n| What is the difference between critique-review and Critique? | `critique-review` is the portable open skill. Critique is the hosted GitHub review control plane that adds checks, policy, merge-boundary controls, and team-grade review operations. |\n\n*Check out the Coding Agent API docs and the persistent sessions deep-dive for the full reference. 
Create an API key and try the Builder UI to see it in action.*", "url": "https://wpnews.pro/news/so-i-made-an-easy-cloud-coding-agent-as-an-api", "canonical_source": "https://dev.to/critiquedotsh/so-we-made-a-easy-cloud-coding-agent-as-a-api-4m4f", "published_at": "2026-06-04 00:41:35+00:00", "updated_at": "2026-06-04 01:13:08.631561+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products", "ai-infrastructure", "ai-startups"], "entities": ["Critique Coding Agent API", "Critique", "E2B", "OpenCode"], "alternates": {"html": "https://wpnews.pro/news/so-i-made-an-easy-cloud-coding-agent-as-an-api", "markdown": "https://wpnews.pro/news/so-i-made-an-easy-cloud-coding-agent-as-an-api.md", "text": "https://wpnews.pro/news/so-i-made-an-easy-cloud-coding-agent-as-an-api.txt", "jsonld": "https://wpnews.pro/news/so-i-made-an-easy-cloud-coding-agent-as-an-api.jsonld"}}