{"slug": "how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced", "title": "How to Run Reliable Local LLM Agents on an RTX 3090: A Benchmark (5 Models, Priced in Watts)", "summary": "A developer benchmarked five local LLM agents on an RTX 3090, finding that the orchestrator, not the model, determines success. GLM-4.5-Air scored 0% with opencode but 93% with a LangGraph agent, while Qwen3-Coder 30B-A3B achieved 100% tool adherence. The benchmark also measured electricity cost per correct task.", "body_md": "I gave **GLM-4.5-Air** (106B, open weights) 12 coding tasks through [opencode](https://opencode.ai) on my RTX 3090. It scored **0%** — never edited a single file.\n\nSame model, same GPU, same tasks, but driven by a ~150-line **LangGraph** agent instead: **93%**.\n\nThe model was never the problem. The orchestrator was. Here's the benchmark — including the part nobody else measures, the **electricity cost per correct task**.\n\n| Model | tok/s | opencode adh. | LangGraph adh. | LangGraph coding | LangGraph general |\n|---|---|---|---|---|---|\n| Qwen3-Coder 30B-A3B | 130 | 92% | 100% | 100% | 100% |\n| GLM-4.5-Air 106B | 5.7 | 0% | 100% | 89% | 100% |\n| Devstral Small 24B | 49 | 8% | 53% | 8% | 40% |\n| Seed-OSS 36B | 9.5 | 0% | 7% | 0% | 20% |\n| DeepSeek-R1-Distill 32B | 6.7 | 0% | 0% | 0% | 0% |\n\n**Tool-adherence** = % of tasks where the model actually *called a tool* instead of just printing code in chat. It was the master variable. (GLM's headline \"93%\" is its blended score across all 17 tasks: 89% coding + 100% general.)\n\nBonus: **128 GB RAM let me run the 106B GLM** (23 GB VRAM + 27 GB spilled to RAM) — it works, at 5.7 tok/s. 
Great for fire-and-forget batch jobs, not interactive coding.\n\nPick a tool-use-tuned model (**Qwen3-Coder 30B-A3B** is the all-weather winner) → use **native** tool-calling, not an OpenAI-compat path → keep the harness lean → use RAM for reach, not speed → **measure correctness per kWh**.\n\n📖 **Full write-up with methodology, charts, and the deeper \"why\" →** [https://medium.com/@arsen.apostolov/local-llm-agents-on-an-rtx-3090-i-benchmarked-5-models-2-frameworks-and-the-orchestrator-f5fd600ca221](https://medium.com/@arsen.apostolov/local-llm-agents-on-an-rtx-3090-i-benchmarked-5-models-2-frameworks-and-the-orchestrator-f5fd600ca221)\n\n⭐ Every number was priced in watts by **homelab-monitor**, my open-source tool that turns your GPU's power draw into per-task cost.", "url": "https://wpnews.pro/news/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced", "canonical_source": "https://dev.to/sikamikanikobg/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced-in-watts-15d0", "published_at": "2026-06-28 06:54:12+00:00", "updated_at": "2026-06-28 07:03:38.737856+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools", "ai-infrastructure", "machine-learning"], "entities": ["GLM-4.5-Air", "Qwen3-Coder 30B-A3B", "LangGraph", "opencode", "RTX 3090", "DeepSeek-R1-Distill 32B", "Seed-OSS 36B", "Devstral Small 24B"], "alternates": {"html": "https://wpnews.pro/news/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced", "markdown": "https://wpnews.pro/news/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced.md", "text": "https://wpnews.pro/news/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced.txt", "jsonld": "https://wpnews.pro/news/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced.jsonld"}}