# How to Run Reliable Local LLM Agents on an RTX 3090: A Benchmark (5 Models, Priced in Watts)

> Source: <https://dev.to/sikamikanikobg/how-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced-in-watts-15d0>
> Published: 2026-06-28 06:54:12+00:00

I gave **GLM-4.5-Air** (106B, open weights) 12 coding tasks through [opencode](https://opencode.ai) on my RTX 3090. It scored **0%** — never edited a single file.

Same model, same GPU, same tasks, but driven by a ~150-line **LangGraph** agent instead: **93%**.

The model was never the problem. The orchestrator was. Here's the benchmark — including the part nobody else measures, the **electricity cost per correct task**.

| Model | tok/s | opencode adh. | LangGraph adh. | LangGraph coding | LangGraph general |
|---|---|---|---|---|---|
| Qwen3-Coder 30B-A3B | 130 | 92% | 100% | 100% | 100% |
| GLM-4.5-Air 106B | 5.7 | 0% | 100% | 89% | 100% |
| Devstral Small 24B | 49 | 8% | 53% | 8% | 40% |
| Seed-OSS 36B | 9.5 | 0% | 7% | 0% | 20% |
| DeepSeek-R1-Distill 32B | 6.7 | 0% | 0% | 0% | 0% |

**Tool-adherence** = % of tasks where the model actually *called a tool* instead of just printing code in chat. It was the master variable. (GLM's headline "93%" is its blended score across all 17 tasks: 89% coding + 100% general.)
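To make the metric concrete, here is a minimal sketch of how tool-adherence could be computed from per-task run records. The `TaskRun` record and its field names are hypothetical, not taken from the benchmark harness, and the sample data is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One benchmark task attempt (hypothetical record layout)."""
    task_id: str
    tool_calls: int  # structured tool calls the model emitted during the task
    passed: bool     # whether the task's correctness check succeeded


def tool_adherence(runs):
    """Percent of tasks where the model called at least one tool."""
    if not runs:
        return 0.0
    called = sum(1 for r in runs if r.tool_calls > 0)
    return 100.0 * called / len(runs)


def success_rate(runs):
    """Percent of tasks that passed their correctness check."""
    if not runs:
        return 0.0
    return 100.0 * sum(1 for r in runs if r.passed) / len(runs)


# Invented example: 12 tasks, and the model calls a tool on only one of them.
# The other 11 runs stand for "printed code in chat, touched no files".
example_runs = [TaskRun("task-01", tool_calls=3, passed=True)] + [
    TaskRun(f"task-{i:02d}", tool_calls=0, passed=False) for i in range(2, 13)
]

if __name__ == "__main__":
    print(f"adherence: {tool_adherence(example_runs):.0f}%")
    print(f"success:   {success_rate(example_runs):.0f}%")
```

Keeping adherence separate from the pass rate makes it easier to see whether a low score comes from wrong edits or from the model never reaching for a tool at all.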

Bonus: **128 GB RAM let me run the 106B GLM** (23 GB VRAM + 27 GB spilled to RAM) — it works, at 5.7 tok/s. Great for fire-and-forget batch jobs, not interactive coding.

Pick a tool-use-tuned model (**Qwen3-Coder 30B-A3B** is the all-weather winner) → use **native** tool-calling, not an OpenAI-compat path → keep the harness lean → use RAM for reach, not speed → **measure correctness per kWh**.
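The last step in that chain, correctness per kWh, is simple arithmetic once you have an average power draw and a wall-clock duration. The sketch below is a generic calculation, not homelab-monitor's API; the function names, the 350 W draw, the one-hour run, and the 0.30-per-kWh price are all assumptions chosen for illustration.

```python
import math


def energy_kwh(avg_watts, seconds):
    """Energy used at a given average power draw, in kWh.

    Watts times seconds gives joules; one kWh is 3.6 million joules.
    """
    return avg_watts * seconds / 3_600_000


def correct_per_kwh(correct_tasks, avg_watts, seconds):
    """Correct tasks delivered per kWh of energy consumed."""
    kwh = energy_kwh(avg_watts, seconds)
    return correct_tasks / kwh if kwh > 0 else 0.0


def cost_per_correct_task(correct_tasks, avg_watts, seconds, price_per_kwh):
    """Electricity cost divided by the number of correct tasks.

    Returns infinity when nothing was solved: energy was spent for no result.
    """
    if correct_tasks == 0:
        return math.inf
    return energy_kwh(avg_watts, seconds) * price_per_kwh / correct_tasks


if __name__ == "__main__":
    # Hypothetical run: 350 W average for one hour, 10 correct tasks,
    # electricity priced at 0.30 per kWh (currency is up to you).
    print(energy_kwh(350, 3600))
    print(correct_per_kwh(10, 350, 3600))
    print(cost_per_correct_task(10, 350, 3600, 0.30))
```

Dividing by correct tasks rather than attempted tasks is the point: a fast model that never edits a file still burns power, and its cost per correct task is unbounded.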

📖 **Full write-up with methodology, charts, and the deeper "why" →** <https://medium.com/@arsen.apostolov/local-llm-agents-on-an-rtx-3090-i-benchmarked-5-models-2-frameworks-and-the-orchestrator-f5fd600ca221>

⭐ Every number was priced in watts by **homelab-monitor**, my open-source tool that turns your GPU's power draw into per-task cost.