How to Run Reliable Local LLM Agents on an RTX 3090: A Benchmark (5 Models, Priced in Watts)

wpnews.pro

cd /news/large-language-models/how-to-run-reliable-local-llm-agents… · home › topics › large-language-models › article

[ARTICLE · art-42320] src=dev.to ↗ pub=2026-06-28T06:54Z topic=large-language-models verified=true sentiment=· neutral

How to Run Reliable Local LLM Agents on an RTX 3090: A Benchmark (5 Models, Priced in Watts)

A developer benchmarked five local LLM agents on an RTX 3090, finding that the orchestrator, not the model, determines success. GLM-4.5-Air scored 0% with opencode but 93% with a LangGraph agent, while Qwen3-Coder 30B-A3B achieved 100% tool adherence. The benchmark also measured electricity cost per correct task.

read1 min views1 publishedJun 28, 2026

I gave GLM-4.5-Air (106B, open weights) 12 coding tasks through opencode on my RTX 3090. It scored 0% — never edited a single file. Same model, same GPU, same tasks, but driven by a ~150-line LangGraph agent instead: 93%.

The model was never the problem. The orchestrator was. Here's the benchmark — including the part nobody else measures, the electricity cost per correct task.

|---|---|---|---|---|---|
Qwen3-Coder 30B-A3B |

130 | 92% | 100% | 100% | 100% | GLM-4.5-Air 106B | 5.7 | 0% | 100% | 89% | 100% | | Devstral Small 24B | 49 | 8% | 53% | 8% | 40% | | Seed-OSS 36B | 9.5 | 0% | 7% | 0% | 20% | | DeepSeek-R1-Distill 32B | 6.7 | 0% | 0% | 0% | 0% |

Tool-adherence = % of tasks where the model actually called a tool instead of just printing code in chat. It was the master variable. (GLM's headline "93%" is its blended score across all 17 tasks: 89% coding + 100% general.)

Bonus: 128 GB RAM let me run the 106B GLM (23 GB VRAM + 27 GB spilled to RAM) — it works, at 5.7 tok/s. Great for fire-and-forget batch jobs, not interactive coding.

Pick a tool-use-tuned model (Qwen3-Coder 30B-A3B is the all-weather winner) → use native tool-calling, not an OpenAI-compat path → keep the harness lean → use RAM for reach, not speed → measure correctness per kWh.

📖 Full write-up with methodology, charts, and the deeper "why" → [https://medium.com/@arsen.apostolov/local-llm-agents-on-an-rtx-3090-i-benchmarked-5-models-2-frameworks-and-the-orchestrator-f5fd600ca221]

⭐ Every number was priced in watts by ** homelab-monitor** — my open-source tool that turns your GPU's power draw into per-task cost.

source & further reading

dev.to — original article I Built 3 MCP Servers for AI Agents — Here's How They Work Agent-Ready Commerce, Part 2: From Product Pages to Commercial I Run DeepSeek on Claude Code — How I Swap Models by Changing Only One File

~/api · this article 200

$curl api.wpnews.pro/v1/news/how-to-run-reliable-loca…

Read original on dev.to → dev.to/sikamikanikobg/how-to-run-reliable-local-…

mentioned entities

GLM-4.5-Air

Qwen3-Coder 30B-A3B

LangGraph

opencode

RTX 3090

DeepSeek-R1-Distill 32B

Seed-OSS 36B

Devstral Small 24B

metadata

slughow-to-run-reliable-local-llm-agents-on-an-rtx-3090-a-benchmark-5-models-priced

topic#large-language-models

secondary4 topics

sentimentneutral

canonicaldev.to

navigation

← prevI built a free planting calendar…

── more in #large-language-models 4 stories · sorted by recency

dev.to · 28 Jun · #large-language-models

🔌 I Tried 100 MCP Servers. These Are The Only 12 Worth Installing.

dev.to · 28 Jun · #large-language-models

I Built 3 MCP Servers for AI Agents — Here's How They Work

dev.to · 28 Jun · #large-language-models

Pinecone vs Weaviate vs Milvus vs Qdrant: Which Vector DB in 2026?

github.com · 28 Jun · #large-language-models

Cerberus – a local firewall for AI agents' tool calls

── more on @glm-4.5-air 3 stories trending now

wpnews · 25 May · #artificial-intelligence

Maia-3: free and open source

wpnews · 28 May · #ai-startups

[AINews] Cognition raises $1B in $26B Series D

wpnews · 5 Jun · #ai-agents

Miasma Worm Targets AI Coding Agents via GitHub Repos

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required