I Built a Local LLM Rig to Escape API Bills. Then I Paid OpenAI Again.

wpnews.pro

cd /news/large-language-models/i-built-a-local-llm-rig-to-escape-ap… · home › topics › large-language-models › article

[ARTICLE · art-25856] src=dev.to ↗ pub=2026-06-13T02:19Z topic=large-language-models verified=true sentiment=· neutral

I Built a Local LLM Rig to Escape API Bills. Then I Paid OpenAI Again.

A developer running 2asy.ai's filing pipeline built a local LLM rig to escape API costs, but found that OpenAI's batch API outperformed it for large-scale single-document extractions. The local rig remains for live serving and multimodal tasks, while the batch lane moves to OpenAI, achieving 50% cost reduction and zero rate limits.

read1 min views22 publishedJun 13, 2026

I run a one-person AI shop. For 2asy.ai's filing pipeline that needs thousands of single-document extractions per cycle, the local rig lost the batch lane and OpenAI Batch won. Per-pipeline, not per-company.

The rule that decided it: no cross-document attention. Each filing gets its own prompt window. No string concatenation. The rule came from a Neo4j rollback I already paid for.

Quick results.

GGML_CUDA_DISABLE_GRAPHS=1

keeps llama.cpp alive when graph optimizer segfaults.googleapis/python-genai

issue 1984 is not-planned.gpt-5.4-mini ): JSONL line-isolated, 50 percent off, 100-doc nano gate in 2.7 min, zero 429s, around 1 cent per document.The local rig stays for live serving, ER API LLM gate, multimodal, and ablations. The batch lane moves to OpenAI.

Full retrospective with the side-by-side table: https://hannune.ai/blog/local-llm-to-openai-batch.html

source & further reading

dev.to — original article AgentENV: Distributed Runtime for AI Agents at Scale (Open Source, Rust) I Made REGENT: An MCP Server for Configuring OpenWrt Routers Through an AI Physics-Augmented Diffusion Modeling for satellite anomaly response operations with embodied agent feedback loops

~/api · this article 200

$curl api.wpnews.pro/v1/news/i-built-a-local-llm-rig-…

Read original on dev.to → dev.to/hannune/i-built-a-local-llm-rig-to-escape…

mentioned entities

2asy.ai

OpenAI

llama.cpp

Neo4j

GGML_CUDA_DISABLE_GRAPHS

googleapis/python-genai

gpt-5.4-mini

metadata

slugi-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again

topic#large-language-models

secondary3 topics

sentimentneutral

canonicaldev.to

navigation

← prevOpen Source AI Must Win

next →Jailbreak that potentially trigg…

── more in #large-language-models 4 stories · sorted by recency

cephalosec.com · 28 Jul · #large-language-models

Cybersecurity harnesses everywhere

github.com · 28 Jul · #large-language-models

Running Kimi K3 on a M1 Mac

simonwillison.net · 28 Jul · #large-language-models

Anatomy of a Frontier Lab Agent Intrusion: A Timeline of the July 2026 Incident

simonwillison.net · 28 Jul · #large-language-models

Quoting Akshat Bubna

── more on @2asy.ai 3 stories trending now

wpnews · 26 Jul · #artificial-intelligence

Nobel laureate Simon Johnson on the AI race and China’s ‘over-automation’ problem

wpnews · 26 Jul · #artificial-intelligence

China’s Moonshot, Z.AI, and DeepSeek are challenging U.S. AI labs—and beating them on cost

wpnews · 26 Jul · #ai-safety

University of Washington study reveals prompt injection risks lurking in AI agent memory

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required