{"slug": "i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again", "title": "I Built a Local LLM Rig to Escape API Bills. Then I Paid OpenAI Again.", "summary": "A developer running 2asy.ai's filing pipeline built a local LLM rig to escape API costs, but found that OpenAI's batch API outperformed it for large-scale single-document extractions. The local rig remains for live serving and multimodal tasks, while the batch lane moves to OpenAI, achieving 50% cost reduction and zero rate limits.", "body_md": "I run a one-person AI shop. For 2asy.ai's filing pipeline that needs thousands of single-document extractions per cycle, the local rig lost the batch lane and OpenAI Batch won. Per-pipeline, not per-company.\n\nThe rule that decided it: no cross-document attention. Each filing gets its own prompt window. No string concatenation. The rule came from a Neo4j rollback I already paid for.\n\nQuick results.\n\n- `GGML_CUDA_DISABLE_GRAPHS=1` keeps llama.cpp alive when graph optimizer segfaults.\n- `googleapis/python-genai` issue 1984 is not-planned.\n- `gpt-5.4-mini`: JSONL line-isolated, 50 percent off, 100-doc nano gate in 2.7 min, zero 429s, around 1 cent per document.\n\nThe local rig stays for live serving, ER API LLM gate, multimodal, and ablations. 
The batch lane moves to OpenAI.\n\nFull retrospective with the side-by-side table: [https://hannune.ai/blog/local-llm-to-openai-batch.html](https://hannune.ai/blog/local-llm-to-openai-batch.html)", "url": "https://wpnews.pro/news/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again", "canonical_source": "https://dev.to/hannune/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again-4bi7", "published_at": "2026-06-13 02:19:42+00:00", "updated_at": "2026-06-13 02:47:37.103147+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "developer-tools", "ai-products"], "entities": ["2asy.ai", "OpenAI", "llama.cpp", "Neo4j", "GGML_CUDA_DISABLE_GRAPHS", "googleapis/python-genai", "gpt-5.4-mini"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again", "markdown": "https://wpnews.pro/news/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again.md", "text": "https://wpnews.pro/news/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again.txt", "jsonld": "https://wpnews.pro/news/i-built-a-local-llm-rig-to-escape-api-bills-then-i-paid-openai-again.jsonld"}}