Stop scattering LLM SDK/API calls across your codebase. Here is the 2-file rule that fixed mine

I upgraded an LLM SDK and expected a routine version bump. Instead I had to touch 15+ files, fix breaking changes across four providers, and spend the rest of the day hoping I had not missed one. That was the second time it happened. I knew there would be a third.

If you have ever shipped a production LLM system, you probably recognize the smell:

- An SDK minor version renames `maxTokens` to `maxOutputTokens`, and now 15 files break at runtime, not compile time.
- Switching one classification task from Claude to a cheaper model means editing import paths and type signatures in business logic.
- You have written `classifyEmail`, `scoreLead`, `triageTicket`, and `categorizeRequest`, and they are all the same function with a different prompt string.

This is not an SDK problem. It is an architecture problem. Here is how I fixed it, and the open-source library that came out of it.

## The 2-file rule

I made one rule: only two files in the entire codebase are allowed to import the LLM SDK. One adapter translates my interface into SDK calls, and one provider registry creates clients from config. Everything else talks to a typed interface and has no idea which provider, model, or SDK is in play.

This is just hexagonal architecture (ports and adapters, per Alistair Cockburn) applied to LLMs. You already do this for databases and message queues. Nobody scatters raw SQL across business logic. LLM providers belong in the same category: they are infrastructure, not application logic.

The dependency flow goes from this:

```
Application code
  ├─ direct SDK call
  ├─ direct SDK call
  └─ model router leaking SDK types
```

To this:

```
Application code
  ↓
Capabilities (llmClassify, llmDraft, llmScore, ...)
  ↓
LLM Port (TypeScript interface, zero SDK imports)
  ↓
Adapters + Provider Registry (the only 2 files that touch the SDK)
  ↓
OpenAI / Anthropic / Gemini / Ollama / Vercel AI SDK
```

The caller says *what* it wants (`taskType: "triage"`). The infrastructure decides *how*. No model name parameter. No provider parameter.

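To make the split concrete, here is a minimal TypeScript sketch of a port, one adapter, and one piece of application code. Everything in it is hypothetical: `LlmPort`, `FakeSdkClient`, `createFakeSdkAdapter`, and `triageEmail` are illustrative names, not the llm-ports API. `FakeSdkClient` stands in for a vendor SDK whose new version renamed `maxTokens` to `maxOutputTokens`.

```typescript
// Hypothetical sketch of the port/adapter split. None of these names are
// the llm-ports API; FakeSdkClient stands in for a real vendor SDK.

// --- The port: the only LLM types application code ever sees. ---
interface GenerateTextRequest {
  taskType: string; // *what* the caller wants, e.g. "triage"
  prompt: string;
  maxTokens?: number; // the port's own name, stable across SDK versions
}

interface GenerateTextResult {
  text: string;
}

interface LlmPort {
  generateText(req: GenerateTextRequest): Promise<GenerateTextResult>;
}

// --- Stand-in for a vendor SDK whose new version renamed maxTokens. ---
interface FakeSdkClient {
  complete(args: {
    model: string;
    input: string;
    maxOutputTokens: number;
  }): Promise<{ outputText: string }>;
}

// --- The adapter: one of the two files allowed to touch the SDK. ---
// The maxTokens -> maxOutputTokens rename is absorbed on a single line.
function createFakeSdkAdapter(
  client: FakeSdkClient,
  modelFor: (taskType: string) => string, // routing policy injected (in practice, from config)
): LlmPort {
  return {
    async generateText(req) {
      const res = await client.complete({
        model: modelFor(req.taskType),
        input: req.prompt,
        maxOutputTokens: req.maxTokens ?? 1024,
      });
      return { text: res.outputText };
    },
  };
}

// --- Application code: depends on the port only. ---
// No model name, no provider, no SDK import.
async function triageEmail(llm: LlmPort, email: string): Promise<string> {
  const { text } = await llm.generateText({
    taskType: "triage",
    prompt: `Label this email "urgent" or "routine":\n${email}`,
    maxTokens: 16,
  });
  return text.trim();
}
```

Because `triageEmail` only sees `LlmPort`, the next SDK rename is a one-line change in the adapter. A test can also pass a fake client without any network access.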
Policy is deferred to config.

## The proof: an SDK upgrade that did not hurt

The real test came during a major SDK version jump with breaking changes (`maxTokens` to `maxOutputTokens`, `CoreMessage` to `ModelMessage`, and more). Here is what the migration commit looked like:

- 2 files changed (the adapter and the agent runtime), plus 1 minor fix.
- All 18 activity files unchanged.
- All 10 agent files unchanged.
- The final migration deleted more code than it added: 192 insertions, 688 deletions.

28 out of 31 files did not change, because they do not know the SDK exists. If a core dependency upgrade touches your business logic, your boundaries are wrong.

## The part that surprised me: the same 7 operations, everywhere

I started this to isolate the SDK. Then I noticed the bigger problem. I was not calling LLMs in 21 different places. I was reimplementing the same seven cognitive operations with slight variations:

| Capability | What you give it | What you get back |
|---|---|---|
| Classify | content + rubric | one label from an enum + reasoning |
| Score | content + rubric + axes | numeric ratings per axis |
| Draft | persona + situation | longer text in a chosen tone |
| Summarize | long content + length target | shorter content, key points kept |
| Extract | unstructured text + schema | a typed structured object |
| Plan | goal + constraints | an ordered list of steps |
| Analyze | evidence + question | recommendation with caveats |

Five activities classified content with five different prompt structures. Nine drafted messages with nine different tone injections. Same operation, no shared implementation. When I improved one classification prompt, I had to remember to update four other places. I usually forgot.

You are not writing 47 prompts. You are writing 7 prompts, 47 times, with slightly different ingredients.

So I extracted them into capability factories.

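Stripped of any library, a capability factory is a closure. The invariant parts (labels, rubric, prompt structure, validation) are fixed once, and the returned function takes only the content. The sketch below is a hypothetical, dependency-free illustration of that idea, not the llm-ports implementation. `createClassifierSketch`, its config fields, and the prompt wording are all assumptions. It also shows one simple form of validation recovery: if the reply is not valid JSON or names an unknown label, the function asks again with a correction appended.

```typescript
// Hypothetical, library-free sketch of a "classify" capability factory.
// It is NOT the llm-ports implementation; names and prompt text are illustrative.

// Minimal port, restated so this block runs on its own.
interface LlmPort {
  generateText(req: { taskType: string; prompt: string }): Promise<{ text: string }>;
}

interface ClassifierConfig<L extends string> {
  port: LlmPort;
  taskType: string; // routing hint only; no model or provider here
  labels: readonly L[];
  rubric: Record<L, string>; // one line of guidance per label
  maxRetries?: number; // correction attempts after an invalid reply (default 1)
}

interface Classification<L extends string> {
  label: L;
  reasoning: string;
}

function createClassifierSketch<L extends string>(cfg: ClassifierConfig<L>) {
  // Invariant part, built once and shared by every call site.
  const rubricText = cfg.labels.map((l) => `- ${l}: ${cfg.rubric[l]}`).join("\n");
  const basePrompt =
    `Classify the content with exactly one label.\n${rubricText}\n` +
    `Reply with JSON only: {"label": "...", "reasoning": "..."}`;

  // Returns null when the reply is not valid JSON or uses an unknown label.
  const parse = (text: string): Classification<L> | null => {
    try {
      const obj = JSON.parse(text);
      if (cfg.labels.includes(obj?.label) && typeof obj?.reasoning === "string") {
        return { label: obj.label, reasoning: obj.reasoning };
      }
    } catch {
      // invalid JSON: fall through to the retry path
    }
    return null;
  };

  // Varying part: each call supplies only the content.
  return async ({ content }: { content: string }): Promise<Classification<L>> => {
    let prompt = `${basePrompt}\n\nContent:\n${content}`;
    const attempts = 1 + (cfg.maxRetries ?? 1);
    for (let i = 0; i < attempts; i++) {
      const { text } = await cfg.port.generateText({ taskType: cfg.taskType, prompt });
      const result = parse(text);
      if (result) return result;
      // Validation recovery: re-ask with a correction appended to the prompt.
      prompt += `\n\nYour previous reply was invalid:\n${text}\nUse exactly one of: ${cfg.labels.join(", ")}.`;
    }
    throw new Error(`classifier: no valid reply after ${attempts} attempts`);
  };
}
```

With this shape, `classifyEmail`, `triageTicket`, and the rest each become one factory call with their own labels and rubric. A fix to the shared prompt or the retry logic then reaches every classifier at once.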
A factory takes the invariant parts (schema, rubric, model routing, observability hooks) and returns a function that takes only the varying part (the content):

```js
import { createClassifier } from "@llm-ports/capabilities";
import { z } from "zod";
import { llm } from "./llm"; // wherever you export your port (path is illustrative)

const IntentSchema = z.object({
  intent: z.enum(["question", "request", "complaint", "feedback", "other"]),
  urgency: z.enum(["low", "normal", "high"]),
  reasoning: z.string(),
});

export const classifyIntent = createClassifier({
  port: llm, // your provider-agnostic port
  schema: IntentSchema,
  schemaName: "user-intent",
  rubric: `
    question: asking for information
    request: wants something done
    complaint: reports a problem
    feedback: opinion only
    other: anything else
  `,
});
```

Then every call site, across all your files, has the same shape:

```js
const result = await classifyIntent({ content: userMessage });
// { intent: "request", urgency: "high", reasoning: "..." }, fully typed
```

Improve the rubric once, and every call site that uses it gets better. Improve the factory's shared prompt, and every classifier in the system does. Prompt engineering stops being scattered strings and becomes a reusable system asset.

## llm-ports

I pulled this pattern out of my production system and shipped it as an open-source, MIT-licensed TypeScript library: llm-ports.

## 60-second setup

Configure providers in `.env`:

```env
LLM_PROVIDER_FAST=anthropic|<model>|cost:50/day
LLM_PROVIDER_SMART=anthropic|<model>|cost:200/day
LLM_TASK_ROUTE_TRIAGE=fast,smart
```

Create the port once:

```js
import { createRegistryFromEnv } from "@llm-ports/core";
import { createAnthropicAdapter } from "@llm-ports/adapter-anthropic";

export const llm = createRegistryFromEnv({
  adapters: {
    anthropic: createAnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY }),
  },
}).getPort();
```

Use it anywhere, with no SDK imports:

```js
const result = await llm.generateText({
  taskType: "triage",
  prompt: "Classify this email...",
});
```

The registry selects the right model for the task, enforces cost limits, falls back through the provider chain on budget exhaustion, and records usage, cost, and latency.

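The routing and fallback behaviour in the previous paragraph can be sketched in a few dozen lines. This is a hypothetical illustration, not the llm-ports registry. `createRegistrySketch`, `BudgetExhaustedError`, the config shape, and the per-call `costUsd` an adapter reports are all assumptions. It roughly mirrors the `.env` example above: a daily USD budget per provider and a `fast,smart` route for triage.

```typescript
// Hypothetical sketch of task routing with USD budgets and a fallback chain.
// Not the llm-ports registry: names, types, and the daily-budget model are illustrative.

interface RoutedRequest {
  taskType: string;
  prompt: string;
}

interface AdapterResult {
  text: string;
  costUsd: number; // assumed to be reported by the adapter for each call
}

type Adapter = (model: string, prompt: string) => Promise<AdapterResult>;

interface ProviderConfig {
  adapter: Adapter;
  model: string;
  dailyBudgetUsd: number;
}

// A typed error, so callers can handle budget exhaustion explicitly.
class BudgetExhaustedError extends Error {
  constructor(readonly taskType: string) {
    super(`every provider for task "${taskType}" is over budget`);
    this.name = "BudgetExhaustedError";
  }
}

function createRegistrySketch(
  providers: Record<string, ProviderConfig>,
  routes: Record<string, string[]>, // taskType -> provider names, in fallback order
) {
  // Spend per provider. A real registry would reset this per window and persist it.
  const spentUsd = new Map<string, number>();

  return {
    async generateText(req: RoutedRequest): Promise<AdapterResult & { provider: string }> {
      const chain = routes[req.taskType];
      if (!chain) throw new Error(`no route for task "${req.taskType}"`);
      for (const name of chain) {
        const provider = providers[name];
        // Budget is checked before the call, so the last call may overshoot it.
        if (!provider || (spentUsd.get(name) ?? 0) >= provider.dailyBudgetUsd) continue;
        const result = await provider.adapter(provider.model, req.prompt);
        // Record usage (re-read so concurrent calls are not lost).
        spentUsd.set(name, (spentUsd.get(name) ?? 0) + result.costUsd);
        return { ...result, provider: name };
      }
      throw new BudgetExhaustedError(req.taskType);
    },
  };
}
```

Checking the budget before each call keeps the sketch short, but it means one call can push a provider past its limit. A stricter gate would estimate cost up front. Time windows (hourly, daily, monthly) and persisted spend are also left out here.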
## What you get

- Multi-provider routing across OpenAI, Anthropic, Google Gemini, Ollama, and the Vercel AI SDK.
- Fallback chains when a provider exceeds budget.
- USD-based cost gating with hourly, daily, and monthly limits. Budget exhaustion is a typed exception, not a surprise invoice.
- The 7 capability factories: `createClassifier`, `createScorer`, `createDrafter`, `createSummarizer`, `createExtractor`, `createPlanner`, `createAnalyzer`.
- Validation recovery for structured output. If a model returns invalid JSON or a wrong enum value, the capability automatically retries with a correction prompt. Bad output stops at the capability boundary instead of leaking downstream.
- Tool-use safety primitives: destructive markers, confirmation-required actions, max output byte limits.
- Observability hooks for cost, latency, quality, and outcomes.
- No runtime dependency on LangChain or LlamaIndex. Core plus one adapter plus capabilities keeps the install footprint small, with strict TypeScript throughout.

## How it compares

- Vercel AI SDK unifies provider calls. llm-ports adds the registry, fallback chains, USD cost gating, validation recovery, and capability factories on top. There is an adapter for migrating from it incrementally.
- LiteLLM is a Python-first HTTP proxy. llm-ports is TypeScript and runs in-process, with no extra network hop.
- Portkey is a commercial hosted gateway. llm-ports is MIT-licensed and has no hosted dependency.
- LangChain.js is a framework. llm-ports is a lightweight architecture and control layer, not a framework you build your whole app inside.

## When to use it and when not to

Use it if you run 2+ providers (or might switch later), have 5+ call sites, keep getting bitten by SDK upgrades, or need cost control and centralized quality tracking.

Skip it if you have 1 or 2 LLM calls, you are just prototyping, or you want a full agent framework with built-in memory and a RAG layer.

## Honest status

llm-ports is pre-release, currently at `0.1.0-alpha.5`.
The core architecture is stable, with 250+ offline regression tests, but some adapter and agent paths are still being hardened (multi-turn agent support in the Vercel adapter and retry-on-runtime-error both land in v0.2). The per-surface status is documented openly, so you know what is solid before you adopt it.

## Try it

```bash
npm install @llm-ports/core @llm-ports/adapter-anthropic @llm-ports/capabilities
```

- npm: https://www.npmjs.com/package/@llm-ports/core
- GitHub (7 runnable examples, including email triage and PDF extraction): https://github.com/baabakk/llm-ports
- Docs: https://baabakk.github.io/llm-ports/

If the capability-factory pattern matches how you are building, I would genuinely like feedback in GitHub Discussions. What shapes are you reimplementing that are not on the list of seven? What knobs do the capabilities need that they do not have yet?

The LLM stops being a dependency you manage. It becomes infrastructure you configure. Once you make that shift, everything else gets simpler.

Based on two longer write-ups: "Ports and Adapters for AI" and "The 7 LLM Capabilities Every Production AI System Reimplements."