{"slug": "one-core-two-interfaces-no-rewrites", "title": "One Core, Two Interfaces, No Rewrites", "summary": "Developer builds Ask the Canon with a functional core shared by CLI and web interfaces, using caching and pre-warming to avoid latency on first request. The architecture separates ranking logic from presentation, allowing both interfaces to reuse the same engine without rewrites.", "body_md": "# One Core, Two Interfaces, No Rewrites\n\n*Building agentic AI? I co-run a 6-week cohort where you ship a production-ready agent, not another API wrapper.*\n\nWhen building applications, I always build the core first, then the interfaces. It was no different with [Ask the Canon](https://askthecanon.com): a `uv run main.py ask \"...\"` CLI for quick iteration and validation, then the web app for MVP. Search, ranking, citations, all using the same engine.\n\nAsk the Canon's core is a handful of pure functions in one module. Both interfaces are thin wrappers. This is the second post in a series on how it's built. [The first one](/blog/semantic-search-without-a-vector-database/) was about the retrieval engine. This one is about the wider architecture.\n\n## Functional core, two interfaces\n\nI just have one module with pure functions, clear contracts, and no hidden state:\n\n``` python\ndef embed(texts: list[str]) -> np.ndarray: ...\ndef load_library(book_ids=None) -> tuple[list[Passage], np.ndarray]: ...\ndef search_passages(query, passages, vectors, ...) 
-> list[tuple[int, float]]: ...\ndef reflow(text: str) -> str: ...\n```\n\n`load_library` reads the cached `.npy` files off disk and hands back a list of `Passage` tuples plus the stacked matrix.\n\n`search_passages` takes those two and a query and returns ranked `(index, score)` pairs.\n\nThe web layer consumes the core functions, no re-implementation:\n\n``` python\nfrom main import (\n    embed,\n    Passage,\n    humanize_author,\n    load_library,\n    reflow,\n    search_passages,\n)\n```\n\nThe CLI's `ask()` and the web app's `/api/ask` share the same spine: load the library, call `search_passages`, walk the ranked `(index, score)` pairs. From there each does its own thing. The CLI prints `rich` panels and offers an interactive deep-read; the web app serializes to `Match` JSON and logs a bit of analytics on the way out.\n\nThe *ranking* decision, what comes back and in what order, is shared. Everything downstream is presentation, which is exactly where a CLI and a web app *should* differ.\n\nWe do the same in our [agentic AI program](https://pythonagenticai.com): one core engine, three interfaces (CLI, Telegram, API / web dashboard).\n\n## I needed caching\n\n`load_library` is not cheap. It walks `books/`, reads a JSON file and an `.npy` file per book, and stacks 80k vectors into one matrix with `np.vstack`. You don't want to pay that overhead on every HTTP request!\n\nIn the CLI that's a non-issue: the process loads once and exits. 
On the web side, it's one decorator away:\n\n``` python\nfrom functools import cache\n\n@cache\ndef library() -> tuple[list[Passage], np.ndarray]:\n    return load_library()\n```\n\n`@cache` turns the first call into the real load and every call after into a dictionary lookup, much faster.\n\n``` python\n@app.get(\"/api/ask\")\ndef ask(q: str, k: int = 5, per_book: int = 2, floor: float = 0.6) -> list[Match]:\n    passages, vectors = library()  # cached\n    ...\n```\n\n## Pre-warm on startup, not on the first visitor\n\nThere's a subtlety `@cache` doesn't solve on its own. If the *first* request is what triggers `library()` and the first `embed()` call (\"wakes up PyTorch\"), then the first real visitor pays that tax. App restarts are rare, but making the first visitor wait still isn't acceptable.\n\nFastAPI's `lifespan` offers a nice fix for this: do it as soon as the app starts, before the first request:\n\n``` python\nfrom contextlib import asynccontextmanager\n\nfrom fastapi import FastAPI\n\n@asynccontextmanager\nasync def lifespan(app: FastAPI):\n    init_db()\n    logger.info(\"Pre-warming vector library and loading models into RAM...\")\n    _ = library()          # fills the @cache with the stacked matrix\n    _ = embed([\"warmup\"])   # forces PyTorch to wake up and allocate\n    logger.info(\"Ready for traffic.\")\n    yield\n\napp = FastAPI(title=\"classics\", lifespan=lifespan)\n```\n\n- I left a log line to watch the startup time. I also added some comments for possible collaborators and my future self.\n- I use `_` as a throwaway variable to make it clear the return value is ignored.\n- You can put shutdown logic after `yield`, similar to how pytest fixtures work. Clean.\n\nBy the time the first request lands, both are warm.\n\n## Lazy loading\n\nI am a proponent of imports at the top, but lazy loading is a serious performance consideration. 
It's coming in 3.15:\n\n> Lazy imports defer the loading and execution of a module until the first time the imported name is used, in contrast to ‘normal’ imports, which eagerly load and execute a module at the point of the import statement.\n>\n> [PEP 810 – Explicit lazy imports]\n\nThat's the automatic version, landing in 3.15. Here I do it by hand: defer the model import into the function that needs it:\n\n``` python\n@cache\ndef _model():\n    import sentence_transformers as st  # lazy, so the offline env vars take effect first\n    return st.SentenceTransformer(EMBED_MODEL)\n```\n\nSo the model loads once, and only if something actually calls `_model()`. `@cache` hands back the same instance every time after.\n\nThe \"offline env vars\" part refers to the second reason I need the import here. At the top of the module I have:\n\n``` python\nimport os\n\nos.environ.setdefault(\"HF_HUB_OFFLINE\", \"1\")\nos.environ.setdefault(\"TRANSFORMERS_OFFLINE\", \"1\")\nos.environ.setdefault(\"TQDM_DISABLE\", \"1\")\n```\n\nHugging Face reads `HF_HUB_OFFLINE` *at import time*. Import `sentence-transformers` before those are set and it will try to reach out to the internet, which is not what I want because I have the data and model cached locally. Set them first and the model stays fully offline, no surprise network calls.\n\n## Functions vs classes\n\nNone of this needs a class. The core is functions over plain data (`Passage` and `Chunk` are `NamedTuple`s), the only state is a memoized function, and the two interfaces are thin adapters that share common behavior.\n\nThat's the payoff. When I want a third interface tomorrow (e.g. a scheduled job or a different API), it imports the same functions and gets the same behavior for free.\n\nClaude scaffolded a first version fast, which saved time. 
But the offline-import ordering, the pre-warming, the lazy loading, the thin adapters, and the split between core and interface: all that took multiple iterations and engineering judgment. The kind you only catch if you already know to look, and that a per-session agent easily writes past.\n\nAs [I wrote here](/blog/ai-accelerator-needs-direction/), AI is an accelerator, not a compass. And as [I argued here](/blog/ai-doesnt-change-what-software-engineering-is/), it's this engineering judgment that AI doesn't change. Going from prototype to production is still a complex, human job.\n\nNext up in part 3: the small post-processing tricks that make the results actually good, no bigger model required.\n\nMost AI tutorials end at \"call the API.\" This cohort ends with a deployed agent: function calling, structured outputs, three interfaces, Docker, 95%+ test coverage. Six weeks of real engineering, not notebooks. [Join the next Agentic AI cohort →](https://pythonagenticai.com)", "url": "https://wpnews.pro/news/one-core-two-interfaces-no-rewrites", "canonical_source": "https://belderbos.dev/blog/two-interfaces-one-core/", "published_at": "2026-07-04 00:00:00+00:00", "updated_at": "2026-07-04 17:26:06.352334+00:00", "lang": "en", "topics": ["developer-tools", "ai-tools", "artificial-intelligence"], "entities": ["Ask the Canon", "FastAPI", "PyTorch"], "alternates": {"html": "https://wpnews.pro/news/one-core-two-interfaces-no-rewrites", "markdown": "https://wpnews.pro/news/one-core-two-interfaces-no-rewrites.md", "text": "https://wpnews.pro/news/one-core-two-interfaces-no-rewrites.txt", "jsonld": "https://wpnews.pro/news/one-core-two-interfaces-no-rewrites.jsonld"}}