{"slug": "treat-the-context-window-as-a-data-assembly-problem", "title": "Treat the Context Window as a Data Assembly Problem", "summary": "A developer argues that assembling context for large language models is fundamentally a data assembly problem, not a prompt engineering one. The author introduces pydantic-resolve as a tool to structure context assembly declaratively, similar to how API response assembly is handled in FastAPI, to avoid procedural code that mixes database queries, vector retrieval, and LLM calls.", "body_md": "# Treat the Context Window as a Data Assembly Problem: Where pydantic-resolve Fits in AI Workflows\n\n## A typical piece of AI code\n\nOpen any service in your project that calls an LLM. You will most likely see a function that looks something like this:\n\n``` python\nasync def build_support_context(ticket_id: int) -> str:\n    ticket = await db.get(Ticket, ticket_id)\n    customer = await db.get(Customer, ticket.customer_id)\n\n    recent_tickets = await db.query(Ticket).filter(\n        Ticket.customer_id == customer.id\n    ).order_by(Ticket.created_at.desc()).limit(5).all()\n\n    # Retrieve similar past tickets\n    embedding = await embed(ticket.description)\n    similar = await vector_store.search(embedding, top_k=3)\n\n    # Each similar ticket needs its resolution pulled in\n    similar_with_resolution = []\n    for s in similar:\n        resolution = await db.query(Resolution).filter(\n            Resolution.ticket_id == s.id\n        ).first()\n        similar_with_resolution.append({\n            \"title\": s.title,\n            \"resolution\": resolution.text if resolution else \"\",\n        })\n\n    # Collect tags\n    all_tags = []\n    for t in recent_tickets:\n        all_tags.extend(t.tags)\n\n    # Finally, ask the LLM to summarize\n    summary = await llm.summarize(\n        customer=customer,\n        recent_tickets=recent_tickets,\n        similar=similar_with_resolution,\n    )\n\n    return f\"\"\"\nCustomer: 
{customer.name} (id={customer.id})\nRecent tickets: {len(recent_tickets)}\nTags: {', '.join(set(all_tags))}\nSimilar past cases:\n{format_similar(similar_with_resolution)}\nSummary: {summary}\n\"\"\"\n```\n\nThe function is not long, but the problem is already visible: **this is a `build_context()` function that is fundamentally doing data assembly, but its shape is entirely procedural**.\n\nIt is isomorphic to the FastAPI code that [Clean Architecture for Python](../architecture_entity_first/) criticizes; only \"assembling an API response\" has been swapped for \"assembling a prompt context\". The problems are unchanged:\n\n- Data-fetching logic is scattered through the function body with no structure.\n- Dependencies of derived fields (`all_tags`, `summary`) are held together by comments and line ordering.\n- Vector retrieval, database queries, and LLM calls live in one function. Every new piece of context means editing this function.\n- Concurrency optimization (fetching similar tickets in parallel) requires a rewrite.\n- Reuse (say, exposing `recent_tickets` to the frontend too) is impossible without copying the query out of the function.\n\nThis code is not badly written. **It has no home**.\n\n## \"The context window\" is a data assembly problem\n\nWhen people discuss LLM applications, attention usually lands first on prompt templates, model choice, and temperature. Those matter, but as applications grow, **the real bottleneck shifts from prompt engineering to context assembly**.\n\nThe reason: prompt templates are stable, model choice is stable, **but \"what data to feed the LLM\" differs on every call**. A support agent handling ticket A and ticket B can share the same prompt template, yet the underlying data-assembly path may diverge completely: A is a VIP customer requiring SLA context and similar-case retrieval; B is a regular customer needing only the basics.\n\nThis \"same template, different data-assembly path\" requirement **is exactly what API response assembly does**. 
Your FastAPI project already solves it: different endpoints assemble different response trees. An LLM context is just another endpoint, only the consumer is an LLM rather than an HTTP client.\n\nOnce that perspective lands, the problem becomes concrete. The things pydantic-resolve solves well on the API side hold equally well on the LLM side:\n\n| API response assembly | LLM context assembly |\n|---|---|\n| Multi-level nesting (Sprint → Task → Owner) | Multi-level nesting (Customer → Ticket → Similar Ticket) |\n| Batch-load related data | Batch-recall related context |\n| Derived fields (`task_count`, `contributors`) | Derived context (`summary`, `aggregated_tags`) |\n| N+1 database queries | N+1 vector retrievals + N+1 LLM calls |\n| Cross-subtree aggregation (deduplicate all owners) | Cross-subtree aggregation (merge evidence across similar tickets) |\n\n**Every item in the right column already has a solution on the left.** We only need to bring the same machinery over.\n\n## Three classic assembly pain points\n\nBreaking the `build_support_context` snippet apart reveals three symptom classes. They are not specific to support scenarios; they recur in nearly every LLM application.\n\n### Pain point 1: N+1 LLM calls\n\n```\nfor s in similar:\n    resolution = await db.query(Resolution).filter(...).first()\n```\n\nThis is a classic N+1 on the ORM side. In the LLM world it gets worse, because you might be calling the LLM in the loop:\n\n```\nfor s in similar:\n    s.summary = await llm.summarize(s.description)   # 5 similar tickets = 5 serial LLM calls\n```\n\nLLM calls are typically at least an order of magnitude slower and more expensive than database queries, so a serial N+1 directly amplifies cost and latency. 
**And code without a batching abstraction tends to end up like this**, because nobody manually maintains a batch queue inside procedural code.\n\nReal-world evidence: open-webui `backend/open_webui/utils/middleware.py:2635` (commit `02dc3e6`, 2026-06)\n\n```\nfor sid in all_skill_ids:\n    if sid in accessible_skill_ids:\n        s = await SkillsModel.get_skill_by_id(sid)   # serial N+1\n```\n\nThe same file has at least three more instances (folder lookup, tool connection, access check), all `await`-inside-a-`for`. open-webui is a production-grade AI application, and it still falls into this trap, which suggests the trap is structural, not a coding-quality issue.\n\n### Pain point 2: Cross-subtree aggregation has no home\n\n```\nall_tags = []\nfor t in recent_tickets:\n    all_tags.extend(t.tags)\n```\n\nThis \"walk the subtree and collect things\" logic, in procedural code, ends up as mutable accumulator variables plus a `for` loop. As soon as aggregation needs grow (all similar-ticket resolutions, all products mentioned, all features touched) you get a pile of `all_xxx = []` lists scattered across the function, held together by convention.\n\nWhat makes this worse is that **these aggregations are inherently \"parent depends on children\"**. In procedural code, they are separated from child-fetch logic. Fetching is above the `for` loop; aggregation is below. The parent→child dependency has been reduced to \"line number ordering\".\n\nReal-world evidence: open-webui `backend/open_webui/utils/middleware.py` chat-completion orchestration (commit `02dc3e6`, 2026-06)\n\n```\nsources = []\nsources.extend(flags.get('sources', []))   # line 2882\nsources.extend(flags.get('sources', []))   # line 2892\nsources = [s for s in sources if ...]      
# line 2909: mid-function reassignment\nevents.append({'sources': sources})        # line 2916: another accumulator\n```\n\n`sources` and `events` have no structured parent-child dependency declaration; they're stitched across handlers with `extend`. This is exactly the \"aggregation has no home\" pattern described above: not a one-off defect, but the natural shape of procedural code that has to coordinate context across multiple sources.\n\n### Pain point 3: Prompt shape is welded to data fetching\n\n```\nreturn f\"\"\"\nCustomer: {customer.name} (id={customer.id})\n...\nSummary: {summary}\n\"\"\"\n```\n\nThis final f-string welds three things together: **data fetching, derived computation, prompt format**. Touching the prompt template means touching the data code; touching data fetching means touching the prompt text; adding a field means editing from top to bottom.\n\nThis is the limit of procedural code: **it has no structure, so every change is invasive**.\n\nReal-world evidence: open-webui `backend/open_webui/utils/middleware.py:931` `get_source_context` (commit `02dc3e6`, 2026-06)\n\n``` python\ndef get_source_context(sources, ...) -> str:\n    context_string = ''\n    for source in sources:\n        for doc, meta in zip(source.get('document', []),\n                             source.get('metadata', [])):\n            context_string += (\n                f'<source id=\"{...}\" name=\"{...}\">{body}</source>\\n'\n            )\n    return context_string\n```\n\nIteration, XML template string, and f-string formatting are all welded into one function, structurally identical to the hypothetical `build_support_context()` at the top of this article. 
This is not a coincidence; it is the typical shape of procedural LLM code.\n\n## Redefinition: LLM context = response tree\n\nWith the three pain points diagnosed, the fix is clear: **assemble the LLM context as a response tree**.\n\nOn the API side, you already speak this language:\n\n```\nclass SprintView(BaseModel):\n    id: int\n    name: str\n    tasks: list[TaskView] = []\n    task_count: int = 0           # derived field, filled by post_task_count\n\n    def resolve_tasks(self, loader=Loader(task_loader)):\n        return loader.load(self.id)\n\n    def post_task_count(self):\n        return len(self.tasks)\n```\n\nBringing this language to the LLM case requires only a change of perspective: **the tree root is no longer Sprint but some conversation context; the leaves are no longer Task but some field an LLM will read**. When you `model_dump()` the result, you either feed it to a prompt template or JSON-serialize it as a tool-call argument.\n\n``` mermaid\nflowchart LR\n    subgraph Tree[\"Context response tree\"]\n        Ctx[\"SupportContext<br/>conversation context\"]\n        Cust[\"CustomerView\"]\n        Tickets[\"list[TicketView]\"]\n        Similar[\"list[SimilarTicketView]\"]\n        Summary[\"summary (post_*)\"]\n        Ctx --> Cust\n        Ctx --> Tickets\n        Tickets --> Similar\n        Ctx --> Summary\n    end\n    Tree -->|model_dump + prompt template| LLM[\"LLM\"]\n```\n\nThe tree shape is defined by your Pydantic model; data is fetched by `resolve_*`; derived fields are computed by `post_*`; cross-subtree aggregation is handled by `Collector`. 
**Same machinery as an API response, same Resolver, same batch loaders.**\n\n## Mechanism mapping\n\nPutting that perspective into code gives three one-to-one mappings:\n\n| AI assembly need | pydantic-resolve primitive | Role in the LLM scenario |\n|---|---|---|\n| Pull external knowledge (DB, vector store, external APIs) | `resolve_*` + `Loader` | Recall related docs, similar tickets, user profile |\n| Call the LLM for derivation after the subtree is ready | `post_*` (supports async) | Summary, classification, risk assessment; `post_*` runs only after the subtree is complete |\n| Aggregate evidence / tags / fragments across subtrees | `Collector` + `SendTo` | Pool signals scattered across leaves back to the root, feed them to the LLM as grounding |\n\nThese three primitives cover the three pain points exactly. The next section walks through a concrete example.\n\n## Walkthrough: a customer support agent context\n\nHere is the opening `build_support_context` rewritten with pydantic-resolve. 
First, the model definitions:\n\n``` python\nfrom typing import Annotated, Optional\nfrom pydantic import BaseModel\nfrom pydantic_resolve import (\n    Collector, Loader, Resolver, SendTo, build_list, build_object,\n)\n\n# Assumed to be defined elsewhere: db, llm, embed_batch, vector_store, the ORM\n# models (Customer, Ticket), CustomerView, ticket_by_id_loader, resolution_loader.\n\n# ---------- Data access layer (loaders) ----------\n\nasync def customer_loader(customer_ids: list[int]) -> list[CustomerView]:\n    rows = await db.query(Customer).filter(Customer.id.in_(customer_ids)).all()\n    # build_object aligns one row (or None) with each requested key\n    return build_object(rows, customer_ids, lambda c: c.id)\n\nasync def ticket_loader(customer_ids: list[int]) -> list[list[Ticket]]:\n    # Keyed by customer_id: pulls each customer's most recent tickets.\n    # Note: a single global LIMIT does not guarantee 5 rows per customer;\n    # a per-customer window (ROW_NUMBER) would be needed for a strict cap.\n    rows = await db.query(Ticket).filter(\n        Ticket.customer_id.in_(customer_ids)\n    ).order_by(Ticket.created_at.desc()).limit(5 * len(customer_ids)).all()\n    # build_list aligns a list of rows with each requested key\n    return build_list(rows, customer_ids, lambda t: t.customer_id)\n\nasync def similar_ticket_loader(ticket_ids: list[int]) -> list[list[dict]]:\n    # One batched vector recall: search all query embeddings at once\n    queries = await db.query(Ticket).filter(Ticket.id.in_(ticket_ids)).all()\n    embeddings = await embed_batch([t.description for t in queries])\n    results = await vector_store.batch_search(embeddings, top_k=3)\n    by_id = {\n        t.id: [r.dict() for r in results[i]]\n        for i, t in enumerate(queries)\n    }\n    # A batch loader must return one entry per key, in key order\n    return [by_id.get(tid, []) for tid in ticket_ids]\n\n# ---------- Context model tree ----------\n\nclass SimilarTicketView(BaseModel):\n    id: int\n    title: str\n    resolution: str = \"\"\n\n    def resolve_resolution(self, loader=Loader(resolution_loader)):\n        return loader.load(self.id)\n\nclass TicketView(BaseModel):\n    id: int\n    title: str\n    description: str\n    customer_id: int\n    # SendTo ships each ticket's tags up to the \"tag_pool\" collector\n    tags: Annotated[list[str], SendTo(\"tag_pool\")] = []\n    similar: list[SimilarTicketView] = []\n    resolution_summary: str = \"\"   # post_*, LLM-derived\n\n    def resolve_similar(self, loader=Loader(similar_ticket_loader)):\n        return loader.load(self.id)\n\n    async def post_resolution_summary(self):\n        # LLM called after subtree is ready: every similar.resolution has been resolved\n        if not self.similar:\n            return \"\"\n        return await llm.summarize_resolutions(\n            ticket_title=self.title,\n            resolutions=[s.resolution for s in self.similar],\n        )\n\nclass SupportContext(BaseModel):\n    \"\"\"Root context: maps directly to the information one LLM call needs.\"\"\"\n    ticket_id: int\n    ticket: Optional[TicketView] = None\n    customer: Optional[CustomerView] = None\n    recent_tickets: list[TicketView] = []\n\n    # Collector aggregates tags from all child tickets\n    all_tags: list[str] = []\n    # Root-level LLM summary: runs only after the whole subtree is ready\n    grounded_summary: str = \"\"\n\n    def resolve_ticket(self, loader=Loader(ticket_by_id_loader)):\n        return loader.load(self.ticket_id)\n\n    def resolve_customer(self, loader=Loader(customer_loader)):\n        return loader.load(self.ticket.customer_id) if self.ticket else None\n\n    def resolve_recent_tickets(self, loader=Loader(ticket_loader)):\n        return loader.load(self.customer.id) if self.customer else []\n\n    def post_all_tags(self, collector=Collector(\"tag_pool\", flat=True)):\n        # flat=True extends the pool with each ticket's tag list, so values()\n        # is a flat list of strings gathered from all child TicketViews\n        return sorted(set(collector.values()))\n\n    async def post_grounded_summary(self):\n        # LLM call after the entire tree is ready\n        return await llm.summarize_context(\n            customer=self.customer,\n            ticket=self.ticket,\n            recent=self.recent_tickets,\n            all_tags=self.all_tags,\n        )\n```\n\nInvocation:\n\n```\nctx = SupportContext(ticket_id=42)\nctx = await Resolver().resolve(ctx)\n\nprompt = render_prompt(ctx.model_dump())  # feed straight into a template\nresponse = await 
llm.chat(prompt)\n```\n\n### Execution flow\n\n``` mermaid\nflowchart TB\n    A[\"Resolver().resolve(SupportContext(ticket_id=42))\"] --> B[\"resolve_ticket<br/>fetch main ticket\"]\n    B --> C[\"resolve_customer<br/>fetch customer\"]\n    C --> D[\"resolve_recent_tickets<br/>batch fetch customer's 5 most recent tickets\"]\n    D --> E[\"each TicketView.resolve_similar<br/>batch vector recall\"]\n    E --> F[\"each SimilarTicketView.resolve_resolution<br/>batch fetch resolutions\"]\n    F --> G[\"each TicketView.post_resolution_summary<br/>batch LLM summary\"]\n    G --> H[\"SupportContext.post_all_tags<br/>Collector aggregates all tags\"]\n    H --> I[\"SupportContext.post_grounded_summary<br/>root-level LLM summary\"]\n    I --> J[\"ctx.model_dump()\"]\n```\n\nEach pain point is addressed in turn:\n\n- **Pain point 1 (N+1 LLM calls)**: All `TicketView.post_resolution_summary` calls sit at the same depth, and pydantic-resolve **dispatches them together** rather than one at a time, so there is no need to manually `gather` inside a loop. If you want to push batching further, wrap the LLM call itself in a `Loader` (multiple same-template requests collapse into one batch API call).\n- **Pain point 2 (cross-subtree aggregation)**: `all_tags` flows through `Collector(\"tag_pool\")`; `TicketView.tags` declares `SendTo(\"tag_pool\")` to ship values upward. Aggregation has a fixed home, with no more ad hoc accumulators and for loops.\n- **Pain point 3 (shape welded to fetching)**: The prompt template and the model definition are separated: `render_prompt(ctx.model_dump())`. 
Editing the prompt text touches no model code; adding a field doesn't move the template; every fetch lives independently inside its `resolve_*`.\n\n### Output\n\n```\nprint(ctx.model_dump_json(indent=2))\n{\n  \"ticket_id\": 42,\n  \"ticket\": {\n    \"id\": 42,\n    \"title\": \"Login button unresponsive on Safari\",\n    \"description\": \"...\",\n    \"tags\": [\"auth\", \"safari\"],\n    \"similar\": [\n      { \"id\": 101, \"title\": \"Safari click event issue\", \"resolution\": \"...\" },\n      { \"id\": 187, \"title\": \"WebKit pointer-events bug\", \"resolution\": \"...\" }\n    ],\n    \"resolution_summary\": \"Likely a WebKit pointer-events issue; see ticket #187.\"\n  },\n  \"customer\": { \"id\": 7, \"name\": \"Acme Corp\", \"tier\": \"enterprise\" },\n  \"recent_tickets\": [ /* ... */ ],\n  \"all_tags\": [\"auth\", \"billing\", \"safari\", \"webkit\"],\n  \"grounded_summary\": \"Enterprise customer Acme Corp reported a Safari-specific login issue...\"\n}\n```\n\nThis tree can be serialized and fed straight into an LLM, or sliced apart: return the `recent_tickets` field to a frontend dashboard with no extra assembly code.\n\n## Comparison with other approaches\n\n| Approach | Where assembly lives | N+1 protection | Cross-subtree aggregation | Reuse with API responses |\n|---|---|---|---|---|\n| Hand-written `build_context()` | Inlined in function body | None | Ad hoc accumulators / for loops | None |\n| LangChain retrieval chain | Chained nodes | Implementation-dependent | Glued via chain composition | Fully separated from API |\n| Naked RAG (embed → search → stuff) | A few inlined lines | Usually single-shot | None | Fully separated from API |\n| pydantic-resolve context tree | Model field declarations | Built-in batching | `Collector` / `SendTo` | Same source as API responses |\n\nWorth noting: this is **not a replacement** for LangChain. LangChain orchestrates the sequence of LLM calls; pydantic-resolve assembles the structured context each step consumes. 
In a complex agent pipeline the two stack cleanly: pydantic-resolve prepares structured context for every step; LangChain (or any agent framework) schedules the execution.\n\n## One Entity graph, four consumer types\n\nPush this further and a deeper payoff appears.\n\nOnce a project is in ERD mode, **REST, GraphQL, MCP, and LLM Context all derive from the same Entity graph**:\n\n``` mermaid\nflowchart TB\n    ERD[\"Entity + ER Diagram<br/>the single source of relationships\"]\n    ERD --> REST[\"REST Response<br/>traditional API consumer\"]\n    ERD --> GQL[\"GraphQL<br/>flexible-query consumer\"]\n    ERD --> MCP[\"MCP Service<br/>AI agent tool consumer\"]\n    ERD --> CTX[\"LLM Context Tree<br/>AI agent context consumer\"]\n    REST --> Resolver[\"same Resolver engine\"]\n    GQL --> Resolver\n    CTX --> Resolver\n    MCP --> GQL\n```\n\nConcretely:\n\n- The `TicketView` you wrote for the support dashboard **is** the `TicketView` the LLM sees, and **is** the GraphQL node MCP exposes.\n- The \"Task has one owner\" relationship is defined once and reused by all four consumers automatically.\n- Change the relationship and all four places update together. Add a consumer and the relationship definition stays untouched.\n\nThis is where pydantic-resolve truly fits in AI workflows: **not another LLM framework, but a stable home for AI context assembly**. 
As AI agents become a standard consumer in your system, the dividend of \"same source\" compounds.\n\n## When to use it, when not to\n\n**Use it when:**\n\n- LLM context needs 2+ levels of nesting (root + related data + related-of-related).\n- The same domain model serves both an API and an LLM.\n- You have a loop calling the LLM per item, and N+1 is burning money.\n- Cross-subtree aggregation is needed to ground the LLM (evidence, tags, fragments).\n- You run a multi-step agent pipeline where every step needs its own context assembled.\n\n**Skip it when:**\n\n- The context is a static text plus a few variables: f-string it.\n- It's a one-shot script or prototype, where procedural code is faster to write.\n- There's a single LLM call with no related-data fetch, so `resolve_*` is unnecessary abstraction.\n- LangChain is already in place and the chain is stable; adding another layer adds cognitive load without benefit.\n\nThe heuristic is simple: **when you start writing the second `build_xxx_context()` function and notice it overlaps with the first, it's time to migrate**. This is the same adoption signal pydantic-resolve uses on the API side; only this time, the consumer is an LLM instead of a browser.\n\n## Conclusion\n\nThe complexity of LLM applications ultimately lands on **context assembly**, not on prompt templates. 
Today's AI projects are full of hand-written `build_context()` functions carrying the same scattered logic that Service/Route layers used to carry in FastAPI projects, and pydantic-resolve already solved that once on the API side.\n\nWhen LLM context is treated as a response tree, three primitives cover three pain points:\n\n- `resolve_*` pulls external knowledge with built-in batching, killing N+1.\n- `post_*` is the LLM hook, dispatched after the subtree is ready, with prompt shape decoupled from data fetching.\n- `Collector` / `SendTo` give cross-subtree aggregation a fixed home, replacing ad hoc accumulator variables.\n\nThe broader payoff: your Entity graph now has **four standard consumers** (REST, GraphQL, MCP, LLM Context), with the relationship defined once. AI is not a special case that needs its own graph. It is just another reader of the same tree.\n\n**Invest in your domain model, not in your prompt template.** The longer the context window and the more complex the agent pipeline, the larger this dividend grows.", "url": "https://wpnews.pro/news/treat-the-context-window-as-a-data-assembly-problem", "canonical_source": "https://klr-pattern.github.io/pydantic-resolve/blog_context_assembly_for_llm/", "published_at": "2026-06-25 05:39:46+00:00", "updated_at": "2026-06-25 06:13:58.299061+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "developer-tools"], "entities": ["pydantic-resolve", "FastAPI", "LLM"], "alternates": {"html": "https://wpnews.pro/news/treat-the-context-window-as-a-data-assembly-problem", "markdown": "https://wpnews.pro/news/treat-the-context-window-as-a-data-assembly-problem.md", "text": "https://wpnews.pro/news/treat-the-context-window-as-a-data-assembly-problem.txt", "jsonld": "https://wpnews.pro/news/treat-the-context-window-as-a-data-assembly-problem.jsonld"}}