{"slug": "how-a-net-dev-built-an-ai-assistant", "title": "How a .NET dev built an AI assistant", "summary": "A .NET developer building an AI assistant for an interactive 3D learning app describes the architectural decisions behind Cori, an assistant that can talk and act simultaneously in a live 3D viewer. The team used Microsoft.Extensions.AI abstractions to keep the codebase vendor-neutral, avoiding lock-in to any single AI provider.", "body_md": "Did you just get the task to “make an AI assistant” when you mainly do .NET? Same. I’m also one of those people who roll their eyes at a lot of the AI hype, and the internet is full of articles where every confident tutorial contradicts the previous one.\n\nSo instead of publishing one more “definitive guide” that will age badly in two weeks, here’s the version I wish I’d found: what my team actually decided to build, the wrong turns we took on the way, and the code that finally made it click. It’s written for people who know C# but have never built an AI feature.\n\nThis is not a best-practices sermon. It’s more like: here’s the problem, here’s what we nearly built, here’s what annoyed me, and here’s what we landed on.\n\nI work on an app that helps kids learn through interactive 3D models (a heart, a cell, a volcano) in the browser, AR, and VR. We’re adding **Cori**, an assistant you can talk to about whatever model is currently on screen.\n\n- “Rotate the heart left.” → it rotates\n- “Why is this chamber bigger?” → it explains\n\nThat means Cori has to **talk and act at the same time**, and both of those outputs have to reach a live 3D viewer.\n\nThat, for me, is the actual problem. 
Not “which model is smartest?” Not “which SDK has the coolest demo?” The interesting part is **how the output gets to the client**.\n\nBecause once you stop thinking about the model as the product and start thinking about delivery as the problem, the architectural decisions get a lot clearer.\n\n**Our stack, for context:** the backend is .NET with **Wolverine + Marten on PostgreSQL** and the frontend is **Svelte**. Fair warning: this is not exactly the most well-paved road in AI land. A lot of AI tooling assumes Python or TypeScript first, and .NET support often arrives later, half-finished, or not at all. So if you’re on a similar stack, you’re probably not picking from polished examples; you’re cutting the path by hand.\n\nBefore the story, here are the four words every AI article uses as if everybody were born knowing them.\n\n- **Tool**: a C# function you expose to the model, such as `Rotate` or `SearchContent`. Mid-response, the model can ask for one of those functions to be called. It does not run your C# code itself; it asks, and your code executes it.\n- **Agent**: a model packaged together with its instructions and tools; in Microsoft Agent Framework, that is the `AIAgent` type.\n\nThat’s the whole glossary. Enough vocabulary to survive the rest of the article without having to alt-tab every two minutes.\n\nThe first decision had nothing to do with streaming or transport. It came from plain distrust.\n\nMy biggest fear was not “will the model be smart enough?” It was tying the whole codebase to one vendor SDK, one framework, one opinionated stack, and then getting stranded the moment the ecosystem changed direction, which in AI it absolutely will.\n\nSo the first rule became simple: **talk to abstractions**. Use generic interfaces in application code, and keep the actual provider (OpenAI, Deepgram, whoever wins this month) behind DI where it belongs.\n\nPleasant surprise: .NET actually gives you this now. 
**Microsoft.Extensions.AI** is basically the `ILogger` pattern, but for AI:\n\n- `IChatClient`: provider-neutral chat / LLM access\n- `IEmbeddingGenerator`: embeddings for vector or semantic search\n- `ITextToSpeechClient`: text-to-speech\n- `Microsoft.Extensions.VectorData`: vendor-neutral vector store abstractions\n\nThat means the provider stays a registration detail:\n\n```csharp\nusing Microsoft.Extensions.AI;\nusing OpenAI;\n\n// Register once: OpenAI hidden behind the generic IChatClient.\n// Assumes an OpenAIClient is already registered in DI.\nbuilder.Services.AddKeyedSingleton<IChatClient>(\"CoriAI\", (sp, _) =>\n    sp.GetRequiredService<OpenAIClient>()\n      .GetChatClient(\"gpt-4o-mini\")\n      .AsIChatClient()\n      .AsBuilder()\n      .UseFunctionInvocation() // executes tool calls the model requests, then sends the results back\n      .Build());\n```\n\nAnd consuming it is boring in exactly the right way:\n\n```csharp\nusing Microsoft.Extensions.AI;\nusing Microsoft.Extensions.DependencyInjection;\n\npublic sealed class Summarizer([FromKeyedServices(\"CoriAI\")] IChatClient chat)\n{\n    public async Task<string> OneLiner(string topic, CancellationToken ct)\n    {\n        var reply = await chat.GetResponseAsync(\n            $\"Explain {topic} in one sentence.\",\n            cancellationToken: ct);\n\n        return reply.Text;\n    }\n}\n```\n\nNothing in that class knows or cares whether OpenAI is behind it. That is the point.\n\nFull honesty: when we built the first version of Cori, we did **not** follow the neat abstraction rule I just described. We wired the code straight to **Semantic Kernel**, using its types directly and leaning on a bunch of APIs politely labeled `[Experimental]`.\n\nThen Microsoft merged Semantic Kernel and AutoGen into **Microsoft Agent Framework** (MAF), Semantic Kernel stopped looking like the future, and the framework we had invested time in became “the old path” while we were still building.\n\nWhich meant we had to rewrite more code than I would like to admit.\n\nThat experience is exactly why I’m now stubborn about the abstraction layer. The abstraction rule is not hindsight wisdom from a calm architect on a mountain. 
It’s the bruise talking.\n\nThe new pipeline is built on `Microsoft.Extensions.AI` specifically because the next time Microsoft changes direction (and there is always a next time), I want that change to be a DI swap, not a weekend of `find in files` and quiet swearing.\n\nOnce the “brain” side was sorted, the real question became transport: **how do the AI’s words and actions actually reach the client?**\n\nCori produces two very different kinds of output:\n\n- **Semantic**: text, tool calls, and state\n- **Audio**: mic audio going up and voice coming down\n\nAnd at that point, every .NET developer has the same reflex: *we already have a real-time transport; just use SignalR for all of it.*\n\nOn paper, it looked perfect. One connection. One typed client. Browser and Unity both covered.\n\n```\n                ┌──────── ONE SignalR hub ────────┐\n  client  ◄────►│  text + tool calls + state      │  ← semantic\n                │  mic audio up / voice down      │  ← audio\n                └──────────────────────────────────┘\n```\n\nThen we wrote down what that would actually mean building, down to our own message framing for every tool call (`start → args → end`).\n\nThat is a surprising amount of custom transport code.\n\nAnd none of that code is the product. It is just plumbing we invented for ourselves, in a format we would have to maintain forever.\n\nThat was the first important moment: we realized we were about to spend serious effort hand-building a private AI event protocol when maybe, just maybe, someone had already done that part for us.\n\nThe reason we switched was the same instinct as before: I really did not want to own a custom message format for the next several years.\n\nMAF ships with support for **AG-UI**, an open streaming protocol from the CopilotKit world. 
And annoyingly enough, it already defines almost the exact event shape we were preparing to invent by hand.\n\nTurning the agent into a streaming endpoint is basically one line:\n\n```csharp\nvar agent = app.Services.GetRequiredKeyedService<AIAgent>(\"Cori\");\napp.MapAGUI(\"/cori\", agent);\n```\n\nFrom the frontend side, it is just an HTTP POST and a streamed response:\n\n```ts\n// AG-UI run input, simplified: a thread id, a run id, and the messages so far.\nconst res = await fetch(\"/cori\", {\n  method: \"POST\",\n  headers: { \"Content-Type\": \"application/json\", Accept: \"text/event-stream\" },\n  body: JSON.stringify({\n    threadId: \"heart-lesson\", // hypothetical id for this conversation\n    runId: crypto.randomUUID(),\n    messages: [{ id: crypto.randomUUID(), role: \"user\", content: \"Rotate the heart left\" }],\n  }),\n});\n\nconst reader = res.body!.getReader();\n// decode each event/data line as it arrives\n```\n\nThe response comes back as **Server-Sent Events**: a long-lived HTTP response that keeps writing labeled lines until the run is finished.\n\nAnd the event stream is exactly the sort of thing we needed:\n\n```\nevent: TEXT_MESSAGE_CONTENT   data: {\"delta\":\"Sure, rotating \"}\nevent: TEXT_MESSAGE_CONTENT   data: {\"delta\":\"the heart now…\"}\nevent: TOOL_CALL_START        data: {\"name\":\"Rotate\"}\nevent: TOOL_CALL_ARGS         data: {\"direction\":\"LEFT\",\"degrees\":45}\nevent: TOOL_CALL_END          data: {}\nevent: RUN_FINISHED           data: {}\n```\n\nThat is the whole magic trick.\n\nThe text types itself into the UI. Then the model decides to call `Rotate`. Then the 3D viewer reacts. **Words and actions travel together on one stream**, and the streaming protocol, event lifecycle, and session mechanics are not our responsibility anymore.\n\nThat was the first time the architecture started to feel sane.\n\nThe agent registration stays pleasantly small:\n\n```csharp\nbuilder.Services.AddKeyedSingleton<AIAgent>(\"Cori\", (sp, _) =>\n    sp.GetRequiredKeyedService<IChatClient>(\"CoriAI\")\n      .AsAIAgent(new ChatClientAgentOptions\n      {\n          ChatOptions = new() { Instructions = CoriSystemPrompt.Base },\n          Tools = [ /* Rotate, Zoom, SearchContent, ... 
*/ ]\n      }));\n```\n\nAnd grounding it in our own curriculum is just a context provider:\n\n```csharp\n// IHybridContentSearch is an app-specific search service; Format (not shown) renders the hits as text.\npublic sealed class ContentSearchProvider(IHybridContentSearch search) : AIContextProvider\n{\n    public override async ValueTask<AIContext> ProvideAIContextAsync(\n        InvokingContext context,\n        CancellationToken ct)\n    {\n        // Use the latest user message as the search query.\n        var q = context.RequestMessages.LastOrDefault(m => m.Role == ChatRole.User)?.Text;\n        if (string.IsNullOrWhiteSpace(q))\n            return new AIContext(); // nothing to search for, so add no extra context\n\n        var hits = await search.SearchAsync(q, topK: 3, ct);\n\n        return new AIContext\n        {\n            Instructions = $\"Use this curriculum if relevant:\\n{Format(hits)}\"\n        };\n    }\n}\n```\n\nThen you attach it to the agent and MAF calls it before each turn:\n\n```csharp\n.AsAIAgent(new ChatClientAgentOptions\n{\n    ChatOptions = new() { Instructions = CoriSystemPrompt.Base },\n    AIContextProviders = [ new ContentSearchProvider(search) ],\n})\n```\n\nThat matters because it keeps Cori anchored in our own educational content instead of free-associating from half-remembered internet knowledge.\n\nHere is the important catch: **AG-UI only carries text and JSON**. No binary audio.\n\nAt first that sounds like a limitation. 
In practice, it turned out to be the insight.\n\nBecause audio was always going to be its own problem anyway.\n\nSo the real choice was never “SignalR or AG-UI?” The real choice was: do we hand-build the text channel on top of SignalR, or take AG-UI for free and solve audio separately, which we were going to have to do either way?\n\nOnce we phrased it like that, the argument mostly ended itself.\n\nThis was the actual architectural decision.\n\nWe never found one ready-made approach that gave us everything (text streaming, tool calls, state, audio, cross-platform friendliness, decent developer ergonomics) in one clean package.\n\nSo instead of forcing everything through one pipe, we split the problem into two planes.\n\n```\n        ┌──────── Browser / Unity VR client ────────┐\n        │  AG-UI (POST + SSE)      Audio (WebSocket) │\n        └──────┬──────────────────────────┬─────────┘\n   text/tools/ │ POST turn / SSE reply     │ mic up / voice down\n   state       ▼                           ▼\n     ┌──────────────────────┐   ┌──────────────────────────┐\n     │ MapAGUI(\"/cori\",agent)│  │ Audio gateway (no agent) │\n     │  AIAgent — TEXT ONLY  │◄─┤  speech→text, voice down │\n     │  tools = 3D commands  │ transcript cancel on barge-in│\n     └──────────────────────┘   └──────────────────────────┘\n```\n\nKeeping the agent text-only mattered more than expected.\n\nBecause once the agent no longer cares whether the input came from typing, speech-to-text, browser chat, or VR, the same brain can serve all of them. The channel becomes an integration detail instead of something smeared through the entire design.\n\nThis is not a fairy-tale architecture with no cost.\n\nSplitting the planes means coordinating two channels. Barge-in (the user interrupting Cori while it is still speaking) gets trickier. The Unity story for AG-UI still feels less proven than I’d like. 
And the .NET AG-UI host is preview enough that version pinning is not optional.\n\nStill, the trade-off felt worth it.\n\nAdopting an open protocol is better than owning a private one. POST/SSE and WebSocket also pass more easily through school networks than anything that smells like WebRTC or UDP, which matters a lot more in education than flashy diagrams on conference slides.\n\nI’m not going to fake certainty here. The text side feels understandable now. Voice still does not.\n\nText is nice because it is turn-based and bounded. One request, one response. Voice is continuous, latency-sensitive, messy, and full of edge cases: users interrupting, classroom noise, weird pauses, headsets with personality disorders, and all the other things reality likes to contribute.\n\nWhat we have right now is a **shape**, not a finished answer.\n\nConceptually, it looks like this:\n\n```\nmic ─► speech-to-text ─► transcript ─► FE ─► MapAGUI run ─► text + tool calls\n                                        FE ─► audio channel ─► text-to-speech ─► voice down\n```\n\nWhy not WebRTC for the audio? Because “the technically correct real-time media stack” and “the thing that behaves on random school devices, Macs, standalone VR headsets, and mystery tablets” are not always the same thing.\n\nWebRTC is great in the right environment. Our environment is not the right environment often enough.\n\nSo the pragmatic approach is simpler: capture **raw PCM** on the client and send it over a plain WebSocket. It costs more bandwidth (16 kHz, 16-bit mono PCM, for example, is 256 kbit/s, while a speech codec such as Opus typically needs a few tens of kbit/s), but it is easier to reason about, easier to debug, and much more predictable across devices.\n\nSometimes the fancy option is right. Sometimes the right option is the one a tired developer can actually support in a school deployment without becoming a part-time audio detective.\n\nThe obvious concern with this design is hop count: audio goes up to speech-to-text, the transcript comes back to the frontend, the frontend starts an AG-UI run, and the reply text goes back over the audio channel to text-to-speech before any voice comes down.\n\nThat sounds slow on paper.\n\nBut in practice, for turn-based conversation, the added delay measured under a second. 
That is not zero, but it is also not enough to make the interaction feel broken.\n\nSo we are deliberately not optimizing that path yet.\n\nIf seamless speech-to-speech becomes a hard requirement, that decision probably changes.\n\nThis is the honest backlog:\n\nSo no, this is not the satisfying part of the blog post where everything is solved and there is triumphant orchestral music in the background.\n\nVoice is still where the dragons are.\n\nIf I had to compress the whole thing into a few lessons, it would be these:\n\n- `Microsoft.Extensions.AI` outlives whatever framework is fashionable this quarter.\n\nThat last point is probably my favorite. In AI work, there is always pressure to make the “brain” feel magical. Most of the time, the better move is to make it boring, predictable, and easy to swap around.\n\nOn the surface, this post is about a .NET team building an assistant for a 3D learning product.\n\nBut underneath that, it is really about something more familiar: trying to add a new capability without letting the hype blow up the architecture.\n\nThe decisions we’ve landed on so far are these:\n\n- Provider-neutral AI access through `Microsoft.Extensions.AI`\n- A text-only `AIAgent` exposed over AG-UI\n- Voice on its own WebSocket plane, with transcripts reaching the same agent through `MapAGUI`\n\nEach of those will probably become its own follow-up article because each one hid more detail than expected.\n\nThere is still a lot of implementation left. None of this is “battle-tested and perfect.” It is “this is the architecture we chose, this is why we chose it, and this is the shape of the problems we are still solving.”\n\nWhich, honestly, is probably more useful than another article pretending the road was smooth.\n\nIf you are a .NET developer trying to build your first AI feature, that is the main thing I would want to pass on: you do not need to start by finding the smartest model. 
You need to find the seam in your system, protect your abstractions, and avoid writing infrastructure you do not actually want to own.", "url": "https://wpnews.pro/news/how-a-net-dev-built-an-ai-assistant", "canonical_source": "https://dev.to/manhhungtran/how-a-net-dev-built-an-ai-assistant-36cm", "published_at": "2026-06-26 19:26:28+00:00", "updated_at": "2026-06-26 19:34:03.947207+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "ai-agents", "natural-language-processing", "ai-infrastructure"], "entities": ["Microsoft.Extensions.AI", "OpenAI", "Wolverine", "Marten", "PostgreSQL", "Svelte", "Cori", ".NET"], "alternates": {"html": "https://wpnews.pro/news/how-a-net-dev-built-an-ai-assistant", "markdown": "https://wpnews.pro/news/how-a-net-dev-built-an-ai-assistant.md", "text": "https://wpnews.pro/news/how-a-net-dev-built-an-ai-assistant.txt", "jsonld": "https://wpnews.pro/news/how-a-net-dev-built-an-ai-assistant.jsonld"}}