Agent Framework RAG for Agents: Giving Your Agent the Right Context

A developer building on the Microsoft Agent Framework describes how to connect agents to private knowledge using RAG (Retrieval-Augmented Generation). The approach exposes retrieval as a controlled tool, SearchKnowledgeAsync, rather than giving the agent direct access to databases or all documents. The agent fetches relevant context only when needed, keeping the retrieval layer separate from the agent runtime.

This is Part 13 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev. In the previous article (https://www.lukaswalter.dev/posts/agentframework 1 12/), we looked at workflows. Workflows make sense when the process itself needs structure: state, checkpoints, events, human approvals, and resumable execution.

This post is the bridge from Agent Framework into RAG. I plan on doing a full RAG deep dive sometime later. The practical question for now is smaller:

How do I connect an Agent Framework agent to private application knowledge without stuffing every document into the prompt?

For agents, RAG is less about adding more text and more about giving the agent a controlled retrieval path. The agent should fetch the right context at the point where it needs it.

Your company documents, product catalog, tickets, rules, policies, runbooks, and internal knowledge base live outside the model. The model has generic knowledge. Your application has private knowledge. Treat those as separate systems.

You can paste some private data into the prompt, and for a demo that may be enough. But this falls apart quickly:

The last point is easy to underestimate. A larger context window lets you send more text. It does not decide which text is correct, current, relevant, or permitted.

Do not give the agent all knowledge. Give it the right context at the moment it needs it. Retrieval owns that job.

The basic RAG loop is small:

    user question
    -> retrieve relevant chunks
    -> pass chunks to the agent
    -> agent answers using that context

For documents, the longer pipeline usually looks like this:

    documents
    -> chunks
    -> embeddings
    -> vector store
    -> search
    -> retrieved context
    -> agent response

Documents are split into smaller chunks. Those chunks are embedded into vectors. The vectors and source metadata are stored. When a user asks a question, the question is embedded too. The search layer finds nearby chunks and returns only those chunks to the agent.
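
To make the document pipeline concrete, here is a minimal sketch of the same steps in plain C#. It assumes a toy word-count vector in place of a real embedding model and an in-memory list in place of a vector store. The names Chunk, ToyRetrieval, Ingest, Embed, Cosine, and Search are made up for this illustration and are not part of Agent Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration types; none of this is an Agent Framework API.
public sealed record Chunk(string Source, string Text, Dictionary<string, int> Vector);

public static class ToyRetrieval
{
    private static readonly char[] Separators =
        { ' ', ',', '.', '?', '!', ':', ';', '\n', '\r', '\t' };

    // Stand-in for an embedding model: a sparse word-count vector.
    // A real embedding captures meaning; this only captures word overlap.
    public static Dictionary<string, int> Embed(string text)
    {
        var vector = new Dictionary<string, int>();
        var words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            vector.TryGetValue(word, out int count);
            vector[word] = count + 1;
        }
        return vector;
    }

    // Naive chunking: one chunk per sentence, embedded at ingestion time.
    public static List<Chunk> Ingest(string source, string document) =>
        document
            .Split('.', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select(sentence => new Chunk(source, sentence, Embed(sentence)))
            .ToList();

    // Cosine similarity between two sparse vectors; 0 when either is empty.
    public static double Cosine(Dictionary<string, int> a, Dictionary<string, int> b)
    {
        double dot = a.Sum(pair =>
            b.TryGetValue(pair.Key, out int other) ? (double)pair.Value * other : 0.0);
        double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return normA == 0 || normB == 0 ? 0 : dot / (normA * normB);
    }

    // Embed the question, rank stored chunks by similarity, return the top few.
    public static List<Chunk> Search(IEnumerable<Chunk> store, string question, int limit)
    {
        var queryVector = Embed(question);
        return store
            .OrderByDescending(chunk => Cosine(chunk.Vector, queryVector))
            .Take(limit)
            .ToList();
    }
}
```

In a real system, Embed would call an embedding model and the list would be a vector store with source metadata. The shape stays the same: ingest once, embed the question at query time, and hand only the top-ranked chunks to the agent.
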
Stop there for now. There are some hard parts here: chunk boundaries, embedding model choice, hybrid search, reranking, freshness, access control, observability, and evals. They are just not the point yet.

For now, keep the boundary clear: RAG is the retrieval layer around the agent. The agent is not the retrieval layer.

Microsoft Agent Framework gives you the agent runtime. It does not give you a finished ingestion pipeline, chunking strategy, embedding setup, vector store, ranking model, permission model, freshness process, or retrieval eval suite.

Agent Framework helps you decide how the agent receives and uses context:

The retrieval system still belongs to your application architecture. It might use Azure AI Search, PostgreSQL with pgvector (https://www.lukaswalter.dev/posts/rag-efcore-pgvector/), SQL Server vector search, Cosmos DB, Qdrant, Redis, a normal search index, or an internal HTTP API.

The agent does not need to care. The agent needs a focused capability. Not direct database access.

For many agent apps, I would start by exposing retrieval as a tool. The tool is narrow:

    SearchKnowledgeAsync(string query, string? category, int limit)

The agent can call it when the answer depends on private knowledge. Your application decides what the tool is allowed to search.

This matches the tool-design rule from earlier in the series: Tools should expose controlled capabilities, not raw infrastructure.

A small version looks like this:

    using System.ComponentModel;
    using Microsoft.Agents.AI;
    using Microsoft.Extensions.AI;
    using Microsoft.Extensions.DependencyInjection;

    public sealed record KnowledgeSearchResult(
        string Title,
        string Source,
        string Snippet,
        double Score);

    public interface IKnowledgeSearch
    {
        Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
            string query,
            string? category,
            int limit,
            CancellationToken cancellationToken);
    }

    [Description("Searches approved internal knowledge articles, policies, and runbooks.")]
    public static Task<IReadOnlyList<KnowledgeSearchResult>> SearchKnowledgeAsync(
        [Description("Focused search query. Rewrite the user's message into search terms.")]
        string query,
        [Description("Optional source category such as policy, runbook, product, support, or architecture.")]
        string? category,
        [Description("Maximum number of results to return. Use 3 to 5 for normal questions.")]
        int limit,
        IServiceProvider services,
        CancellationToken cancellationToken)
    {
        var search = services.GetRequiredService<IKnowledgeSearch>();

        // Clamp the model-supplied limit so a single call cannot pull an unbounded result set.
        return search.SearchAsync(query, category, Math.Clamp(limit, 1, 5), cancellationToken);
    }

The model supplies query, category, and limit. The application supplies IKnowledgeSearch. Keep that split. The model can ask for a search. It does not get a connection string, a database client, or permission to browse every source.

Then attach the tool to the agent:

    AIAgent supportAgent = chatClient.AsAIAgent(
        instructions: """
            You answer questions about the internal engineering platform.

            Use SearchKnowledgeAsync when the answer depends on private company documentation,
            runbooks, policies, known issues, or product rules.

            If the search results do not contain enough evidence, say that the indexed sources
            do not answer the question.

            Do not invent policy details, limits, prices, permissions, or operational steps.
            """,
        tools: [AIFunctionFactory.Create(SearchKnowledgeAsync)],
        services: app.Services);

The agent-side RAG flow is:

At that point, retrieval is just another tool. The pattern fits Agent Framework because tools already give you that controlled application boundary.

Users ask messy questions. For example:

    What were the most important changes in our cancellation policy last year?

A better retrieval query might be:

    cancellation policy changes last year

Or, if you expose metadata filters:

    await SearchKnowledgeAsync(
        query: "cancellation policy changes last year",
        category: "policy",
        limit: 5,
        services,
        cancellationToken);

The agent can help here.
It can translate a conversational request into a smaller retrieval query. But do not overcomplicate this too early. Start by logging the generated tool query and checking whether it actually finds better results than the raw user message.

Bad query rewriting is worse than no query rewriting. It can remove the term that mattered.

Vector similarity finds related text. It does not know whether that text belongs to the right tenant, product, language, version, source system, or user permission scope. You often need filters.

Common filters include:

Some filters can be model supplied. category is a reasonable example because the model can often infer whether a question is about a policy, runbook, product, or support article.

Some filters should not be model supplied. Tenant, user ID, role, entitlement, and document permissions should come from your authenticated application context.

The model should not be allowed to say:

    Search tenant = admin

and suddenly see admin-only documents.

A better application boundary looks like this:

    public interface IKnowledgeSearch
    {
        Task<IReadOnlyList<KnowledgeSearchResult>> SearchAsync(
            string query,
            string? category,
            int limit,
            UserKnowledgeScope scope,
            CancellationToken cancellationToken);
    }

The tool can accept the search query and category. Your application adds UserKnowledgeScope from the current user.

Similarity search finds related text. Metadata filters keep the search inside the right boundary.

Exposing retrieval as a tool is not the only option. For a pure documentation assistant, you may not want the model to decide whether to search. You may want retrieval on every request.
Plain application code is enough:

    IReadOnlyList<KnowledgeSearchResult> results = await knowledgeSearch.SearchAsync(
        query: userQuestion,
        category: null,
        limit: 5,
        cancellationToken);

    // Flatten the retrieved chunks into one context string, one source per block.
    string context = string.Join(
        "\n\n",
        results.Select(result => $"""
            Source: {result.Title}
            {result.Snippet}
            """));

    AgentResponse response = await supportAgent.RunAsync(
        $"""
        Answer the user's question using the retrieved context.
        If the context is not enough, say so.

        Retrieved context:
        {context}

        User question:
        {userQuestion}
        """,
        cancellationToken: cancellationToken);

You can also use Agent Framework context providers, such as TextSearchProvider, when that fits your setup.

The tradeoff is the same either way:

If almost every request needs private knowledge, retrieve before the agent call. If retrieval is one capability among several, expose it as a tool.

RAG is for finding relevant context. Code is for exact operations.

If the user asks:

    What are the top 5 products by revenue?

that should probably be SQL or an analytics API, not vector search.

The same applies to:

Vector search is good at finding related text. It is not a calculator, database constraint, authorization system, or reporting engine.

If the answer must be exact, use normal code behind a tool. For example:

    [Description("Returns the top products by revenue for an authorized reporting period.")]
    public static Task<IReadOnlyList<ProductRevenue>> GetTopProductsByRevenueAsync(
        DateOnly from,
        DateOnly to,
        int limit,
        IServiceProvider services,
        CancellationToken cancellationToken)
    {
        var reporting = services.GetRequiredService<IRevenueReporting>();

        // Clamp the model-supplied limit before it reaches the reporting layer.
        return reporting.GetTopProductsByRevenueAsync(from, to, Math.Clamp(limit, 1, 20), cancellationToken);
    }

This still gives the agent a tool. It is just not RAG.

Use retrieval with an Agent Framework agent when:

Start with a narrow search tool. Log the query the agent sends. Log the sources returned. Check whether the answer actually used those sources.
That gives you enough signal to see where the retrieval design is weak.

Do not use RAG when the task needs deterministic data access or computation. Use normal code for current state, totals, rankings, exact IDs, prices, permissions, and business rules.

Do not use RAG as a way to bypass application boundaries. If a user cannot access a document in the product, the retrieval tool should not return it to the agent.

Also avoid building the full ingestion and retrieval platform before you have a real use case. Start with one domain, a small corpus, and a handful of questions you can verify.

Agent Framework gives you a clean place to put retrieval into the agent loop. It does not make RAG automatic.

The design I would carry forward is simple:

As I said before, I will do a deep dive into RAG later on. So in the next Agent Framework post we will move to multimodal agents: images, PDFs, and provider differences.

The agent boundary gets messy there in a different way. Some providers can work with images or document inputs natively, some need different message formats, and some scenarios are still better handled by manual preprocessing before the agent sees anything.