Agentic Search for Context Engineering Leonie Monigatti presented a workshop titled "Agentic Search for Context Engineering" at AI Engineer Europe 2026 in London on April 8, 2026, arguing that context engineering is 80% agentic search. The workshop traced the evolution from fixed retrieval pipelines in Retrieval-Augmented Generation (RAG) to agent-driven search tools, and now to context engineering, which manages multiple context sources like local files, databases, and web search. Monigatti made the case that the process of curating what goes into an agent's context window is critical for generating meaningful responses. This post is an edited long-form version of the workshop titled “Agentic Search for Context Engineering” I gave at AI Engineer Europe 2026 https://www.ai.engineer/europe/2026 on April 8, 2026 in London. The slides, code, and diagrams are available in the workshop repository https://github.com/iamleonie/workshop-agentic-search . If you prefer video, the full recording is available on YouTube: If you’ve built an agent before, you know that context engineering is a cruical part of making an agent return meaningful responses. Context engineering is the process of deciding what from all possible context sources actually goes into the agent’s context window so the LLM can generate the best response. This is also referred to as the process of “context curation”, which is depicted as the arrow from the possible context sources to the context window in the diagram below. I think we don’t give this context curation arrow enough credit because it does almost all of the heavy lifting. What hides behind it is a set of search tools the agent can decide to use. That’s why my personal hot take is that: Context engineering is about 80% agentic search. History Let’s take a step back and look at how search in the AI stack has evolved from Retrieval-Augmented Generated RAG ../blog/retrieval-augmented-generation-langchain.html , to agentic search/agentic RAG, and now to context engineering in the last three years. If you’re interested I’ve also written about the evolution from RAG to agent memory ../blog/from-rag-to-agent-memory.html . Retrieval-Augmented Generation When we first started building with LLMs in 2024, we started implementing RAG systems with fixed retrieval pipelines: - The user message is used - more or less verbatim - as a search query often vector search . - Chunks are pulled from a database once. - Retrieved chunks are combined with the user message in the prompt and fed to the LLM. This design is straightforward and still works for narrow Q&A. But it also breaks in predictable ways because you retrieve exactly once : - You retrieve once even when the model does not need external context, and irrelevant chunks can confuse it. - You retrieve once and have no option to correct the query. What if the returned results don’t contain the relevant information and you’d need to run a second search? - You retrieve once even when the question needs multi-hop retrieval. For example, the first batch of chunks might only tell you what to search for next, but the pipeline never runs a second pass. Agentic RAG To overcome these limitations, we replaced the fixed pipeline with a search tool and called it “agentic RAG”, “agentic retrieval”, or “agentic search”. In this scenario, the agent decides whether to call a search tool, whether results are relevant, whether to retrieve more, and whether to rewrite the search query. In many setups this is straightforward because you often have only one or at least a limited amount of context sources and one retrieval tool. Context engineering In context engineering we now have many different context sources: - Local files e.g., your repo, scratch pads with plan.md or todo.md files, or Agent Skills on disk, - Databases e.g., storing large-scale enterprise data , - Web, - Long-term memory I left this ambiguous on purpose because whether memory lives in files or a database is still debated filesystem-vs-database-for-agent-memory.qmd And depending on the context source, we often have a native search tool: - file search for local files, - skill loading for Agent Skills, - dedicated database tools semantic search or query execution for databases, - web search for the web, and - memory tools for long-term memory. If that’s not overwhelming enough, we now also have a tool that lets the agent run terminal commands. This tool has many different names: LangChain calls it the “shell tool” https://reference.langchain.com/python/langchain-community/tools/shell/tool/ShellTool , Anthropic the “bash tool” https://platform.claude.com/docs/en/agents-and-tools/tool-use/bash-tool , OpenClaw the “exec tool” https://docs.openclaw.ai/tools/exec . In the following, we will refer to it as the “shell tool” . The shell tool is a versatile tool because it can interact with most context sources: It can run against local files ls , grep , databases CLIs, scripts, curl to HTTPS APIs , and the web curl . That’s why currently, there’s been a lot of discussion whether a shell tool is all an agent needs. If you’re interested in a deep dive into this discussion, I’ve written about it in this blog titled “The shell tool is not a silver bullet for context engineering” https://www.elastic.co/search-labs/blog/search-tools-context-engineering . The TLDR of that blog and the core topic of this workshop is that the practical question is which search tools belong in your stack , not shell tool versus all others. If you take away one thing from this workshop, let it be this: Doing good search is difficult. That is why we have so many different search techniques and why you curate a stack for your latency and quality requirements. In this workshop, we take a look at a small selection of search tools to give you an intuition on their strengths and weaknesses. Fundamentals of building search tools Before we dive into some concrete search tools, let’s review some fundamentals of building effective search tools. Challenges of agentic search In theory agentic search looks simple: - The user makes a request - The agent calls the right tool with the right parameters - The tool returns search results - The agent responds with the correct answer. But in reality, this can break in many different ways. At Elastic we help teams build agents on Elasticsearch data and these are the three most common failure modes we observe: - The agent doesn’t call any tool at all and answers from parametric knowledge. - The agent calls the wrong tool e.g., web search instead of the company index . - The agent calls the right tool with wrong parameters. Let’s discuss some general best practices to overcome these challenges, so that when I demonstrate them to you in the next section, you will have an intuition on what to do. If you’re interested in more best practices, I’ve also written a guide on what we’ve learned about building effective database retrieval tools at Elastic https://www.elastic.co/search-labs/blog/database-retrieval-tools-context-engineering . Tool descriptions I don’t like this slide because I feel like everyone knows that the tool description is the most important part of any tool . However, I’m going to quickly give you a refresher on it because any time I see a tool description, it’s the least effort one-line description and you’re wondering why your agent is failing to use it. Admittedly, this template is quite long. I’m not saying that you need to follow this template at all times. Instead, start with a clear core purpose and trigger conditions. And when you notice your agent is struggling to use your tool, start adding more components of this template, such as actions, relationships, limitations, and examples. If that is still not enough, repeat the same rules in the system prompt. Parameter complexity Another aspect to be aware of is parameter complexity. Because once the right tool is selected, the agent still has to fill in the arguments and some are easier to get right than others. For example, the parameter for a tool get customer by id is easy when the ID is in the message. Or a semantic search tool that takes in any string as a search query is usually fine. But when you start adding more parameters, such as filters or top k , it can become more challenging for the LLM to generate valid parameters. But what if you have a general-purpose search tool that lets your agent write full SQL or ES|QL https://www.elastic.co/docs/reference/query-languages/esql queries from scratch as the parameter? Most models are decent at generating valid SQL queries, but it increases the risk of failures. Code walkthrough Let’s look at a small selection of search tool implementations in practice. I’d love to show you all of them but time is unfortunately limited. You can find the full code in the workshop repository https://github.com/iamleonie/workshop-agentic-search . This code walkthrough uses LangChain https://docs.langchain.com/ for orchestration because it abstracts a lot of complexity away, so that we can focus on the high level concepts, and also because it provides a shell tool and other useful things out of the box. For the demo, we use the AI Engineer Europe 2026 schedule https://www.ai.engineer/europe/schedule dataset to search over. The first two demos use a local Elasticsearch cluster, which you can spin up using the start-local https://www.elastic.co/guide/en/elasticsearch/reference/current/run-elasticsearch-locally.html script. The third reads the same data from disk. To focus on the core mechanics of the search tools, we will skip over the exact data preparation and ingestion steps, which are available in the notebook 00 prepare data.ipynb https://github.com/iamleonie/workshop-agentic-search/blob/main/notebooks/00 prepare data.ipynb . Vanilla agentic search Let’s refresh our memory on the minimal vanilla agentic search setup implemented in the first notebook 01 vanilla agentic search.ipynb https://github.com/iamleonie/workshop-agentic-search/blob/main/notebooks/01 vanilla agentic search.ipynb first. It consists of one semantic search tool over the index conference schedule , which contains one document per session. First, we set up an LLM, which is the core of any agent. Here we use gpt-5.4-nano through LiteLLM: python from langchain openai import ChatOpenAI llm = ChatOpenAI openai api base=LITELLM API BASE, api key=LITELLM API KEY, model="llm-gateway/gpt-5.4-nano", temperature=0.5, Then we define a minimal system prompt, which tells the agent that it is a search agent, when to retrieve, and how the index is shaped: You are a search agent tasked with answering questions about the AI Engineer Europe 2026 Conference. You have access to different context retrieval tools to help you answer user queries. Before answering a question decide whether or not you need to retrieve additional context to answer the question correctly. If the retrieved context does not contain relevant information to answer the query, say that you don't know. Elasticsearch conference schedule The conference sessions are indexed in Elasticsearch. One document per session. | Field | Description | |--------------|------------| | text | The string that was embedded: each session’s title plus description blank line between them . It does not include day, time, room, or speakers. | | vector | Dense embedding of text , used for similarity search. | | metadata | Structured fields see metadata description below | | Field | Description | |--------------|------------| | metadata.title | Title of the session| | metadata.day | Date of the session Example format: April 10 | | metadata.time | Time slot of the session Example format: 12:40-1:00pm | | metadata.room | Room where the session takes place | | metadata.type | One of 'keynote', 'workshop', 'talk', 'track keynote', 'lightning', 'expo session' | | metadata.track | Track | | metadata.speakers | Name s of the speaker s as a single comma-separated string | Next, we prepare the Elasticsearch client. For this, we prepare jina-embeddings-v5-text-small as the embedding model, which embeds queries at search time and connect to our ElasticsearchStore : python from langchain community.embeddings import JinaEmbeddings from langchain elasticsearch import ElasticsearchStore embeddings = JinaEmbeddings jina api key=JINA API KEY, model name="jina-embeddings-v5-text-small", vector store = ElasticsearchStore es url=ES URL, es user=ES USERNAME, es password=ES PASSWORD, index name="conference schedule", embedding=embeddings, Here comes the interesting part: Let’s define the search tool, which takes in a search query, runs a semantic search, and returns the top 3 results. LangChain’s @tool decorator lets us convert any Python function into an agent tool and uses the function name as the tool name and the docstring as the description by default. As you can see below, I already broke my own recommendation on tool descriptions from above tool-descriptions because I’m using only a short docstring here. In this case, however, that’s enough because we only have one tool and a vanilla demo use case. python from langchain.tools import tool @tool def semantic search conference sessions query: str - str: """Runs a semantic search query to find conference sessions by concept or topic. Args: query: The topic or concept used as search query to search for. Returns: A string containing the information about the sessions. """ docs = vector store.similarity search query, k=3 return "\n\n".join f" {doc.metadata 'type' } by {doc.metadata 'speakers' } : " f"{doc.metadata 'title' }\nDescription:{doc.page content}" for doc in docs Now, we have all the components that make up an AI agent usually, you’d also have “memory”, but it is omitted here on purpose to focus on the search tools and all we have to do is to plug them together. python from langchain.agents import create agent agent = create agent model=llm, system prompt=SYSTEM PROMPT, tools= semantic search conference sessions , For an example semantic search query “Which sessions discuss regulatory constraints in AI systems?” the agent now calls the tool and correctly finds Bilge Yücel’s talk on engineering AI under sovereignity constraints https://www.youtube.com/watch?v=x2bH0RKPgdc . ================================ Human Message ================================= Which sessions discuss regulatory constraints in AI systems? ================================== Ai Message ================================== Tool Calls: semantic search conference sessions call FE570VnExWZOmupAtjduUcIB Call ID: call FE570VnExWZOmupAtjduUcIB Args: query: regulatory constraints in AI systems law regulation compliance governance EU AI Act policy constraints ================================= Tool Message ================================= Name: semantic search conference sessions talk by Bilge Yücel : Engineering AI Systems Under Sovereignty Constraints Description: Regulatory and jurisdictional constraints are no longer an edge case in AI system design; ... talk by Dan James : Building the Justice AI Unit: Shipping Production AI Inside Government Description: ... workshop by Liam McGarrigle : Building Your Own Secure AI Workflows: Human-in-the-Loop Automation with n8n Description: ... ================================== Ai Message ================================== The conference sessions that discuss regulatory constraints / legal or jurisdictional limitations on AI systems are: - Bilge Yücel — “Engineering AI Systems Under Sovereignty Constraints” talk Focus: how legal/jurisdictional limits affect AI architecture—e.g., data localization, auditability, model transparency, and jurisdictional control . - Dan James — “Building the Justice AI Unit: Shipping Production AI Inside Government” talk Focus: building and deploying production AI inside government , where regulatory/operational constraints apply. If you want, tell me whether you mean EU AI Act–specific compliance or broader regulation/governance , and I can narrow it further. This is where most agentic search demos stop. This is also where it starts to get interesting. To break this demo that’s built on a semantic search tool, all you have to do is to ask it something where a keyword-based search would be more suitable than a semantic search. For example, if I ask “Which sessions should I visit to learn more about GEPA?” , the semantic search returns unrelated sessions, but not the correct one by Samuel Colvin https://www.youtube.com/watch?v=A48uhxfxbsM . ================================ Human Message ================================= Which sessions should I visit to learn more about GEPA? ================================== Ai Message ================================== Tool Calls: semantic search conference sessions call fV9IHwnZVgqM8RPdfyT9pNr8 Call ID: call fV9IHwnZVgqM8RPdfyT9pNr8 Args: query: GEPA ================================= Tool Message ================================= Name: semantic search conference sessions keynote by Omar Sanseviero: Gemma, DeepMind's Family of Open Models Description:Gemma, DeepMind's Family of Open Models Google DeepMind’s Gemma family is expanding. Join us for a deep dive into the latest models of the Gemma ecosystem. From vibe fine-tuning to Sovereign AI, you'll learn about the latest model capabilities, how to build high-performance applications, and how to get started with open models. track keynote by Ryan Lopopolo: Harness Engineering AMA Description:Harness Engineering AMA talk by Dan James: Building the Justice AI Unit: Shipping Production AI Inside Government Description:Building the Justice AI Unit: Shipping Production AI Inside Government Building the Justice AI Unit https://ai.justice.gov.uk/ — forward-deployed engineers, entrepreneurial model, shipping production AI every day, but inside government. ================================== Ai Message ================================== I searched the conference schedule for “GEPA” , but the only matching items I found were about DeepMind’s “Gemma” an open models family , not GEPA itself. So I don’t know which sessions are specifically about GEPA based on the available schedule data. If you meant Gemma DeepMind , the relevant session is: - Keynote — “Gemma, DeepMind's Family of Open Models” Omar Sanseviero Can you confirm what GEPA stands for or share a speaker/session name ? Then I can re-search and give you the right sessions. Agentic search with general-purpose database search tool What if the agent could fully autonomously write entire search queries from scratch? Let’s replace the narrow scoped semantic search tool with a more general purpose tool execute esql query for this as implemented in the notebook 02 agentic search with db query tool.ipynb https://github.com/iamleonie/workshop-agentic-search/blob/main/notebooks/02 agentic search with db query tool.ipynb . First, we switch from gpt-5.4-nano to gpt-5.4-mini because query generation is harder than passing a topic. Then, we define a general-purpose database query tool that let’s the agent run full ES|QL queries against Elasticsearch. php @tool def execute esql query esql query: str - str: """Execute an ES|QL query against the conference schedule index in Elasticsearch. Args: esql query: The ES|QL query to execute Returns: A string containing the information about the sessions. """ try: response = es client.esql.query query=esql query, format="csv" return response.body except Exception as e: return f"Error executing ES|QL query: {e}" One thing to note is that we also add a try/except block here that returns some meaningful information if the ES|QL query throws an error, so the agent can self-correct. Now, when we run the same question “Which sessions should I visit to learn more about GEPA?” as earlier, we can see that the agent generates something that looks like valid ES|QL at first glance: FROM conference schedule | WHERE text LIKE '%GEPA%' | KEEP metadata.speakers, metadata.title, text | LIMIT 3 However, this tool call returns zero search results because %GEPA% , which uses % wildcards, which is a SQL habit. ES|QL instead uses for wildcards. Note, that you should also think about how to handle zero search results. Decide in your product whether zero rows means “nothing found” or “retry with different parameters.” There are many options to overcome this, such as improving the instructions or adding ES|QL specific agent skills. For this tutorial, we will define a minimal skill. For production, I recommend to checkout Elastic Agent Skills https://github.com/elastic/agent-skills . python from typing import TypedDict class Skill TypedDict : name: str description: str content: str SKILLS: list Skill = { "name": "elasticsearch-esql", "description": """Execute ES|QL Elasticsearch Query Language queries, use when the user wants to query Elasticsearch data, analyze logs, aggregate metrics, explore data, or create charts and dashboards from ES|QL results.""", "content": """ Elasticsearch ES|QL ... Pattern Matching - LIKE Wildcard pattern zero or more chars, ? single char ... ES|QL uses double quotes for string literals, never single quotes. """, }, To use the Agent Skills, we will copy the Basic implementation for Agent Skills from the LangChain documentation https://docs.langchain.com/oss/python/langchain/multi-agent/skills basic-implementation for a skill loading tool load skill and a SkillMiddleware , which injects skill descriptions and registers load skill . php @tool def load skill skill name: str - str: """Load the full content of a skill into the agent's context.""" for skill in SKILLS: if skill "name" == skill name: return f"Loaded skill: {skill name}\n\n{skill 'content' }" available = ", ".join s "name" for s in SKILLS return f"Skill '{skill name}' not found. Available skills: {available}" Now we also have to tell the agent to call the ES|QL skill before calling the execute esql query tool. For this we adjust the tool description by adding this relationship: php @tool def execute esql query esql query: str - str: """Execute an ES|QL query against an index in Elasticsearch. Always use the Elasticsearch ES|QL skill to generate the ES|QL query before using this tool to execute the query. ... """ And we re-enforce this in the system prompt: SYSTEM PROMPT WITH SKILL = SYSTEM PROMPT + "If you need to execute an ES|QL query, use the Elasticsearch ES|QL skill " "to generate the query before using this tool to execute the query. " "If an ESQL query returns an error use the Elasticsearch ES|QL skill " "to generate a new query." Now we plug in all components including the SkillMiddleware : agent = create agent model=llm, system prompt=SYSTEM PROMPT WITH SKILL, middleware= SkillMiddleware , tools= execute esql query , The traces now show that the agent first calls the ES|QL skill and uses it to write valid ES|QL queries when calling the execute esql query tool. The results show that this way the agent is able to find the correct session. ================================ Human Message ================================= Which sessions should I visit to learn more about GEPA? ================================== Ai Message ================================== Tool Calls: load skill call pUblqD2S7iLb6hjEaI7xZZJr Call ID: call pUblqD2S7iLb6hjEaI7xZZJr Args: skill name: elasticsearch-esql ================================= Tool Message ================================= Name: load skill Loaded skill: elasticsearch-esql Elasticsearch ES|QL Execute ES|QL queries against Elasticsearch. What is ES|QL? ES|QL Elasticsearch Query Language is a piped query language for Elasticsearch. It is NOT the same as SQL. ... ================================== Ai Message ================================== Tool Calls: execute esql query call rYOHyKLoB9vsAF77NKGJD9vr Call ID: call rYOHyKLoB9vsAF77NKGJD9vr Args: esql query: FROM conference schedule | WHERE text LIKE " GEPA " OR metadata.title LIKE " GEPA " | KEEP metadata.title, metadata.day, metadata.time, metadata.room, metadata.type, metadata.track, metadata.speakers | SORT metadata.day, metadata.time ================================= Tool Message ================================= Name: execute esql query metadata.title,metadata.day,metadata.time,metadata.room,metadata.type,metadata.track,metadata.speakers Playground in Prod - Optimising Agents in Production Environments,April 8,10:40am-12:00pm,Westminster,workshop,,Samuel Colvin ================================== Ai Message ================================== The session I found related to GEPA is: - Playground in Prod - Optimising Agents in Production Environments - When: April 8, 10:40am–12:00pm - Where: Westminster - Type: Workshop - Speaker: Samuel Colvin If you want, I can also look for sessions about agents , optimization , or production environments that may be relevant to GEPA even if they don’t mention it by name. Since this tool is general-purpose, the agent can also use it for analytical questions , such as “How many sessions are on April 8?” by using STATS . This is especially useful because otherwise, the agent would have to filter for all sessions on April 8th, and then count them. And we all know how notoriously bad LLMs are at counting. ================================ Human Message ================================= How many sessions are on April 8th? ================================== Ai Message ================================== Tool Calls: load skill call 5Ki25LYVnkx245nEDSM3xwwz Call ID: call 5Ki25LYVnkx245nEDSM3xwwz Args: skill name: elasticsearch-esql ================================= Tool Message ================================= Name: load skill Loaded skill: elasticsearch-esql Elasticsearch ES|QL Execute ES|QL queries against Elasticsearch. What is ES|QL? ES|QL Elasticsearch Query Language is a piped query language for Elasticsearch. It is NOT the same as SQL. ... ================================== Ai Message ================================== Tool Calls: execute esql query call 0TSZEVO7uRGKGj2sfOE2VcEV Call ID: call 0TSZEVO7uRGKGj2sfOE2VcEV Args: esql query: FROM conference schedule | WHERE metadata.day == "April 8" | STATS count = COUNT ================================= Tool Message ================================= Name: execute esql query count 27 ================================== Ai Message ================================== There are 27 sessions on April 8th . As you can see, the general-purpose execute esql query is more powerful than the narrow scoped semantic search tool because it can handle more ambiguous or complex queries. At the same time, adding a skill loading tool adds cost and latency and it requires a more powerful LLM. Agentic search with shell tool Finally, let’s talk about the hottest topic in agentic search at the moment: “Bash + Filesystem is all you need”. You can find the related implementation in 03 agentic search with shell tool.ipynb https://github.com/iamleonie/workshop-agentic-search/blob/main/notebooks/03 agentic search with shell tool.ipynb . For this, we move the same data from the Elasticsearch cluster to local files under ../data/session data/ one .txt per session and let the agent search over them with LangChain’s ShellTool . First, we instantiate the shell tool: python from langchain community.tools import ShellTool shell tool = ShellTool no safeguards by default; sandbox in production Note that the shell tool doesn’t have any safeguards by default and it is recommended to use it only in a sandboxes environment. The system prompt for this example is similar to the ones described previously, expect that we now describe the data in the filesystem instead of the Elasticsearch cluster: You are a search agent tasked with ... Local filesystem session data The conference sessions are available under under ../data/session data/ . One file per session. File structure: ../ data/ └── session data/ ├── workshop/ │ ├──