Building a RAG System from Scratch — Tool Use: Let the LLM Search Autonomously

A developer building a RAG system from scratch implemented Tool Use, allowing the LLM to autonomously decide when to call search functions. The approach replaces a hardcoded search-then-answer flow with a flexible loop where the LLM can call functions, receive results, and decide whether to continue searching or generate a final answer. The implementation uses Google's Gemini API and a PostgreSQL vector database.

In the previous article https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-design-decisions-explained-40hd , we examined the design decisions behind our RAG pipeline. Now we'll give the LLM the ability to call our search functions autonomously — this is Tool Use . In our RAG pipeline so far, we always called search before generating an answer. The flow was hardcoded: question → search → generate answer With Tool Use, the LLM decides whether to search, what to search for, and when it has enough information to answer: question → LLM decides → search if needed → LLM decides → answer This matters when: The LLM is given a list of available functions with their signatures and descriptions. It responds with either: function call — "call this function with these arguments"Your code executes the function call and sends the result back. The LLM then decides whether to call another function or produce a final answer. You → LLM: "here are available tools + user question" LLM → You: function call { name: "search documents", args: { query: "F1 score" } } You → execute search documents "F1 score" → results You → LLM: function result { ... } LLM → You: "The F1 score is calculated as..." 06 tool basic.py python 06 tool basic.py import psycopg2 from google import genai from google.genai import types from dotenv import load dotenv import os load dotenv client = genai.Client api key=os.getenv "GEMINI API KEY" conn = psycopg2.connect host=os.getenv "DB HOST" , port=os.getenv "DB PORT" , dbname=os.getenv "DB NAME" , user=os.getenv "DB USER" , password=os.getenv "DB PASSWORD" , cur = conn.cursor def get embedding text: str - list float : result = client.models.embed content model="gemini-embedding-001", contents=text, config=types.EmbedContentConfig task type="RETRIEVAL QUERY", output dimensionality=768, , return result.embeddings 0 .values def search documents query: str, top k: int = 3 - list dict : query embedding = get embedding query cur.execute """ SELECT title, body, category, 1 - embedding <= %s::vector AS similarity FROM documents ORDER BY embedding <= %s::vector LIMIT %s; """, query embedding, query embedding, top k rows = cur.fetchall return {"title": r 0 , "body": r 1 , "category": r 2 , "similarity": round r 3 , 4 } for r in rows ── Tool definition ────────────────────────────────────────── Instead of calling search documents directly, we describe it to the LLM. The description is what the LLM uses to decide when to call it. tools = types.Tool function declarations= types.FunctionDeclaration name="search documents", description="Search documents in the vector DB for a given query. " "Use this when you need information to answer the question.", parameters=types.Schema type=types.Type.OBJECT, properties={ "query": types.Schema type=types.Type.STRING, description="The search query", , "top k": types.Schema type=types.Type.INTEGER, description="Number of documents to retrieve default: 3 ", , }, required= "query" , , , def run question: str : print f"Question: {question}\n" response = client.models.generate content model="gemini-2.5-flash", contents=question, config=types.GenerateContentConfig tools= tools , part = response.candidates 0 .content.parts 0 if part.function call: LLM decided to call a tool func name = part.function call.name func args = dict part.function call.args print f"→ LLM called: {func name} {func args} " result = search documents func args print f"→ Retrieved {len result } documents" print f"→ Top result: {result 0 'title' }" else: LLM answered directly without searching print f"→ LLM answered directly no search needed " print part.text run "How do you calculate the F1 score?" run "What is 2 + 2?" LLM should answer this without searching python 06 tool basic.py Question: How do you calculate the F1 score? → LLM called: search documents {'query': 'F1 score calculation'} → Retrieved 3 documents → Top result: ML Model Evaluation Metrics Question: What is 2 + 2? → LLM answered directly no search needed → 4 The LLM correctly decides when to search and when not to. 07 tool multi.py Now we give the LLM two tools: one for general search and one for category-filtered search. The LLM picks the right one based on the question. python 07 tool multi.py key additions def search by category query: str, category: str, top k: int = 3 - list dict : query embedding = get embedding query cur.execute """ SELECT title, body, category, 1 - embedding <= %s::vector AS similarity FROM documents WHERE category = %s ORDER BY embedding <= %s::vector LIMIT %s; """, query embedding, category, query embedding, top k rows = cur.fetchall return {"title": r 0 , "body": r 1 , "category": r 2 , "similarity": round r 3 , 4 } for r in rows tools = types.Tool function declarations= types.FunctionDeclaration name="search documents", description="Search all categories when the category is unknown " "or the question spans multiple categories.", parameters=types.Schema type=types.Type.OBJECT, properties={ "query": types.Schema type=types.Type.STRING , "top k": types.Schema type=types.Type.INTEGER , }, required= "query" , , , types.FunctionDeclaration name="search by category", description="Search within a specific category ML, Python, or Cloud . " "Use this when the question clearly targets one category.", parameters=types.Schema type=types.Type.OBJECT, properties={ "query": types.Schema type=types.Type.STRING , "category": types.Schema type=types.Type.STRING, description="Category name: ML, Python, or Cloud", , "top k": types.Schema type=types.Type.INTEGER , }, required= "query", "category" , , , The description is the routing logic.The LLM reads the description field to decide which tool to call. Write descriptions that clearly distinguish when to use each tool — this is prompt engineering for tool selection. 08 tool agent.py The real power of Tool Use is the agentic loop : the LLM can call multiple tools in sequence, building up context before producing a final answer. python 08 tool agent.py def dispatch func name: str, func args: dict : """Route function calls to the right Python function.""" if func name == "search documents": return search documents func args elif func name == "search by category": return search by category func args return {"error": f"Unknown function: {func name}"} def run agent task: str, max steps: int = 8 : print f"\nTask: {task}" print "=" 60 Conversation history — this is what enables multi-step reasoning contents = types.Content role="user", parts= types.Part text=task for step in range max steps : response = client.models.generate content model="gemini-2.5-flash", contents=contents, config=types.GenerateContentConfig tools= tools , part = response.candidates 0 .content.parts 0 if part.function call: func name = part.function call.name func args = dict part.function call.args print f" Step {step+1} → {func name} {func args} " result = dispatch func name, func args Append the tool call and result to conversation history contents.append types.Content role="model", parts= types.Part function call=part.function call contents.append types.Content role="user", parts= types.Part function response=types.FunctionResponse name=func name, response={"result": result}, else: LLM produced a final answer text parts = p.text for p in response.candidates 0 .content.parts if p.text print f"\n Done in {step+1} steps " return "\n".join text parts return "Max steps reached." result = run agent "What evaluation metrics are available for ML models? " "Show me both the metric names and how to implement them in Python." print f"\nFinal answer:\n{result}" python 08 tool agent.py Task: What evaluation metrics are available for ML models?... Step 1 → search by category {'query': 'ML evaluation metrics', 'category': 'ML'} Step 2 → search by category {'query': 'scikit-learn model evaluation', 'category': 'ML'} Done in 3 steps Final answer: ML models are evaluated using... The agent searched twice with different queries, gathered complementary information, then synthesized a comprehensive answer. The conversation history is the agent's memory. Each tool call and its result gets appended to contents . The LLM sees the full history on every step, which is how it knows what it has already retrieved and what it still needs. dispatch is the bridge. It maps function names strings from the LLM to actual Python functions. Keep it simple and exhaustive — every tool the LLM can call must have an entry here. The description field does the routing. Spend time on tool descriptions. A vague description leads to random tool selection. A precise description "use this when the category is explicitly mentioned" leads to correct routing almost every time. Before Tool Use: hardcoded: question → search → answer After Tool Use: autonomous: question → LLM decides → search maybe → LLM decides → answer In the next article, we'll build a full AI Agent with memory, planning, and multiple tools working together. Full source code: github.com/qameqame/pgvector-tutorial