{"slug": "building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously", "title": "Building a RAG System from Scratch — Tool Use: Let the LLM Search Autonomously", "summary": "A developer building a RAG system from scratch implemented Tool Use, allowing the LLM to autonomously decide when to call search functions. The approach replaces a hardcoded search-then-answer flow with a flexible loop where the LLM can call functions, receive results, and decide whether to continue searching or generate a final answer. The implementation uses Google's Gemini API and a PostgreSQL vector database.", "body_md": "In the [previous article](https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-design-decisions-explained-40hd), we examined the design decisions behind our RAG pipeline. Now we'll give the LLM the ability to call our search functions autonomously — this is **Tool Use**.\n\nIn our RAG pipeline so far, we always called `search()` before generating an answer. The flow was hardcoded:\n\n```\nquestion → search() → generate_answer()\n```\n\nWith Tool Use, the LLM decides whether to search, what to search for, and when it has enough information to answer:\n\n```\nquestion → LLM decides → search() if needed → LLM decides → answer\n```\n\nThis matters when some questions need no search at all and others need more than one.\n\nThe LLM is given a list of available functions with their signatures and descriptions. It responds with either:\n\n- `function_call`: \"call this function with these arguments\"\n- plain text: a direct answer to the question\n\nYour code executes the function call and sends the result back. The LLM then decides whether to call another function or produce a final answer.\n\n```\nYou → LLM: \"here are available tools + user question\"\nLLM → You: function_call { name: \"search_documents\", args: { query: \"F1 score\" } }\nYou → execute search_documents(\"F1 score\") → results\nYou → LLM: function_result { ... 
}\nLLM → You: \"The F1 score is calculated as...\"\n```\n\n`06_tool_basic.py`\n\n``` python\n# 06_tool_basic.py\nimport psycopg2\nfrom google import genai\nfrom google.genai import types\nfrom dotenv import load_dotenv\nimport os\n\nload_dotenv()\n\nclient = genai.Client(api_key=os.getenv(\"GEMINI_API_KEY\"))\nconn = psycopg2.connect(\n    host=os.getenv(\"DB_HOST\"), port=os.getenv(\"DB_PORT\"),\n    dbname=os.getenv(\"DB_NAME\"), user=os.getenv(\"DB_USER\"),\n    password=os.getenv(\"DB_PASSWORD\"),\n)\ncur = conn.cursor()\n\ndef get_embedding(text: str) -> list[float]:\n    result = client.models.embed_content(\n        model=\"gemini-embedding-001\",\n        contents=text,\n        config=types.EmbedContentConfig(\n            task_type=\"RETRIEVAL_QUERY\",\n            output_dimensionality=768,\n        ),\n    )\n    return result.embeddings[0].values\n\ndef search_documents(query: str, top_k: int = 3) -> list[dict]:\n    query_embedding = get_embedding(query)\n    cur.execute(\"\"\"\n        SELECT title, body, category,\n               1 - (embedding <=> %s::vector) AS similarity\n        FROM documents\n        ORDER BY embedding <=> %s::vector\n        LIMIT %s;\n    \"\"\", (query_embedding, query_embedding, top_k))\n    rows = cur.fetchall()\n    return [\n        {\"title\": r[0], \"body\": r[1], \"category\": r[2], \"similarity\": round(r[3], 4)}\n        for r in rows\n    ]\n\n# ── Tool definition ──────────────────────────────────────────\n# Instead of calling search_documents() directly, we describe it to the LLM.\n# The description is what the LLM uses to decide when to call it.\ntools = types.Tool(\n    function_declarations=[\n        types.FunctionDeclaration(\n            name=\"search_documents\",\n            description=\"Search documents in the vector DB for a given query. 
\"\n                        \"Use this when you need information to answer the question.\",\n            parameters=types.Schema(\n                type=types.Type.OBJECT,\n                properties={\n                    \"query\": types.Schema(\n                        type=types.Type.STRING,\n                        description=\"The search query\",\n                    ),\n                    \"top_k\": types.Schema(\n                        type=types.Type.INTEGER,\n                        description=\"Number of documents to retrieve (default: 3)\",\n                    ),\n                },\n                required=[\"query\"],\n            ),\n        ),\n    ]\n)\n\ndef run(question: str):\n    print(f\"Question: {question}\\n\")\n\n    response = client.models.generate_content(\n        model=\"gemini-2.5-flash\",\n        contents=question,\n        config=types.GenerateContentConfig(tools=[tools]),\n    )\n\n    part = response.candidates[0].content.parts[0]\n\n    if part.function_call:\n        # LLM decided to call a tool\n        func_name = part.function_call.name\n        func_args = dict(part.function_call.args)\n        print(f\"→ LLM called: {func_name}({func_args})\")\n\n        result = search_documents(**func_args)\n        print(f\"→ Retrieved {len(result)} documents\")\n        if result:  # an empty result set would make result[0] raise IndexError\n            print(f\"→ Top result: {result[0]['title']}\")\n    else:\n        # LLM answered directly without searching\n        print(\"→ LLM answered directly (no search needed)\")\n        print(f\"→ {part.text}\")\n\nrun(\"How do you calculate the F1 score?\")\nrun(\"What is 2 + 2?\")  # LLM should answer this without searching\n```\n\n```\npython 06_tool_basic.py\n# Question: How do you calculate the F1 score?\n# → LLM called: search_documents({'query': 'F1 score calculation'})\n# → Retrieved 3 documents\n# → Top result: ML Model Evaluation Metrics\n#\n# Question: What is 2 + 2?\n# → LLM answered directly (no search needed)\n# → 4\n```\n\nThe LLM correctly decides when to search and when 
not to.\n\n`07_tool_multi.py`\n\nNow we give the LLM two tools: one for general search and one for category-filtered search. The LLM picks the right one based on the question.\n\n``` python\n# 07_tool_multi.py (key additions)\n\ndef search_by_category(query: str, category: str, top_k: int = 3) -> list[dict]:\n    query_embedding = get_embedding(query)\n    cur.execute(\"\"\"\n        SELECT title, body, category,\n               1 - (embedding <=> %s::vector) AS similarity\n        FROM documents\n        WHERE category = %s\n        ORDER BY embedding <=> %s::vector\n        LIMIT %s;\n    \"\"\", (query_embedding, category, query_embedding, top_k))\n    rows = cur.fetchall()\n    return [\n        {\"title\": r[0], \"body\": r[1], \"category\": r[2], \"similarity\": round(r[3], 4)}\n        for r in rows\n    ]\n\ntools = types.Tool(\n    function_declarations=[\n        types.FunctionDeclaration(\n            name=\"search_documents\",\n            description=\"Search all categories when the category is unknown \"\n                        \"or the question spans multiple categories.\",\n            parameters=types.Schema(\n                type=types.Type.OBJECT,\n                properties={\n                    \"query\": types.Schema(type=types.Type.STRING),\n                    \"top_k\": types.Schema(type=types.Type.INTEGER),\n                },\n                required=[\"query\"],\n            ),\n        ),\n        types.FunctionDeclaration(\n            name=\"search_by_category\",\n            description=\"Search within a specific category (ML, Python, or Cloud). 
\"\n                        \"Use this when the question clearly targets one category.\",\n            parameters=types.Schema(\n                type=types.Type.OBJECT,\n                properties={\n                    \"query\": types.Schema(type=types.Type.STRING),\n                    \"category\": types.Schema(\n                        type=types.Type.STRING,\n                        description=\"Category name: ML, Python, or Cloud\",\n                    ),\n                    \"top_k\": types.Schema(type=types.Type.INTEGER),\n                },\n                required=[\"query\", \"category\"],\n            ),\n        ),\n    ]\n)\n```\n\n**The description is the routing logic.** The LLM reads the `description` field to decide which tool to call. Write descriptions that clearly distinguish when to use each tool — this is prompt engineering for tool selection.\n\n`08_tool_agent.py`\n\nThe real power of Tool Use is the **agentic loop**: the LLM can call multiple tools in sequence, building up context before producing a final answer.\n\n``` python\n# 08_tool_agent.py\n\ndef dispatch(func_name: str, func_args: dict):\n    \"\"\"Route function calls to the right Python function.\"\"\"\n    if func_name == \"search_documents\":\n        return search_documents(**func_args)\n    elif func_name == \"search_by_category\":\n        return search_by_category(**func_args)\n    return {\"error\": f\"Unknown function: {func_name}\"}\n\ndef run_agent(task: str, max_steps: int = 8):\n    print(f\"\\nTask: {task}\")\n    print(\"=\" * 60)\n\n    # Conversation history — this is what enables multi-step reasoning\n    contents = [types.Content(role=\"user\", parts=[types.Part(text=task)])]\n\n    for step in range(max_steps):\n        response = client.models.generate_content(\n            model=\"gemini-2.5-flash\",\n            contents=contents,\n            config=types.GenerateContentConfig(tools=[tools]),\n        )\n\n        part = 
response.candidates[0].content.parts[0]\n\n        if part.function_call:\n            func_name = part.function_call.name\n            func_args = dict(part.function_call.args)\n            print(f\"[Step {step+1}] → {func_name}({func_args})\")\n\n            result = dispatch(func_name, func_args)\n\n            # Append the tool call and result to conversation history\n            contents.append(\n                types.Content(role=\"model\", parts=[types.Part(function_call=part.function_call)])\n            )\n            contents.append(\n                types.Content(\n                    role=\"user\",\n                    parts=[types.Part(\n                        function_response=types.FunctionResponse(\n                            name=func_name,\n                            response={\"result\": result},\n                        )\n                    )]\n                )\n            )\n        else:\n            # LLM produced a final answer\n            text_parts = [p.text for p in response.candidates[0].content.parts if p.text]\n            print(f\"\\n[Done in {step+1} steps]\")\n            return \"\\n\".join(text_parts)\n\n    return \"Max steps reached.\"\n\nresult = run_agent(\n    \"What evaluation metrics are available for ML models? \"\n    \"Show me both the metric names and how to implement them in Python.\"\n)\nprint(f\"\\nFinal answer:\\n{result}\")\n```\n\n```\npython 08_tool_agent.py\n# Task: What evaluation metrics are available for ML models?...\n# [Step 1] → search_by_category({'query': 'ML evaluation metrics', 'category': 'ML'})\n# [Step 2] → search_by_category({'query': 'scikit-learn model evaluation', 'category': 'ML'})\n# [Done in 3 steps]\n# Final answer: ML models are evaluated using...\n```\n\nThe agent searched twice with different queries, gathered complementary information, then synthesized a comprehensive answer.\n\n**The conversation history is the agent's memory.** Each tool call and its result are appended to `contents`. 
The LLM sees the full history on every step, which is how it knows what it has already retrieved and what it still needs.\n\n**`dispatch()` is the bridge.** It maps function names (strings from the LLM) to actual Python functions. Keep it simple and exhaustive — every tool the LLM can call must have an entry here.\n\n**The description field does the routing.** Spend time on tool descriptions. A vague description leads to random tool selection. A precise description (\"use this when the category is explicitly mentioned\") leads to correct routing almost every time.\n\n```\nBefore Tool Use:\n  hardcoded: question → search → answer\n\nAfter Tool Use:\n  autonomous: question → LLM decides → search (maybe) → LLM decides → answer\n```\n\nIn the next article, we'll build a full **AI Agent** with memory, planning, and multiple tools working together.\n\n*Full source code: github.com/qameqame/pgvector-tutorial*", "url": "https://wpnews.pro/news/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously", "canonical_source": "https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously-29ho", "published_at": "2026-06-27 22:14:30+00:00", "updated_at": "2026-06-27 22:35:53.171026+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "developer-tools"], "entities": ["Google", "Gemini", "PostgreSQL", "Gemini API"], "alternates": {"html": "https://wpnews.pro/news/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously", "markdown": "https://wpnews.pro/news/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously.md", "text": "https://wpnews.pro/news/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously.txt", "jsonld": "https://wpnews.pro/news/building-a-rag-system-from-scratch-tool-use-let-the-llm-search-autonomously.jsonld"}}