Overcoming LLM Limitations

A developer shipped a small research agent that addresses three common LLM limitations: stale training data, hallucinated facts, and arithmetic errors. The agent uses tool calling to look up facts and verify math before answering, and the implementation is demonstrated using Oxlo.ai's API with Llama 3.3 70B.

I recently shipped a small research agent that beats three recurring LLM problems: stale training data, hallucinated facts, and arithmetic errors. Instead of hoping the model memorized everything correctly, I gave it tools to look up facts and verify math, then cite sources before answering. In this tutorial I will walk you through the exact code so you can run it against Oxlo.ai today.

You will need Python 3.10 or newer, the OpenAI SDK, and an Oxlo.ai API key from https://portal.oxlo.ai. Install the SDK with pip:

```
pip install openai
```

I start by instantiating the OpenAI SDK against Oxlo.ai and declaring the two tools the agent can call. Oxlo.ai exposes function calling through the standard chat completions endpoint, so the schema is identical to what you already know.

```python
from openai import OpenAI

# Point the standard OpenAI client at the Oxlo.ai endpoint.
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_OXLO_API_KEY",
)

# Tool schemas in the standard chat completions function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the local knowledge base for a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Topic to look up."}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression safely.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression using +, -, *, /, and parentheses.",
                    }
                },
                "required": ["expression"],
            },
        },
    },
]
```

The system prompt is the contract. It tells the model to use tools before answering and to cite every claim with a source label. I keep it strict because the whole point is to stop hallucinations.

```python
SYSTEM_PROMPT = """You are a grounded research assistant. Your job is to answer user questions accurately.

Rules:
1. If the question involves facts, dates, or named entities, call the 'search' tool first.
2. If the question involves arithmetic, call the 'calculator' tool first.
3. Only answer after you have tool results. Cite sources like [Source: search].
4. If the tools return nothing, say you do not know. Do not guess."""
```

Next I write the actual tool implementations. I use a tiny in-memory knowledge base so the script is fully runnable without signing up for extra APIs. The calculator restricts the allowed character set before calling eval, so names and function calls can never reach it.

```python
import json
import re

KNOWLEDGE_BASE = {
    "oxlo.ai pricing": "Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads.",
    "oxlo.ai models": "Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6.",
    "moon landing": "The first crewed moon landing was Apollo 11 on July 20, 1969.",
}


def search(query: str) -> str:
    """Return the first knowledge-base entry whose key overlaps the query."""
    q = query.lower().strip()
    if not q:
        # An empty query is a substring of every key, so reject it explicitly.
        return "No relevant information found."
    for key, value in KNOWLEDGE_BASE.items():
        if key in q or q in key:
            return value
    return "No relevant information found."


def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression made of digits, + - * / and parentheses."""
    # Whitelist characters first: no letters or underscores means no names or calls.
    # Note: "**" still passes this check, so very large exponents can be slow.
    if not re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", expression):
        return "Error: invalid characters in expression."
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"


def dispatch_tool(name: str, arguments: str) -> str:
    """Decode the JSON arguments from the model and route to the matching tool."""
    args = json.loads(arguments)
    if name == "search":
        return search(args["query"])
    if name == "calculator":
        return calculator(args["expression"])
    return "Unknown tool."
```

This is the core loop. I send the conversation to Llama 3.3 70B on Oxlo.ai with tools enabled. If the model requests tool calls, I execute them locally, append the results, and send the updated conversation back. The loop repeats until the model replies without requesting a tool, and that reply is the final answer.
```python
def ask_agent(user_message: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    while True:
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        if msg.tool_calls:
            # Echo the assistant turn, including its tool calls, back into the history.
            messages.append({
                "role": "assistant",
                "content": msg.content or "",
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": tc.type,
                        "function": {
                            "name": tc.function.name,
                            "arguments": tc.function.arguments,
                        },
                    }
                    for tc in msg.tool_calls
                ],
            })
            # Run each requested tool locally and attach its result by call id.
            for tc in msg.tool_calls:
                result = dispatch_tool(tc.function.name, tc.function.arguments)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": result,
                })
        else:
            # No tool calls left: this is the final answer.
            return msg.content or ""
```

Repeated lookups waste time. An LRU cache on the search function cuts redundant work and keeps the agent snappy, which is especially helpful when you are iterating quickly against a per-request pricing model.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    return search(query)


# Redefine the dispatcher so search calls go through the cache.
def dispatch_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "search":
        return cached_search(args["query"])
    if name == "calculator":
        return calculator(args["expression"])
    return "Unknown tool."
```

Here is the entry point. I ask a question that combines a fact lookup with a hypothetical cost calculation, which gives both tools a reason to fire.

```python
if __name__ == "__main__":
    question = (
        "How much would 1000 API requests cost on Oxlo.ai per day if each request is flat priced? "
        "Also, what models are available?"
    )
    answer = ask_agent(question)
    print(answer)
```

Example output:

```
Based on the search results:
- Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads. [Source: search]
- Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6. [Source: search]

For 1000 API requests per day, you would pay 1000 times the flat per-request rate. Because Oxlo.ai does not use token-based billing, the total is predictable and does not scale with input length.
```

That is the entire agent. By offloading facts and math to deterministic tools, you guard against the most common failure modes of raw LLM output. Because Oxlo.ai charges a flat rate per request, you can feed long tool results back into context without watching token meters run up, which makes this pattern cheap to operate at scale.

Two concrete next steps. First, swap the in-memory KNOWLEDGE_BASE for a real vector database like Qdrant so the agent can search your own documents. Second, add Pydantic validation to the tool arguments so malformed calls fail fast before they hit your logic.
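
To make the second step concrete, here is a minimal sketch of fail-fast argument checking. It uses only the standard library as a stand-in for Pydantic, so it is not the Pydantic approach itself. The names `REQUIRED_ARGS`, `ToolArgumentError`, and `validate_tool_args` are hypothetical and do not appear in the agent above. I am also assuming every tool argument is a required, non-empty string, which matches the two tool schemas in this tutorial.

```python
import json

# Hypothetical table: tool name -> required string arguments.
# It mirrors the two tool schemas declared in the tutorial.
REQUIRED_ARGS = {
    "search": ("query",),
    "calculator": ("expression",),
}


class ToolArgumentError(ValueError):
    """Raised when a tool call carries malformed arguments."""


def validate_tool_args(name: str, arguments: str) -> dict:
    """Parse the raw JSON argument string and check it against REQUIRED_ARGS."""
    if name not in REQUIRED_ARGS:
        raise ToolArgumentError(f"unknown tool: {name!r}")
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ToolArgumentError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ToolArgumentError("arguments must be a JSON object")
    for field in REQUIRED_ARGS[name]:
        value = args.get(field)
        # Assumption: every required argument is a non-empty string.
        if not isinstance(value, str) or not value.strip():
            raise ToolArgumentError(f"{field!r} must be a non-empty string")
    return args
```

One way to wire this in would be to call `validate_tool_args` at the top of the dispatcher, catch `ToolArgumentError`, and return its message as the tool result, so the model sees the problem and can retry with corrected arguments. With Pydantic, the hand-written checks would be replaced by one model class per tool.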