# Overcoming LLM Limitations

> Source: <https://dev.to/shashank_ms_6a35baa4be138/overcoming-llm-limitations-3din>
> Published: 2026-06-16 19:35:11+00:00

I recently shipped a small research agent that beats three recurring LLM problems: stale training data, hallucinated facts, and arithmetic errors. Instead of hoping the model memorized everything correctly, I gave it tools to look up facts and verify math, then cite sources before answering. In this tutorial I will walk you through the exact code so you can run it against Oxlo.ai today.

You need Python 3.10 or newer, the OpenAI SDK, and an Oxlo.ai API key from [https://portal.oxlo.ai](https://portal.oxlo.ai). Install the SDK with pip:

```
pip install openai
```

I start by instantiating the OpenAI SDK against Oxlo.ai and declaring the two tools the agent can call. Oxlo.ai exposes function calling through the standard chat completions endpoint, so the tool schema is the same one you would pass to the OpenAI SDK anywhere else.

``` python
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the local knowledge base for a topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Topic to look up."}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression safely.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression using +, -, *, /, and parentheses."}
                },
                "required": ["expression"]
            }
        }
    }
]
```

The system prompt is the contract. It instructs the model to use tools before answering and to cite every claim with a source label. A prompt cannot guarantee compliance, so I keep the wording strict: the whole point is to reduce hallucinations.

``` python
SYSTEM_PROMPT = """You are a grounded research assistant. Your job is to answer user questions accurately.

Rules:
1. If the question involves facts, dates, or named entities, call the 'search' tool first.
2. If the question involves arithmetic, call the 'calculator' tool first.
3. Only answer after you have tool results. Cite sources like [Source: search].
4. If the tools return nothing, say you do not know. Do not guess."""
```
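Since the prompt can only ask for citations, it can help to check for them on the client side. The following is a minimal sketch, assuming answers use the `[Source: <tool>]` label from rule 3. The helpers `cited_sources` and `is_grounded` are hypothetical names introduced here for illustration and are not part of the original agent.

```python
import re

# Matches labels such as "[Source: search]" (format taken from rule 3 of the prompt).
CITATION_RE = re.compile(r"\[Source:\s*([A-Za-z_]+)\]")

# Tool names declared in TOOLS; repeated here so the snippet runs on its own.
KNOWN_TOOLS = {"search", "calculator"}

def cited_sources(answer: str) -> set[str]:
    """Return the set of tool names cited in the answer."""
    return set(CITATION_RE.findall(answer))

def is_grounded(answer: str) -> bool:
    """True if the answer cites at least one source and only known tools."""
    sources = cited_sources(answer)
    return bool(sources) and sources <= KNOWN_TOOLS
```

A `False` result does not always mean a hallucination: an honest "I do not know" reply under rule 4 carries no citation either, so treat the check as a signal to inspect the answer rather than a verdict.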

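Next I write the actual tool implementations. I use a tiny in-memory knowledge base so the script is fully runnable without signing up for extra APIs. The calculator whitelists digits, arithmetic operators, and parentheses before calling eval, which blocks names, attribute access, and function calls. Treat this as a guard for a demo rather than a hardened sandbox.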

``` python
import json
import re

KNOWLEDGE_BASE = {
    "oxlo.ai pricing": "Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads.",
    "oxlo.ai models": "Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6.",
    "moon landing": "The first crewed moon landing was Apollo 11 on July 20, 1969."
}

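def search(query: str) -> str:
    q = query.lower().strip()
    if not q:
        # An empty query would otherwise match every key via `q in key`.
        return "No relevant information found."
    for key, value in KNOWLEDGE_BASE.items():
        # Match when the key appears in the query or the query appears in the key.
        if key in q or q in key:
            return value
    return "No relevant information found."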

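def calculator(expression: str) -> str:
    # Whitelist digits, whitespace, '.', the four operators, and parentheses.
    if not re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", expression):
        return "Error: invalid characters in expression."
    # '**' passes the whitelist but can produce enormous results, so reject it.
    if "**" in expression:
        return "Error: exponentiation is not supported."
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"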

def dispatch_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "search":
        return search(args["query"])
    if name == "calculator":
        return calculator(args["expression"])
    return "Unknown tool."
```

This is the core loop. I send the conversation to Llama 3.3 70B on Oxlo.ai with tools enabled. If the model requests tool calls, I execute them locally, append the results, and send the updated conversation back. The loop repeats until the model replies without requesting a tool, and that reply is the final answer.

``` python
def ask_agent(user_message: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

    while True:
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        )

        msg = response.choices[0].message

        if msg.tool_calls:
            messages.append({
                "role": "assistant",
                "content": msg.content or "",
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": tc.type,
                        "function": {
                            "name": tc.function.name,
                            "arguments": tc.function.arguments,
                        },
                    } for tc in msg.tool_calls
                ],
            })

            for tc in msg.tool_calls:
                result = dispatch_tool(tc.function.name, tc.function.arguments)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": result,
                })
        else:
            return msg.content
```
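The `while True` loop has no upper bound, so a model that keeps requesting tools would keep the loop running. One way to cap it is sketched below, under the assumption that the response objects have the same shape as in `ask_agent`. `run_tool_loop`, `max_turns`, and `fake_create` are hypothetical names; `fake_create` is an offline stand-in for the API so the snippet runs without a key.

```python
from types import SimpleNamespace
from typing import Callable

def run_tool_loop(create: Callable, dispatch: Callable[[str, str], str],
                  messages: list[dict], max_turns: int = 5) -> str:
    """Call the model until it answers without tool calls, at most max_turns times."""
    for _ in range(max_turns):
        msg = create(messages=messages).choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        # Record the assistant's tool request, then one tool message per call.
        messages.append({
            "role": "assistant",
            "content": msg.content or "",
            "tool_calls": [
                {"id": tc.id, "type": "function",
                 "function": {"name": tc.function.name,
                              "arguments": tc.function.arguments}}
                for tc in msg.tool_calls
            ],
        })
        for tc in msg.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": dispatch(tc.function.name, tc.function.arguments),
            })
    return "Stopped: the model kept requesting tools after the turn limit."

def fake_create(messages):
    """Offline stand-in: request one search call, then answer from its result."""
    if messages[-1]["role"] == "tool":
        message = SimpleNamespace(
            content=f"{messages[-1]['content']} [Source: search]", tool_calls=None)
    else:
        call = SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="search",
                                     arguments='{"query": "moon landing"}'))
        message = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

history = [{"role": "user", "content": "When was the first crewed moon landing?"}]
print(run_tool_loop(fake_create, lambda name, args: "Apollo 11, July 20, 1969.", history))
```

To use it against the real endpoint, pass a small wrapper around `client.chat.completions.create` that fills in `model`, `tools`, and `tool_choice`, and pass `dispatch_tool` as the dispatcher.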

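Repeated lookups waste time. An LRU cache on the search function avoids redoing identical lookups within a session. With the in-memory dictionary above the saving is negligible, but it matters once search is backed by a slower store. Note that the cache does not reduce the number of model requests; it only skips the repeated lookup itself.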

``` python
from functools import lru_cache

@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    return search(query)

def dispatch_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "search":
        return cached_search(args["query"])
    if name == "calculator":
        return calculator(args["expression"])
    return "Unknown tool."
```
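`lru_cache` keys on the exact argument string, so queries that differ only in case or spacing are cached separately. A small sketch of normalizing the query before it reaches the cache, and of reading the hit counters, follows. `slow_lookup` is a hypothetical stand-in for an expensive backend, and the call counter exists only to make the effect visible.

```python
from functools import lru_cache

CALLS = {"count": 0}  # how often the underlying lookup actually runs

def slow_lookup(query: str) -> str:
    """Stand-in for an expensive search backend (hypothetical)."""
    CALLS["count"] += 1
    return f"result for {query}"

@lru_cache(maxsize=128)
def _cached_lookup(normalized_query: str) -> str:
    return slow_lookup(normalized_query)

def cached_lookup(query: str) -> str:
    # Lowercase and collapse whitespace so equivalent queries share one entry.
    return _cached_lookup(" ".join(query.lower().split()))

cached_lookup("Moon Landing")
cached_lookup(" moon  landing ")
info = _cached_lookup.cache_info()
print(info.hits, info.misses, CALLS["count"])  # 1 1 1
```

The second call is served from the cache, so the backend runs only once. `cache_info()` is a cheap way to confirm the cache is actually being hit during development.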

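Here is the entry point. I ask a question that combines a fact lookup with a hypothetical calculation. Because the knowledge base holds no numeric per-request price, the model has nothing concrete to multiply, so the sample output below cites only the search tool.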

``` python
if __name__ == "__main__":
    question = (
        "How much would 1000 API requests cost on Oxlo.ai per day if each request is flat priced? "
        "Also, what models are available?"
    )
    answer = ask_agent(question)
    print(answer)
```

Example output:

```
Based on the search results:

- Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads. [Source: search]
- Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6. [Source: search]

For 1000 API requests per day, you would pay 1000 times the flat per-request rate. Because Oxlo.ai does not use token-based billing, the total is predictable and does not scale with input length.
```

That is the entire agent. By offloading facts and math to deterministic tools, you reduce the most common failure modes of raw LLM outputs, though the answer is still only as good as what the tools return. Because Oxlo.ai charges a flat rate per request, you can feed long tool results back into context without watching token meters run up. Keep in mind that each pass through the tool loop is a separate request.

Two concrete next steps. First, swap the in-memory KNOWLEDGE_BASE for a real vector database like Qdrant so the agent can search your own documents. Second, add Pydantic validation to the tool arguments so malformed calls fail fast before they hit your logic.
