{"slug": "overcoming-llm-limitations", "title": "Overcoming LLM Limitations", "summary": "A developer shipped a small research agent that addresses three common LLM limitations: stale training data, hallucinated facts, and arithmetic errors. The agent uses tool calling to look up facts and verify math before answering, and the implementation is demonstrated using Oxlo.ai's API with Llama 3.3 70B.", "body_md": "I recently shipped a small research agent that tackles three recurring LLM problems: stale training data, hallucinated facts, and arithmetic errors. Instead of hoping the model memorized everything correctly, I gave it tools to look up facts and verify math, and instructed it to cite sources before answering. In this tutorial I will walk you through the exact code so you can run it against Oxlo.ai today.\n\nYou will need Python 3.10 or newer, the OpenAI SDK, and an Oxlo.ai API key from [https://portal.oxlo.ai](https://portal.oxlo.ai). Install the SDK with pip:\n\n```\npip install openai\n```\n\nI start by pointing the OpenAI SDK client at Oxlo.ai and declaring the two tools the agent can call. 
Oxlo.ai exposes function calling through the standard chat completions endpoint, so the tool schema is the same one the OpenAI SDK already uses.\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nTOOLS = [\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"search\",\n            \"description\": \"Search the local knowledge base for a topic.\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"query\": {\"type\": \"string\", \"description\": \"Topic to look up.\"}\n                },\n                \"required\": [\"query\"]\n            }\n        }\n    },\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"calculator\",\n            \"description\": \"Evaluate a mathematical expression safely.\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"expression\": {\"type\": \"string\", \"description\": \"Math expression using +, -, *, /, and parentheses.\"}\n                },\n                \"required\": [\"expression\"]\n            }\n        }\n    }\n]\n```\n\nThe system prompt is the contract. It instructs the model to call tools before answering and to cite every claim with a source label. A prompt cannot strictly force tool use, so I keep the wording strict because the whole point is to stop hallucinations.\n\n``` python\nSYSTEM_PROMPT = \"\"\"You are a grounded research assistant. Your job is to answer user questions accurately.\n\nRules:\n1. If the question involves facts, dates, or named entities, call the 'search' tool first.\n2. If the question involves arithmetic, call the 'calculator' tool first.\n3. Only answer after you have tool results. Cite sources like [Source: search].\n4. If the tools return nothing, say you do not know. Do not guess.\"\"\"\n```\n\nNext I write the actual tool implementations. 
I use a tiny in-memory knowledge base so the script is fully runnable without signing up for extra APIs. The calculator restricts its input to digits, whitespace, and basic arithmetic characters, and rejects `**`, before handing the expression to eval with builtins disabled.\n\n``` python\nimport json\nimport re\n\nKNOWLEDGE_BASE = {\n    \"oxlo.ai pricing\": \"Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads.\",\n    \"oxlo.ai models\": \"Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6.\",\n    \"moon landing\": \"The first crewed moon landing was Apollo 11 on July 20, 1969.\"\n}\n\ndef search(query: str) -> str:\n    q = query.lower().strip()\n    # An empty query is a substring of every key, so reject it explicitly.\n    if not q:\n        return \"No relevant information found.\"\n    # Match when the key appears in the query or the query appears in the key.\n    for key, value in KNOWLEDGE_BASE.items():\n        if key in q or q in key:\n            return value\n    return \"No relevant information found.\"\n\ndef calculator(expression: str) -> str:\n    # Allow only digits, whitespace, decimal points, + - * / and parentheses.\n    if not re.fullmatch(r\"[\\d\\s\\.\\+\\-\\*/\\(\\)]+\", expression):\n        return \"Error: invalid characters in expression.\"\n    # \"**\" passes the character check but can produce enormous numbers.\n    if \"**\" in expression:\n        return \"Error: exponentiation is not supported.\"\n    try:\n        result = eval(expression, {\"__builtins__\": {}}, {})\n        return str(result)\n    except Exception as e:\n        return f\"Error: {e}\"\n\ndef dispatch_tool(name: str, arguments: str) -> str:\n    # The model sends tool arguments as a JSON-encoded string.\n    args = json.loads(arguments)\n    if name == \"search\":\n        return search(args[\"query\"])\n    if name == \"calculator\":\n        return calculator(args[\"expression\"])\n    return \"Unknown tool.\"\n```\n\nThis is the core loop. I send the conversation to Llama 3.3 70B on Oxlo.ai with tools enabled. 
If the model requests tool calls, I execute them locally, append the results, and send the updated conversation back, repeating until the model replies without a tool call.\n\n``` python\ndef ask_agent(user_message: str) -> str:\n    messages = [\n        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n        {\"role\": \"user\", \"content\": user_message},\n    ]\n\n    # Loop until the model replies without requesting a tool.\n    while True:\n        response = client.chat.completions.create(\n            model=\"llama-3.3-70b\",\n            messages=messages,\n            tools=TOOLS,\n            tool_choice=\"auto\",\n        )\n\n        msg = response.choices[0].message\n\n        if msg.tool_calls:\n            # Echo the assistant's tool-call turn so the tool results can reference it.\n            messages.append({\n                \"role\": \"assistant\",\n                \"content\": msg.content or \"\",\n                \"tool_calls\": [\n                    {\n                        \"id\": tc.id,\n                        \"type\": tc.type,\n                        \"function\": {\n                            \"name\": tc.function.name,\n                            \"arguments\": tc.function.arguments,\n                        },\n                    } for tc in msg.tool_calls\n                ],\n            })\n\n            # Run each requested tool locally and append its result.\n            for tc in msg.tool_calls:\n                result = dispatch_tool(tc.function.name, tc.function.arguments)\n                messages.append({\n                    \"role\": \"tool\",\n                    \"tool_call_id\": tc.id,\n                    \"content\": result,\n                })\n        else:\n            # No tool calls: this is the final answer.\n            return msg.content\n```\n\nRepeated lookups waste time. 
An LRU cache on the search function skips identical lookups within a process. The in-memory dictionary is already fast, so the gain matters most once search is backed by something slower. The cache covers tool results only; it does not reduce the number of model requests.\n\n``` python\nfrom functools import lru_cache\n\n@lru_cache(maxsize=128)\ndef cached_search(query: str) -> str:\n    return search(query)\n\n# Redefines dispatch_tool from above so that search goes through the cache.\ndef dispatch_tool(name: str, arguments: str) -> str:\n    args = json.loads(arguments)\n    if name == \"search\":\n        return cached_search(args[\"query\"])\n    if name == \"calculator\":\n        return calculator(args[\"expression\"])\n    return \"Unknown tool.\"\n```\n\nHere is the entry point. I ask a question that combines a fact lookup with a hypothetical calculation so both tools have a reason to fire.\n\n``` python\nif __name__ == \"__main__\":\n    question = (\n        \"How much would 1000 API requests cost on Oxlo.ai per day if each request is flat priced? \"\n        \"Also, what models are available?\"\n    )\n    answer = ask_agent(question)\n    print(answer)\n```\n\nExample output:\n\n```\nBased on the search results:\n\n- Oxlo.ai uses flat per-request pricing. One API call costs the same regardless of prompt length, which makes it cheaper than token-based providers for long-context workloads. [Source: search]\n- Oxlo.ai hosts 45+ models including Llama 3.3 70B, DeepSeek R1 671B, Qwen 3 32B, and Kimi K2.6. [Source: search]\n\nFor 1000 API requests per day, you would pay 1000 times the flat per-request rate. Because Oxlo.ai does not use token-based billing, the total is predictable and does not scale with input length.\n```\n\nThat is the entire agent. By offloading facts and math to deterministic tools, you reduce the most common failure modes of raw LLM outputs. Because Oxlo.ai charges a flat rate per request, you can feed long tool results back into context without watching token meters run up, which makes this pattern cheap to operate at scale.\n\nTwo concrete next steps. 
First, swap the in-memory KNOWLEDGE_BASE for a real vector database like Qdrant so the agent can search your own documents. Second, add Pydantic validation to the tool arguments so malformed calls fail fast before they hit your logic.", "url": "https://wpnews.pro/news/overcoming-llm-limitations", "canonical_source": "https://dev.to/shashank_ms_6a35baa4be138/overcoming-llm-limitations-3din", "published_at": "2026-06-16 19:35:11+00:00", "updated_at": "2026-06-16 19:47:12.338757+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["Oxlo.ai", "Llama 3.3 70B", "OpenAI SDK"], "alternates": {"html": "https://wpnews.pro/news/overcoming-llm-limitations", "markdown": "https://wpnews.pro/news/overcoming-llm-limitations.md", "text": "https://wpnews.pro/news/overcoming-llm-limitations.txt", "jsonld": "https://wpnews.pro/news/overcoming-llm-limitations.jsonld"}}