{"slug": "day-5-100-tool-use-and-function-calling-explained-from-scratch", "title": "[Day 5/100] Tool Use and Function Calling, Explained from Scratch", "summary": "A developer explains that large language model function calling is actually trained, constrained text generation, not real code execution, and walks through the end-to-end cycle of tool calls in agentic AI systems.", "body_md": "In [Day 1](https://medium.com/towards-artificial-intelligence/day-1-100-what-is-agentic-ai-beyond-chatbots-and-copilots-63bf6cbec971) we built a weather agent that called a tool.\n\nWe have skipped over the mechanism that makes the whole thing work: how does the model actually call a function?\n\nToday we open that black box. By the end you will know exactly what happens between *the user asks a question* and *Python executes your code*. You will be able to write tool schemas the model picks correctly, debug tool calls when they go wrong, and reason about why parallel tool calls are sometimes faster.\n\nFunction calling sounds magical. The model *calls a function*. It does not. The model generates text, the same way it always has. Function calling is a convention layered on top.\n\nWhen you give the OpenAI API a tools argument, three things happen behind the scenes.\n\nFirst, the API serializes your tool schemas into a special part of the prompt the model has been trained on. Anthropic, OpenAI, and Google all do this slightly differently, but the principle is the same: your {\"name\": \"get_weather\", ...} JSON gets turned into text the model can read.\n\nSecond, the model is biased (through training and, in some providers' strict modes, through grammar-constrained sampling) to either generate a normal text response or generate a structured tool call in a specific format.\n\nThird, the API parses that structured output back out and hands it to you as a Python object with tool_calls on it.\n\nFunction calling is *trained, constrained text generation, parsed back into structured data*. 
There is no actual function being called. Your code calls the function. The model just *requests* the call.\n\nWhy does this matter? Because every quirk of function calling makes sense once you remember it is text generation. The model invents tool names that do not exist? It generated text that looked like a tool call. The arguments are malformed JSON? It generated text that almost matched the JSON grammar. Tool descriptions matter a lot? Of course they do. They are the only thing the model sees about what each tool does.\n\nHere is what happens end to end on a single function call.\n\n```\n1. User input ─────────────────────────► You\n2. Build messages + tool schemas ──────► OpenAI API\n3. Model generates response ───────────► OpenAI API\n4. API parses response ────────────────► You receive `tool_calls`\n5. You execute the function ───────────► Your Python runs\n6. You append the result to messages ──► Conversation grows\n7. Send back to model ─────────────────► OpenAI API\n8. Model generates final answer ───────► You\n9. You return text to user ────────────► Done\n```\n\nEvery loop in an agent is this cycle, repeated. Step 5 is the one place real-world effects happen. Everything else is text in, text out.\n\nIn code it looks like this. Pay attention to the message types.\n\n``` python\nfrom openai import OpenAI\nimport json\n\nclient = OpenAI()\n\ndef get_weather(city: str) -> str:\n    return f\"Sunny in {city}, 22°C\"\n\ntools = [{\n    \"type\": \"function\",\n    \"function\": {\n        \"name\": \"get_weather\",\n        \"description\": \"Get the current weather in a given city.\",\n        \"parameters\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"city\": {\n                    \"type\": \"string\",\n                    \"description\": \"City name, e.g. 'Paris' or 'Tokyo'.\"\n                }\n            },\n            \"required\": [\"city\"]\n        }\n    }\n}]\n\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a weather assistant.\"},\n    {\"role\": \"user\", \"content\": \"What is the weather in Paris?\"}\n]\n\n# Step 1: Model decides to call the tool.\nresp = client.chat.completions.create(\n    model=\"gpt-4o\", messages=messages, tools=tools, temperature=0\n)\nassistant_msg = resp.choices[0].message\nmessages.append(assistant_msg)\n\n# Step 2: We execute the tool.\n# tool_calls is None when the model answers directly, hence the `or []`.\nfor call in assistant_msg.tool_calls or []:\n    args = json.loads(call.function.arguments)\n    result = get_weather(**args)\n    # Step 3: Append the tool result so the model can see it.\n    messages.append({\n        \"role\": \"tool\",\n        \"tool_call_id\": call.id,\n        \"content\": result,\n    })\n\n# Step 4: Model produces the final answer.\nfinal = client.chat.completions.create(\n    model=\"gpt-4o\", messages=messages, tools=tools, temperature=0\n)\nprint(final.choices[0].message.content)\n```\n\nNotice the four message roles in play: system, user, assistant (containing the tool_calls), and tool (containing the result). The tool_call_id connects a result back to the call that produced it. Get this wrong and the model will be confused about which tool returned what.\n\nA tool schema gives the model three things: the name, the description, and the parameter schemas. All three matter, and most beginners only think about the first.\n\nNames should be short, clear verbs. get_weather, search_orders, send_email. Not weather (noun, ambiguous), not do_weather_lookup_for_a_city (long, awkward), not get (no information). The model reads names as part of its decision about which tool to pick.\n\nDescriptions are where most schemas fail. Compare:\n\nBad:\n\n```\n\"description\": \"Gets weather\"\n```\n\nGood:\n\n```\n\"description\": \"Get the current weather in a given city. Use this whenever the user asks about weather, temperature, or what to wear. 
Returns a one-line summary including temperature and conditions.\"\n```\n\nThe good description tells the model *when* to use the tool, *what* it returns, and *what counts as a relevant question*. The model uses this text to decide between competing tools. If two tools have similar descriptions, the model's choice between them becomes unreliable.\n\nFor each parameter, include the type (string, integer, number, boolean, array, object), a description with an example, and an enum if the parameter only accepts a fixed set of values. List every parameter that must be present in the required array.\n\n```\n\"parameters\": {\n    \"type\": \"object\",\n    \"properties\": {\n        \"city\": {\n            \"type\": \"string\",\n            \"description\": \"City name in English, e.g. 'Paris' or 'Tokyo'.\"\n        },\n        \"units\": {\n            \"type\": \"string\",\n            \"enum\": [\"celsius\", \"fahrenheit\"],\n            \"description\": \"Temperature units.\"\n        }\n    },\n    \"required\": [\"city\"]\n}\n```\n\nThe enum is doing real work. Without it, the model will sometimes pass \"C\", sometimes \"metric\", sometimes \"Celsius\". With it, the model is steered toward one of exactly two values.\n\nA few mistakes you will make exactly once.\n\n**Tools that overlap.** If search_users and find_users both exist, the model will guess. Pick one.\n\n**Tools that are too coarse.** A single do_everything(action, params) tool destroys the model's ability to reason about which action to take. Split it.\n\n**Tools that are too fine.** A separate tool for get_user_first_name, get_user_last_name, and get_user_email is exhausting. The model will call all three when one would do. Combine them into get_user_profile.\n\n**Tools that hide important information from the model.** If your tool returns a giant nested JSON, the model will pick the wrong fields. 
Pre-format important results into a short summary string when you can.\n\n**Tools that fail silently.** Returning an empty string or None when something went wrong tells the model nothing. Return an explicit error message. *Tool error: customer_id 12345 not found* is something the model can act on.\n\nParallel tool calls are a modern feature worth knowing. When the model judges that two tool calls are independent, it can request both in the same response.\n\n```\n# The model returns two tool_calls in one assistant_msg.\nfor call in assistant_msg.tool_calls:\n    # Run them in parallel with asyncio or threads.\n    ...\n```\n\nFor *What is the weather in Paris and Tokyo?* a modern model with a good schema will emit two parallel get_weather calls. Run them concurrently and the time spent waiting on tools drops roughly in half.\n\nThis only works if your tools are truly independent. If tool_b depends on the output of tool_a, the model has to do them sequentially across two model turns. Design your tools to be independent when you can. Future you and your latency budget will be glad.\n\nA tool can fail. Network is down. Database returned no rows. The user does not have permission. How you communicate that failure back to the model determines how the agent recovers.\n\nThree rules.\n\n**Return errors as tool results, not exceptions.** Catch the exception inside the tool, format it as a string, and return it.\n\n``` python\nimport requests\n\ndef get_weather(city: str) -> str:\n    try:\n        r = requests.get(f\"https://wttr.in/{city}?format=3\", timeout=5)\n        r.raise_for_status()\n        return r.text\n    except Exception as e:\n        # Hand the failure back as text so the model can react to it.\n        return f\"Error fetching weather for {city}: {e}\"\n```\n\n**Be specific.** *Error: HTTP 404* is less useful than *Error: city ‘Mars’ not found. Try a real Earth city.* The model can act on the second message and cannot act on the first.\n\n**Distinguish recoverable from terminal errors.** If the agent can fix the call by trying again with different arguments, say so. 
*Try a shorter time range* or *Use the customer ID, not the customer name* turns a dead end into a retry.\n\nOpenAI, Anthropic, and Google all support function calling, but the message formats differ.\n\nThe mechanics are identical. Only the JSON shape differs. Frameworks like LangChain and LiteLLM exist mostly to paper over this so you can swap providers without rewriting your agent. We will cover this in detail in Phase 2.\n\nThree things, in order.\n\nWatching the model react to a well-described error is the moment tool design becomes a craft. After today, every tool you write should have a clear name, a teaching description, typed parameters with examples, and a thoughtful error contract.\n\nSee you tomorrow for *[Day 6/100] The ReAct Pattern: Reasoning Plus Acting in a Loop*.\n\n*This is Day 5 of the* **100 Days of Learning Agentic AI** *series. See the* **full 100-day roadmap** *for everything we will cover. Follow along to build production-grade agents from scratch with LangChain, LangGraph, Langfuse, RAG, local models, and ten end-to-end capstone projects.*\n\n[[Day 5/100] Tool Use and Function Calling, Explained from Scratch](https://pub.towardsai.net/day-5-100-tool-use-and-function-calling-explained-from-scratch-5d3ac2a6b9b2) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.", "url": "https://wpnews.pro/news/day-5-100-tool-use-and-function-calling-explained-from-scratch", "canonical_source": "https://pub.towardsai.net/day-5-100-tool-use-and-function-calling-explained-from-scratch-5d3ac2a6b9b2?source=rss----98111c9905da---4", "published_at": "2026-06-21 18:01:01+00:00", "updated_at": "2026-06-21 18:10:22.074107+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-tools", "natural-language-processing", "developer-tools"], "entities": ["OpenAI", "Anthropic", "Google", "GPT-4o"], "alternates": {"html": 
"https://wpnews.pro/news/day-5-100-tool-use-and-function-calling-explained-from-scratch", "markdown": "https://wpnews.pro/news/day-5-100-tool-use-and-function-calling-explained-from-scratch.md", "text": "https://wpnews.pro/news/day-5-100-tool-use-and-function-calling-explained-from-scratch.txt", "jsonld": "https://wpnews.pro/news/day-5-100-tool-use-and-function-calling-explained-from-scratch.jsonld"}}