Beyond Function Calling: How the Model Context Protocol (MCP) Turns AI Agents into Self-Evolving Systems

wpnews.pro

Imagine a highly skilled master craftsman. This craftsman possesses immense cognitive power, with the ability to reason, plan, and decompose incredibly complex problems. But there's a catch: they are locked in an empty, windowless room. They have no raw materials, no specialized tools, and no way to interact with the outside world. Their brilliant cognitive power remains entirely theoretical.

This is the state of most modern Large Language Models (LLMs). They are intellectual giants trapped in digital sensory deprivation chambers.

To break them out, we historically relied on hardcoded "tool calling" or custom API integrations. But anyone who has built production-grade AI agents knows the painful truth: hardcoded tool execution is brittle, monolithic, and incredibly difficult to scale. Every time you add a new tool, you risk confusing the model, breaking your prompts, or introducing critical security vulnerabilities.

A quiet revolution is underway to solve this once and for all. It is called the Model Context Protocol (MCP).

In this deep dive, we will explore how the Hermes Agent architecture implements MCP not just as a way to call tools, but as a universal, bidirectional, and standardized integration bus. We will look at the production-grade Python patterns that turn an isolated LLM into a modular, self-improving "system of systems."

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)

To understand the Model Context Protocol, we must first discard the mental model of a simple function call. MCP is not an API endpoint; it is a standardized workshop interface.

It defines the exact specifications for every tool, every drawer, every power outlet, and every raw material bin in our craftsman's workshop. It doesn't matter if a tool is a simple local file writer or a complex browser automation suite hosted on a remote server. As long as it adheres to the MCP standard, the agent can pick it up and use it without any retraining.
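Concretely, the MCP specification frames this interface as JSON-RPC 2.0 messages. A server advertises each tool in a tools/list result (a name, a description, and an inputSchema written in JSON Schema), and a client invokes it with a tools/call request whose result carries a list of content blocks plus an isError flag. The sketch below builds and reads those messages with the standard library only. The write_file tool, its arguments, and the helper names are hypothetical, and the transport (stdio or HTTP) is deliberately left out.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session

# A hypothetical tool, described the way an MCP server lists it in tools/list.
WRITE_FILE_TOOL = {
    "name": "write_file",
    "description": "Write text content to a file in the workspace.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
        "required": ["path", "content"],
    },
}


def make_tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


def result_text(response_json: str) -> str:
    """Join the text blocks of a tools/call response; raise on any error."""
    response = json.loads(response_json)
    if "error" in response:  # protocol-level JSON-RPC error
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    text = "\n".join(
        block["text"] for block in result.get("content", []) if block.get("type") == "text"
    )
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError(text)
    return text
```

The agent side never needs to know where write_file runs: the same request shape works for a local subprocess server and a remote one.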

This architectural shift achieves a clean separation of cognitive capability (the agent) from operational capability (the tools).

In the Hermes codebase, this separation is stark:

The AIAgent class is the craftsman. It doesn't know how to search the web, execute code, or read databases. It only knows how to reason and issue intent.

The model_tools.py module acts as the "nervous system," translating the agent's intent into standardized protocol calls and routing them to the appropriate tool hosts.

This architecture stands on three core pillars: Standardized Schema Definition, Secure Client-Server Communication, and Closed-Loop Observability. Let's break down how each of these is implemented in production code.
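Before looking at each pillar, the craftsman/nervous-system split can be sketched in a few lines. This is a minimal, hypothetical ToolRegistry: its get_definitions and dispatch methods echo the registry calls in the excerpts later in this article, but their signatures here are assumptions. It holds the schemas and handlers, and the agent side only ever receives definitions and sends back a tool name plus arguments.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical 'nervous system': maps tool names to schemas and handlers.

    The agent only sees the definitions and emits (name, arguments) intents;
    it never imports or calls a handler directly.
    """

    def __init__(self) -> None:
        self._definitions: Dict[str, Dict[str, Any]] = {}
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, description: str, schema: Dict[str, Any],
                 handler: Callable[..., Any]) -> None:
        self._definitions[name] = {"name": name, "description": description,
                                   "parameters": schema}
        self._handlers[name] = handler

    def get_definitions(self, names: List[str]) -> List[Dict[str, Any]]:
        # Shallow copies: callers may swap a tool's schema without touching ours.
        return [dict(self._definitions[n]) for n in names if n in self._definitions]

    def dispatch(self, name: str, arguments: Dict[str, Any]) -> Any:
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"unknown tool: {name}")
        return handler(**arguments)
```

An agent loop would hand get_definitions(...) to the model and route each returned tool call through dispatch; replacing a local handler with one that forwards to an MCP server would not change the agent side at all.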

In traditional software engineering, we rely on rigid API contracts. In an agentic architecture, the contract must be understood by both machines and probabilistic neural networks.

Under MCP, this contract is a JSON Schema that serves three distinct purposes simultaneously:

But static schemas are a recipe for failure. If you present a model with 100 tools at once, its reasoning capability degrades due to context distraction. The solution? Dynamic, context-aware schema generation.

Below is how Hermes dynamically computes tool definitions at runtime:

from typing import Any, Dict, List, Optional

def get_tool_definitions(
    enabled_toolsets: Optional[List[str]] = None,
    disabled_toolsets: Optional[List[str]] = None,
    quiet_mode: bool = False,
) -> List[Dict[str, Any]]:
    """
    Get tool definitions for model API calls with toolset-based filtering.
    All tools must be part of a toolset to be accessible.
    """
    # (Excerpt: resolving the enabled/disabled toolsets into `tools_to_include`,
    # and defining `discord_tool_name` and the schema builders, is not shown.)
    filtered_tools = registry.get_definitions(tools_to_include, quiet=quiet_mode)
    available_tool_names = {tool["name"] for tool in filtered_tools}

    # execute_code: advertise only the sandbox tools that are enabled right now.
    if "execute_code" in available_tool_names:
        sandbox_enabled = SANDBOX_ALLOWED_TOOLS & available_tool_names
        dynamic_schema = build_execute_code_schema(sandbox_enabled, mode=_get_execution_mode())
        for tool in filtered_tools:
            if tool["name"] == "execute_code":
                tool["parameter_schema"] = dynamic_schema
                break

    # Discord: expose only the capabilities the bot's granted intents allow.
    if discord_tool_name in available_tool_names:
        dynamic_schema = build_discord_schema_based_on_intents()
        for tool in filtered_tools:
            if tool["name"] == discord_tool_name:
                tool["parameter_schema"] = dynamic_schema
                break

    return filtered_tools

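The schema is not a static document; it is a living contract. Because the definitions are recomputed every time they are requested, the execute_code schema only advertises the sandbox tools that are actually enabled in the current session: disable a toolset and its functions drop out of the code sandbox's description. Likewise, if a Discord bot lacks certain permissions, the capabilities that depend on them are left out of its schema.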

By dynamically tailoring the schema to the environment, you prevent the LLM from attempting impossible actions, dramatically cutting down on execution errors and wasted API tokens.
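The excerpt above depends on registry and schema-builder helpers that are not shown. Here is a self-contained sketch of the same idea. TOOLSETS, BASE_DEFINITIONS, SANDBOX_ALLOWED_TOOLS, and build_execute_code_schema are hypothetical stand-ins, and the definition layout (a parameter_schema key per tool) simply mirrors the excerpt. One choice differs on purpose: definitions are deep-copied before the execute_code schema is rewritten, so a per-request rewrite never leaks into the shared definitions.

```python
import copy
from typing import Any, Dict, List, Optional, Set

# Hypothetical toolset configuration: toolset name -> tool names.
TOOLSETS: Dict[str, List[str]] = {
    "web": ["web_search"],
    "files": ["read_file", "write_file"],
    "code": ["execute_code"],
}

# Hypothetical shared definitions, shaped like the excerpt's ("parameter_schema").
BASE_DEFINITIONS: Dict[str, Dict[str, Any]] = {
    name: {"name": name, "parameter_schema": {"type": "object", "properties": {}}}
    for names in TOOLSETS.values()
    for name in names
}

# Hypothetical allow-list of tools that code running in the sandbox may call.
SANDBOX_ALLOWED_TOOLS: Set[str] = {"web_search", "read_file", "write_file"}


def build_execute_code_schema(sandbox_tools: Set[str]) -> Dict[str, Any]:
    """Describe execute_code so it only mentions tools the sandbox can reach."""
    callable_tools = ", ".join(sorted(sandbox_tools)) or "none"
    return {
        "type": "object",
        "description": f"Run Python code. Tools callable from the code: {callable_tools}.",
        "properties": {"code": {"type": "string"}},
        "required": ["code"],
    }


def get_tool_definitions(
    enabled_toolsets: Optional[List[str]] = None,
    disabled_toolsets: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    enabled = list(TOOLSETS) if enabled_toolsets is None else enabled_toolsets
    disabled = set(disabled_toolsets or [])
    names = [n for ts in enabled if ts not in disabled for n in TOOLSETS.get(ts, [])]

    # Deep-copy so per-request schema rewrites never mutate the shared definitions.
    tools = [copy.deepcopy(BASE_DEFINITIONS[n]) for n in names]
    available = {tool["name"] for tool in tools}

    for tool in tools:
        if tool["name"] == "execute_code":
            tool["parameter_schema"] = build_execute_code_schema(SANDBOX_ALLOWED_TOOLS & available)
    return tools
```

With disabled_toolsets=["web"], the model still gets execute_code, but its description no longer promises web_search, so generated code has no reason to call it.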

Even with perfect schemas, LLMs occasionally output malformed JSON (e.g., trailing commas, unclosed brackets, or a Python-style None instead of JSON null). To maintain system reliability, the orchestrator must perform self-healing on the incoming data before validation. The excerpt below handles the simplest cases, an empty or None payload and trailing commas before a closing bracket.

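import re

def _repair_tool_call_arguments(raw_args: str, tool_name: str = "?") -> str:
    """Attempt to repair common LLM-generated malformed JSON arguments."""
    raw_stripped = raw_args.strip()

    # An empty payload, or a bare Python None, means "no arguments".
    if not raw_stripped or raw_stripped == "None":
        return "{}"

    # Drop trailing commas before a closing brace/bracket: {"a": 1,} -> {"a": 1}.
    # Caveat: this simple regex can also rewrite ",}" inside string values.
    fixed = re.sub(r',\s*([}\]])', r'\1', raw_stripped)

    return fixed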

By placing this validation and repair layer directly in the orchestrator, we prevent raw, malformed syntax from crashing the underlying tool servers.
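The repair function above only covers empty payloads and trailing commas, while the failure modes listed before it also include unclosed brackets and Python-style None. The sketch below is a hypothetical, more thorough variant (repair_and_parse_args is not a Hermes function). It scans the text once and tracks whether it is inside a JSON string, so string contents are never rewritten. Outside strings it converts bare None/True/False to their JSON spellings, drops trailing commas, and closes unbalanced brackets, then parses the result and rejects anything that is not a JSON object. It is best-effort: input that is still invalid after these fixes raises json.JSONDecodeError.

```python
import json
import re
from typing import Any, Dict, List

_LITERAL = re.compile(r"(None|True|False)(?!\w)")
_JSON_LITERAL = {"None": "null", "True": "true", "False": "false"}


def repair_and_parse_args(raw: str) -> Dict[str, Any]:
    text = raw.strip()
    if not text or text == "None":
        return {}
    out: List[str] = []
    closers: List[str] = []  # expected closing brackets, innermost last
    in_string = escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            # Copy string contents verbatim, tracking escapes and the closing quote.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
            continue
        literal = _LITERAL.match(text, i)
        prev = text[i - 1] if i else ""
        if literal and not (prev.isalnum() or prev == "_"):
            out.append(_JSON_LITERAL[literal.group(1)])  # Python literal -> JSON
            i = literal.end()
            continue
        if ch in "}]":
            while out and out[-1].isspace():
                out.pop()
            if out and out[-1] == ",":  # trailing comma before a closer
                out.pop()
            if closers:
                closers.pop()
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch == '"':
            in_string = True
        out.append(ch)
        i += 1
    if in_string:
        out.append('"')  # close an unterminated string
    while out and (out[-1].isspace() or out[-1] == ","):
        out.pop()
    out.extend(reversed(closers))  # close unbalanced brackets, innermost first
    parsed = json.loads("".join(out))
    if not isinstance(parsed, dict):
        raise ValueError("tool arguments must be a JSON object")
    return parsed
```

For example, '{"items": [1, 2, 3,' becomes {"items": [1, 2, 3]}, while the words None and True inside a quoted string value are left alone.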

MCP decouples the agent from its tools by running them in separate processes, containers, or even different machines. This separation provides:

However, this introduces a major technical hurdle: the async impedance mismatch.

Modern LLM orchestrators often run in synchronous, multi-threaded environments (like CLI loops or synchronous web workers), while MCP client sessions are inherently asynchronous (relying on non-blocking I/O over subprocess pipes, HTTP streams, or WebSockets).

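If synchronous code tries to drive a coroutine with run_until_complete() on an event loop that is already running, or reuses a loop that has already been closed, you will quickly run into the dreaded errors "RuntimeError: This event loop is already running" and "RuntimeError: Event loop is closed".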
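Both failures are easy to reproduce with the standard library. In this sketch, call_tool, naive_sync_call, and agent_step are hypothetical names. The first error appears when synchronous code tries to drive a coroutine on the very loop that is running its caller; the second appears when it reuses a loop that has been closed.

```python
import asyncio
from typing import Callable


async def call_tool() -> str:
    """Stand-in for an async MCP client call."""
    await asyncio.sleep(0)
    return "ok"


def naive_sync_call(loop: asyncio.AbstractEventLoop) -> str:
    """The naive bridge: drive the coroutine on whatever loop we were handed."""
    coro = call_tool()
    try:
        return loop.run_until_complete(coro)
    except RuntimeError:
        coro.close()  # the coroutine never started; close it to avoid a warning
        raise


async def agent_step() -> str:
    # Synchronous tool code reached from inside a coroutine: this loop is busy.
    return naive_sync_call(asyncio.get_running_loop())


def error_message(action: Callable[[], object]) -> str:
    try:
        action()
    except RuntimeError as exc:
        return str(exc)
    return "no error"


closed_loop = asyncio.new_event_loop()
closed_loop.close()

print(error_message(lambda: asyncio.run(agent_step())))     # This event loop is already running
print(error_message(lambda: naive_sync_call(closed_loop)))  # Event loop is closed
```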

To solve this, Hermes implements a robust asynchronous bridge that manages three distinct event loop strategies depending on the calling thread's state:

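import asyncio
import threading
import concurrent.futures

# _run_in_worker, _cancel_all_worker_tasks, _get_worker_loop and _get_tool_loop
# are helpers defined elsewhere in the codebase.

def _run_async(coro):
    """Run an async coroutine safely from any synchronous context."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    # Case 1: called from inside a running event loop. That loop cannot be
    # re-entered, so hand the coroutine to a helper thread and block on the
    # result, with a 5-minute ceiling.
    if loop and loop.is_running():
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(_run_in_worker, coro)
        try:
            return future.result(timeout=300)
        except concurrent.futures.TimeoutError:
            _cancel_all_worker_tasks()  # don't leave orphaned tool tasks behind
            raise
        finally:
            pool.shutdown(wait=False)

    # Case 2: a non-main thread with no running loop reuses its own
    # long-lived worker loop instead of creating one per call.
    if threading.current_thread() is not threading.main_thread():
        worker_loop = _get_worker_loop()
        return worker_loop.run_until_complete(coro)

    # Case 3: the main thread with no running loop reuses one persistent loop
    # rather than creating and closing a loop per call (which would strand any
    # async clients bound to the previous loop).
    tool_loop = _get_tool_loop()
    return tool_loop.run_until_complete(coro)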
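The excerpt relies on helpers that are not shown. The following self-contained sketch implements the same three strategies under stated assumptions: a thread-local loop per worker thread, one persistent loop for the main thread, and, when a loop is already running, a fresh asyncio.run() in a helper thread. That last choice is a simplification. Unlike the original, it does not cancel outstanding tasks on timeout (the helper thread keeps running until the coroutine finishes), and coroutines that touch objects bound to another loop, such as an existing client session, would need to run on that loop instead.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Coroutine, Optional

_thread_state = threading.local()  # one private loop per non-main thread
_tool_loop: Optional[asyncio.AbstractEventLoop] = None  # persistent main-thread loop


def _get_thread_loop() -> asyncio.AbstractEventLoop:
    loop = getattr(_thread_state, "loop", None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        _thread_state.loop = loop
    return loop


def run_async(coro: Coroutine[Any, Any, Any], timeout: float = 300) -> Any:
    """Run a coroutine to completion from synchronous code (hypothetical sketch)."""
    global _tool_loop
    try:
        asyncio.get_running_loop()
        inside_loop = True
    except RuntimeError:
        inside_loop = False

    if inside_loop:
        # Strategy 1: this thread's loop is busy, so run the coroutine on a
        # fresh loop in a helper thread and block until it finishes.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(asyncio.run, coro).result(timeout=timeout)
        finally:
            pool.shutdown(wait=False)

    if threading.current_thread() is not threading.main_thread():
        # Strategy 2: worker threads reuse their own long-lived loop.
        return _get_thread_loop().run_until_complete(coro)

    # Strategy 3: the main thread reuses one persistent loop across calls.
    if _tool_loop is None or _tool_loop.is_closed():
        _tool_loop = asyncio.new_event_loop()
    return _tool_loop.run_until_complete(coro)
```

Because strategy 1 blocks the caller's loop until the result arrives, it is a last resort for sync code trapped inside async code, not a substitute for awaiting the coroutine directly.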

The true magic of the Model Context Protocol is not just that it allows an agent to act, but that it enables the agent to learn from its actions. Every tool call is a telemetry event that feeds back into the agent's memory.

When the agent calls a tool, the orchestrator doesn't just return the raw string output. It measures execution latency, captures system logs, tracks resource consumption, and triggers hooks that modify the agent's internal state.

Here is how the central dispatch function handles this feedback loop:

import time
from typing import Any, Dict, Optional

def handle_function_call(
    function_name: str,
    function_args: Dict[str, Any],
    task_id: Optional[str] = None,
    tool_call_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> str:
    # Check the arguments against the tool's schema and coerce them to the declared types.
    coerced_args = validate_and_coerce(function_name, function_args)

    dispatch_start = time.monotonic()  # monotonic clock: immune to wall-clock jumps

    try:
        result = registry.dispatch(function_name, coerced_args)
        is_error = False
    except Exception as e:
        # Return the error text to the model instead of crashing the agent loop.
        result = str(e)
        is_error = True

    duration_ms = int((time.monotonic() - dispatch_start) * 1000)

    # Telemetry: observers see the call, its latency, and whether it failed.
    invoke_hook(
        "post_tool_call",
        tool_name=function_name,
        args=coerced_args,
        result=result,
        duration_ms=duration_ms,
        failed=is_error
    )

    # The first transform hook that returns a string replaces the result.
    hook_results = invoke_hook("transform_tool_result", tool_name=function_name, result=result)
    for hook_result in hook_results:
        if isinstance(hook_result, str):
            result = hook_result
            break

    return result
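The dispatch function above leans on invoke_hook and a registry that are not shown. Here is a self-contained sketch of the same pattern with a hypothetical in-memory hook registry (HOOKS, register_hook, invoke_hook, and dispatch_with_telemetry are illustrative names). It times the call with a monotonic clock, turns exceptions into an error string plus a failed flag, notifies post_tool_call observers, and lets the first transform_tool_result hook that returns a string replace the result. One assumption worth calling out: hook exceptions are swallowed, so a buggy observer cannot break tool execution.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical hook registry: hook name -> callbacks, run in registration order.
HOOKS: Dict[str, List[Callable[..., Any]]] = defaultdict(list)


def register_hook(name: str, fn: Callable[..., Any]) -> None:
    HOOKS[name].append(fn)


def invoke_hook(name: str, **kwargs: Any) -> List[Any]:
    results = []
    for fn in HOOKS[name]:
        try:
            results.append(fn(**kwargs))
        except Exception:
            pass  # a faulty observer must never break the tool call itself
    return results


def dispatch_with_telemetry(tools: Dict[str, Callable[..., Any]], name: str,
                            args: Dict[str, Any]) -> str:
    start = time.monotonic()
    try:
        result, failed = str(tools[name](**args)), False
    except Exception as exc:
        result, failed = f"Error: {exc}", True
    duration_ms = int((time.monotonic() - start) * 1000)
    invoke_hook("post_tool_call", tool_name=name, args=args, result=result,
                duration_ms=duration_ms, failed=failed)
    # The first hook that returns a string replaces what the model will see.
    for replacement in invoke_hook("transform_tool_result", tool_name=name, result=result):
        if isinstance(replacement, str):
            return replacement
    return result
```

A post_tool_call observer that appends (tool_name, failed) to a list is enough to build the kind of telemetry log the review step later consumes.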

This telemetry data doesn't just sit in a log file; it is consumed live by the agent to make strategic decisions. For example, when a tool call comes back with the failed flag set, the agent sees that flag and automatically attempts a fallback strategy (e.g., querying an alternate search index).

The pinnacle of this closed-loop observability is what we call the Ouroboros Pattern: an agent recursively using its own tools to review and optimize its own behavior.

In Hermes, when a main task is completed, the orchestrator spawns a background "Review Agent." This review agent is given access to a highly specialized subset of tools: memory and skills. It reads the transaction log of the conversation that just occurred, analyzes what went right and what went wrong, and writes new procedural knowledge directly back to the main agent's persistent memory.

import threading

def _spawn_background_review(self, messages_snapshot, review_memory, review_skills):
    """Spawn a background thread to review the conversation and save new skills/memories."""
    def _run_review():
        # A fresh, quiet agent on the same model with a small iteration budget.
        review_agent = AIAgent(
            model=self.model,
            max_iterations=16,
            quiet_mode=True,
            provider=self.provider,
            api_key=self.api_key,
            enabled_toolsets=["memory", "skills"],  # Only the memory and skills tools
        )

        review_prompt = (
            "Analyze the conversation history. Extract key user preferences, "
            "successful code patterns, or tool execution failures. Use the "
            "provided tools to save these as persistent memories or skills."
        )

        review_agent.run_conversation(
            user_message=review_prompt,
            conversation_history=messages_snapshot,
        )

        # Report what the reviewer saved; dict.fromkeys de-duplicates while keeping order.
        actions = self._summarize_background_review_actions(review_agent.history)
        if actions:
            summary = " · ".join(dict.fromkeys(actions))
            self._safe_print(f"  💾 Self-improvement complete: {summary}")

    # Daemon thread: never blocks the user's turn, and never blocks interpreter exit.
    threading.Thread(target=_run_review, daemon=True).start()

This background review loop is completely non-blocking. While the user is reading the agent's response, a background thread spins up a separate context, reviews the conversation (including any tool execution failures), and writes what it learns to the agent's memory and skills stores. Subsequent prompts can draw on that knowledge, so the agent becomes steadily more aligned with the user's workflow.
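The review step itself is an LLM call, but its plumbing can be sketched without one. In the hypothetical example below, a rule-based review_transcript stands in for the review agent. It reads post-tool-call telemetry (tool name, failed flag, latency), turns failures and slow calls into short lessons, and saves them to a lock-protected MemoryStore from a daemon thread. spawn_background_review snapshots the telemetry first so the live transcript can keep growing, and it returns the thread so a caller can join it; the original fires and forgets.

```python
import threading
from typing import Any, Dict, List


class MemoryStore:
    """Hypothetical persistent memory; the lock makes writes safe across threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.entries: List[str] = []

    def save(self, entry: str) -> None:
        with self._lock:
            if entry not in self.entries:  # skip duplicate lessons
                self.entries.append(entry)


def review_transcript(telemetry: List[Dict[str, Any]], memory: MemoryStore,
                      slow_ms: int = 2000) -> None:
    # Rule-based stand-in for the LLM review agent: telemetry -> lessons.
    for event in telemetry:
        if event["failed"]:
            memory.save(f"{event['tool_name']} failed; prefer a fallback next time")
        elif event["duration_ms"] > slow_ms:
            memory.save(f"{event['tool_name']} is slow (>{slow_ms} ms)")


def spawn_background_review(telemetry: List[Dict[str, Any]],
                            memory: MemoryStore) -> threading.Thread:
    snapshot = list(telemetry)  # copy so the live transcript can keep growing
    thread = threading.Thread(target=review_transcript, args=(snapshot, memory), daemon=True)
    thread.start()
    return thread
```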

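To see how these components interact, follow a single user interaction through this multi-layered architecture. The AIAgent reasons over the tool definitions that get_tool_definitions produced for the current environment. The orchestrator repairs and validates the arguments of each tool call the model emits, and handle_function_call dispatches the call (through the async bridge when the tool lives on an MCP server), times it, and fires the post_tool_call and transform_tool_result hooks. The result flows back into the conversation, and once the task completes, a background review agent writes lessons from the transcript into memory and skills.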

This is the power of the MCP Revolution: action and learning are two sides of the same coin.

For years, developers treated AI agents like traditional software programs—writing rigid, hardcoded wrappers around API calls. The Model Context Protocol changes the paradigm.

By standardizing the communication layer, dynamically generating schemas, building robust async bridges, and hooking telemetry directly into self-improvement loops, we transition from building static tool users to deploying dynamic, self-evolving tool weavers.

If you are still writing custom wrapper functions for every API you want your LLM to use, it is time to step into the workshop. The tools are ready. The craftsman is waiting. It's time to build.

Leave a comment below with your thoughts and architectural approaches!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce. You can also find my programming ebooks with AI here: Programming & AI eBooks.

source & further reading

dev.to (original article)

