Production-ready Python MCP server for LLM delegation and sub-agent orchestration. A parent LLM (for example, Cursor’s agent) connects to this server and delegates work by calling a tool named after each agent defined in YAML (for example, researcher
).
Each sub-agent is defined in YAML with its own LLM, system prompt, and optional downstream MCP tool servers.
What is this?Why use it?FeaturesPrerequisitesQuick startVerify installationConnect CursorHow it worksAgent configurationMCP tools referenceEnvironment variablesProject layoutDevelopmentDocker image and releasesTroubleshootingLicense
Sub-Agent MCP sits between a parent LLM and one or more specialized sub-agents:
- The parent connects to this server over
Streamable HTTP at
/mcp
. - At startup, each agent in
agents.yaml
is registered as an MCP tool named by itsid
. - The parent calls that tool with a
prompt
; the server runs the sub-agent and returns the final response.
Each sub-agent is a LangChain agent with its own OpenAI-compatible LLM, system prompt, and optional connections to other MCP servers (filesystem, search, your own tools, and so on). Tool access can be restricted per agent with an allowlist.
sequenceDiagram
participant Parent as Parent_LLM
participant SAMCP as Sub_Agent_MCP
participant Agent as LangChain_sub_agent
participant LLM as OpenAI_compatible_LLM
participant Tools as Downstream_MCP_servers
Parent->>SAMCP: researcher(prompt)
SAMCP->>Agent: build agent + downstream tools
Agent->>LLM: reasoning loop
Agent->>Tools: MCP tool calls
Tools-->>Agent: tool results
LLM-->>Agent: final answer
Agent-->>SAMCP: response text
SAMCP-->>Parent: {"response": "..."}
This is different from giving one agent every tool in the workspace: the parent stays lightweight, roles stay explicit, and each sub-agent only sees the MCP servers and tools you configure for it.
Delegation without context bloat— The parent calls agent tools directly; it does not need every downstream tool schema in its own context.** Per-role configuration**— Differentid
s can use different models, prompts, MCP servers, and tool allowlists.Production-oriented— Pydantic-validated YAML, structured logging, Docker health checks, CI, and GHCR images on release tags.** OpenAI-compatible providers**— Pointllm.base_uri
at OpenAI, Azure, Ollama, LM Studio, or any compatible API.
Transport
- Streamable HTTP only (
streamable-http
); no stdio or legacy SSE.
Configuration
- YAML agent definitions with strict Pydantic validation.
- Environment substitution:
${VAR}
and${VAR:-default}
.
Runtime
- LangChain 1.x
create_agent
with OpenAI-compatible chat models. - Per-agent MCP connections via
langchain-mcp-adapters
, with optionalserver.tool
allowlists.
Security
- Agent tool descriptions never expose API keys.
Operations
- Structured logging ( structlog). - Docker image with health check; GitHub Actions CI.
- Container images published to GHCR on version tags (
v0.x.y
).
MCP tools exposed by this server
Each agent in agents.yaml
is registered as a tool named by its id
(for example, researcher
). Each tool accepts a single prompt
argument and runs that sub-agent.
| Requirement | Notes |
|---|---|
| Python 3.10+ | CI uses 3.12; requires-python >= 3.10 in pyproject.toml |
| API key for your LLM provider | Default example uses OPENAI_API_KEY |
pip |
Choose one path. Docker Compose is the fastest way to run the full demo stack (server + mock tool servers).
export OPENAI_API_KEY=sk-...
docker compose up --build
| Service | Port | Endpoint |
|---|---|---|
| Sub-Agent MCP | 8000 | http://localhost:8000/mcp |
| Mock filesystem MCP | 8001 | http://localhost:8001/mcp |
| Mock search MCP | 8002 | http://localhost:8002/mcp |
The bundled config/agents.yaml uses Docker network hostnames (filesystem-mcp
, search-mcp
), which resolve correctly inside Compose.
uv sync --dev
cp config/agents.example.yaml config/agents.yaml # if you don't have config/agents.yaml yet
export OPENAI_API_KEY=sk-...
python -m sub_agent_mcp.main
Server listens at http://0.0.0.0:8000/mcp
(reachable as http://localhost:8000/mcp
from your machine).
Important: The example config points MCP servers at Docker service names. For local Python you must either run the mock servers and use localhost
URLs (see table below), or run only the mocks via Compose:
docker compose up filesystem-mcp search-mcp
export OPENAI_API_KEY=sk-...
python -m sub_agent_mcp.main
| Environment | filesystem MCP URL | search MCP URL |
|---|---|---|
| Docker Compose network | http://filesystem-mcp:8001/mcp |
|
http://search-mcp:8002/mcp |
||
| Host machine / local Python | http://localhost:8001/mcp |
|
http://localhost:8002/mcp |
Images are published on git tags matching v*
(for example v0.1.2
), not on every push to main
.
docker run -p 8000:8000 \
-e OPENAI_API_KEY=sk-... \
-v "$(pwd)/config/agents.yaml:/app/config/agents.yaml:ro" \
ghcr.io/stormaref/sub-agent-mcp:latest
Mount your own agents.yaml
and ensure MCP url
values are reachable from inside the container (use host networking, service names, or host.docker.internal
as appropriate).
OpenAPI document (generated from registered MCP tools):
curl -s http://localhost:8000/mcp/openapi.json | head
Open WebUI (OpenAPI tool server) — Register as OpenAPI (not MCP):
| Field | Value |
|---|---|
| URL | |
http://localhost:8000/mcp (or http://mcp.example.com/sub-agent/mcp behind a proxy) |
|
| OpenAPI path | |
openapi.json (default) |
Open WebUI appends tool paths from the spec (for example /tools/researcher
) to that URL:
curl -s -X POST http://localhost:8000/mcp/tools/researcher \
-H 'Content-Type: application/json' \
-d '{"prompt":"Summarize what tools you have access to."}'
Docker health — The image health check probes http://127.0.0.1:8000/mcp
.
Functional check — After connecting Cursor (below), ask the parent agent to call the researcher
tool with a short prompt. A successful run returns { "response": "..." }
.
- Open Cursor Settings → MCP(or edit your MCP config JSON). - Add the server:
{
"mcpServers": {
"sub-agent-mcp": {
"url": "http://localhost:8000/mcp"
}
}
}
- Ensure the Sub-Agent MCP process is running and Cursor can reach
localhost:8000
. On WSL or Docker Desktop, confirm port forwarding if the server runs in a VM or container. - Reload MCP tools. You should see one tool per agent (for example,
).researcher
Example delegation prompt for the parent agent:
Call the
researcher
tool with prompt: "Summarize what tools you have access to."
Other MCP clients — Any client that supports Streamable HTTP can connect to http://<host>:8000/mcp
. Refer to your client’s MCP documentation for URL-based server configuration; this server does not use stdio transport.
flowchart LR
subgraph parent [Parent]
Cursor[Cursor agent]
end
subgraph subagentmcp [Sub_Agent_MCP]
ToolsMCP[agent tools e.g. researcher]
Builder[Agent builder]
end
subgraph subagent [Sub_agent runtime]
LC[LangChain create_agent]
LLM[OpenAI_compatible LLM]
end
subgraph downstream [Per_agent MCP servers]
FS[filesystem MCP]
SRCH[search MCP]
end
Cursor -->|Streamable HTTP /mcp| ToolsMCP
ToolsMCP --> Builder
Builder --> LC
LC --> LLM
LC --> FS
LC --> SRCH
Agent tool flow
- Load and validate
config/agents.yaml(or
AGENTS_CONFIG_PATH
) at startup. - Register one MCP tool per agent, named by
id
. - When a tool is called, build an OpenAI-compatible chat model from that agent’s
llm.*
. - Connect to the agent’s
mcp_servers
, discover tools, applytool_allowlist
if set. - Run the LangChain agent loop (bounded by
AGENT_RECURSION_LIMIT
). - Return the final assistant message as
{ "response": "..." }
, or{ "error": "..." }
on failure.
Agents are loaded from config/agents.yaml
at startup. Override the path with AGENTS_CONFIG_PATH
.
agents:
- id: researcher
title: Research Agent
description: "Agent specialized in research tasks"
llm:
base_uri: https://api.openai.com/v1
api_key: ${OPENAI_API_KEY}
model_id: gpt-4.1-mini
system_prompt: |
You are a helpful research assistant.
mcp_servers:
- name: filesystem
transport: streamable_http
url: http://filesystem-mcp:8001/mcp
headers: {}
- name: search
transport: streamable_http
url: http://search-mcp:8002/mcp
tool_allowlist:
- filesystem.read_file
- search.web_search
Copy config/agents.example.yaml as a starting point.
| Field | Description |
|---|---|
id |
|
Unique slug; must start with a lowercase letter, then lowercase letters, digits, - , or _ |
|
title |
|
| Human-readable name | |
description |
|
| Agent purpose (shown in the tool description exposed to the parent) | |
llm.base_uri |
|
| OpenAI-compatible API base URL | |
llm.api_key |
|
API key; supports ${ENV_VAR} substitution |
|
llm.model_id |
|
| Model identifier for the provider | |
llm.reasoning_effort |
|
Optional reasoning budget: none , minimal , low , medium , high , xhigh |
|
llm.reasoning_summary |
|
Optional reasoning summary style: auto , concise , detailed (use detailed for longer) |
|
llm.verbosity |
|
Optional response verbosity for reasoning models: low , medium , high |
|
llm.max_tokens |
|
| Optional max output tokens (increase if answers are cut short) | |
llm.temperature |
|
Optional sampling temperature (0.0 –2.0 ) |
|
system_prompt |
|
| System message for the sub-agent | |
mcp_servers |
|
List of remote MCP servers (transport must be streamable_http ) |
|
mcp_servers[].name |
|
Short name used in qualified tool names (name.tool ) |
|
mcp_servers[].url |
|
Streamable HTTP MCP endpoint (must end with /mcp for standard layouts) |
|
mcp_servers[].bearer_token |
|
Optional bearer token; sent as Authorization: Bearer ... (supports ${ENV_VAR} ) |
|
mcp_servers[].headers |
|
| Optional extra HTTP headers merged with bearer auth | |
tool_allowlist |
|
Optional list of server.tool names; omit to allow all tools from connected servers |
Environment variable substitution supports ${VAR}
and ${VAR:-default}
. If VAR
is unset and no default is provided, startup fails with a clear error.
Second agent (different role, no extra MCP servers)
- id: writer
title: Writing Agent
description: "Drafts and edits text"
llm:
base_uri: https://api.openai.com/v1
api_key: ${OPENAI_API_KEY}
model_id: gpt-4.1-mini
system_prompt: |
You are a concise technical writer.
mcp_servers: []
Local model via Ollama (OpenAI-compatible)
llm:
base_uri: http://localhost:11434/v1
api_key: ${OLLAMA_API_KEY:-ollama}
model_id: llama3.2
Authenticated downstream MCP server
mcp_servers:
- name: my_api
transport: streamable_http
url: https://mcp.example.com/mcp
bearer_token: ${MY_MCP_TOKEN}
You can also set auth manually via headers
:
headers:
Authorization: "Bearer ${MY_MCP_TOKEN}"
Allow all tools from connected servers — Remove tool_allowlist
or set it to null
in YAML (omit the key).
| Error | Typical cause |
|---|---|
| Config file not found | Missing config/agents.yaml ; copy from agents.example.yaml |
| Environment variable not set | ${VAR} without value or ${VAR:-} default |
| Pydantic validation failed | Invalid id , duplicate ids, empty system_prompt , wrong transport |
| Duplicate agent ids | Two agents share the same id |
Each agent in config/agents.yaml
is exposed as an MCP tool named by its id
.
| Parameter | Type | Description |
|---|---|---|
prompt |
||
| string | User message for the sub-agent |
Returns
| Shape | Meaning |
|---|---|
{ "response": "..." } |
|
| Success; final assistant text | |
{ "error": "..." } |
|
| Failure (LLM error, MCP connection error, and so on) |
Tool descriptions include title
, description
, and model_id
but never API keys.
Errors are returned in the result object; they do not crash the MCP server process.
| Variable | Default | Description |
|---|---|---|
AGENTS_CONFIG_PATH |
||
config/agents.yaml |
||
| Path to agents YAML | ||
HOST |
||
0.0.0.0 |
||
| Server bind host | ||
PORT |
||
8000 |
||
| Server bind port | ||
LOG_LEVEL |
||
INFO |
||
Log level (DEBUG , INFO , …) |
||
MCP_CLIENT_TIMEOUT |
||
30 |
||
| Timeout in seconds when connecting to downstream MCP servers; increase for slow tools | ||
AGENT_RECURSION_LIMIT |
||
25 |
||
| Maximum LangChain agent tool-loop steps per agent tool call; increase for multi-step tasks |
config/
agents.example.yaml # Template; copy to agents.yaml
agents.yaml # Runtime config (example committed in repo)
src/sub_agent_mcp/
main.py # FastMCP entry point
server/ # Per-agent MCP tools, OpenAPI route
agent/ # LangChain builder and executor
config/ # YAML and Pydantic schema
mcp_client/ # Downstream MCP connections and tool registry
docker/mock_mcp/ # Dev mock filesystem and search MCP servers
tests/ # pytest suite
scripts/bump-tag.sh # Release tag helper (v0.major.minor)
uv sync --dev
uv run pytest -v
uv run ruff check src tests
Pull requests and pushes to main
run lint and tests in .github/workflows/ci.yml.
Registry: ghcr.io/stormaref/sub-agent-mcp
When images publish: Pushing a git tag v0.*
(for example v0.1.2
) runs the Docker job after tests pass. Pushes to main
alone do not publish images.
Zero versioning: Tags use v0.major.minor
(for example v0.1.0
, v0.1.1
, v0.2.0
).
In VS Code: Terminal → Run Task → Release: bump zero-version tag and push
Or manually:
./scripts/bump-tag.sh minor # v0.1.0 → v0.1.1
./scripts/bump-tag.sh major # v0.1.0 → v0.2.0
Image tags include the semver, latest
, and major.minor aliases per the metadata action in CI.
| Symptom | Likely cause | Fix |
|---|---|---|
Configuration error on startup |
||
Missing or invalid agents.yaml |
||
AGENTS_CONFIG_PATH |
Environment variable 'X' is not set
${X}
without default in YAMLexport X=...
or use ${X:-default}
Local vs Docker MCP URLstable401
/ invalid credentialsllm.api_key
for that agentbase_uri
filesystem-mcp
/ search-mcp
; review tool_allowlist
curl http://localhost:8000/mcp/openapi.json
; check firewall / WSL networkingAGENT_RECURSION_LIMIT