In the previous posts of this series, I covered the why behind this journey: reclaiming building time as an EM, adopting an AI-native SDLC, and understanding what MCP servers actually are.
Now it's time to get into the how.
Our Product team wanted to explore whether AI assistants could interact with our accounting API and allow customers to query their data and execute transactions through natural language, instead of using the UI of our application or (for the more tech-savvy users) making API calls.
The user (or an agent) could simply ask for invoices, contacts, or transactions and get answers in a human-readable format.
The first version was deliberately simple. As I described in this post, my team and I had very limited knowledge of MCP servers and had to discover the topic and learn it while (vibe)coding the Proof of Concept.
It turned out that creating an MCP server with STDIO transport is really trivial, especially thanks to FastMCP: in the end, it's about wrapping the desired endpoints of our API with the @mcp.tool() decorator.
from fastmcp import FastMCP
from typing import Optional
import httpx
import os

mcp = FastMCP("my-first-mcp")

API_BASE_URL = "https://my.app.de/api/v1"
# Same variable name as in the client config and the docker run command below
API_TOKEN = os.getenv("API_TOKEN", "")
HEADERS = {
    "Authorization": API_TOKEN,
    "Content-Type": "application/json",
    "Accept": "application/json"
}


@mcp.tool()
async def get_contacts(limit: Optional[int] = 10) -> str:
    """Retrieve contacts from our API."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"{API_BASE_URL}/Contact",
            headers=HEADERS,
            params={"limit": limit}
        )
        response.raise_for_status()
        data = response.json()

    if not data.get("objects"):
        return "No contacts found."

    result = f"Contacts (showing {len(data['objects'])}):\n"
    for contact in data["objects"]:
        name = contact.get("name") or "Unknown"
        customer_number = contact.get("customerNumber", "N/A")
        result += f"• {name} (#{customer_number})\n"
    return result.strip()


if __name__ == "__main__":
    mcp.run()  # STDIO is the default transport
Replicate for a minimal set of endpoints/methods (get_invoices, get_vouchers, get_check_accounts, get_transactions, create_contact), all following the same pattern: call the API, format the response as text, return it - and that's it.
Of course, no classes, no separation of concerns, no error handling beyond a basic try/except, but enough to prove the concept works.
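The "call the API, format the response as text" pattern can be sketched with the formatting pulled out into a plain function, which makes it easy to check without hitting the API. This is only a minimal sketch, not what the PoC looked like: format_contacts is a hypothetical helper name, and the payload shape (objects, name, customerNumber) is taken from the snippet above.

```python
from typing import Any


def format_contacts(data: dict[str, Any]) -> str:
    """Turn an API response into the plain-text list returned by the tool."""
    contacts = data.get("objects") or []
    if not contacts:
        return "No contacts found."

    lines = [f"Contacts (showing {len(contacts)}):"]
    for contact in contacts:
        name = contact.get("name") or "Unknown"
        customer_number = contact.get("customerNumber", "N/A")
        lines.append(f"• {name} (#{customer_number})")
    return "\n".join(lines)


# Made-up sample payload, only for illustration
sample = {"objects": [{"name": "ACME GmbH", "customerNumber": "1001"}, {"name": None}]}
print(format_contacts(sample))
```

Each tool would then shrink to "fetch, then return format_xyz(response.json())", which is roughly the separation the prototype was missing.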
And it did. We configured it in Claude Desktop and Kiro:
{
  "mcpServers": {
    "accounting_server": {
      "command": "/Users/you/.local/bin/uv",
      "args": ["--directory", "/path/to/mcp_prototype", "run", "poc_mcp-server.py"],
      "env": {
        "API_TOKEN": "<YOUR_TOKEN>"
      }
    }
  }
}
Within minutes, our AI assistant could list our contacts, pull invoices, and create new client contacts. The "magic" of MCP just worked: self-describing tools, the LLM deciding which tool to call and with what parameters.
STDIO is great for local use, but we needed the server accessible over the network. The goal: connect non-technical colleagues using Langdock (an AI tool for business users), allow remote access, and make the MCP server available to other colleagues so they could test it via HTTP with their own AI tools.
Again, with FastMCP it was very simple. We just changed a couple of parameters, added a custom route for the health check, and it was already working:
import logging
from typing import Optional
from mcp.server.fastmcp import FastMCP
from starlette.responses import JSONResponse

logger = logging.getLogger(__name__)
mcp = FastMCP(host="0.0.0.0", stateless_http=True)

@mcp.custom_route("/health", methods=["GET"])
async def health_check(request):
    return JSONResponse({"status": "healthy"})

@mcp.tool()
async def get_contacts(limit: Optional[int] = 10) -> str:
    logger.info(f"[MCP TOOL CALLED] get_contacts(limit={limit})")
    ...  # same body as in the STDIO version
We just added basic logging (logger.info on every tool call) because once you go HTTP, you lose the direct terminal output you had with stdio.
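In the PoC we simply called logger.info inline in every tool. One possible way to avoid repeating that line is a small decorator; the sketch below is an assumption about how that could look, not what we shipped. log_tool_call and the mcp_poc logger name are hypothetical, and the tool body is a placeholder.

```python
import asyncio
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mcp_poc")


def log_tool_call(func):
    """Log the tool name and keyword arguments before running the tool."""

    @functools.wraps(func)  # keep the original name, docstring and signature
    async def wrapper(*args, **kwargs):
        logger.info("[MCP TOOL CALLED] %s(%s)", func.__name__, kwargs)
        return await func(*args, **kwargs)

    return wrapper


@log_tool_call
async def get_contacts(limit: int = 10) -> str:
    return f"would fetch {limit} contacts"  # placeholder body


print(asyncio.run(get_contacts(limit=5)))
```

If combined with FastMCP, the @mcp.tool() decorator would presumably go on top of @log_tool_call so that the wrapped function is the one that gets registered; we did not verify this in the PoC.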
A quick Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY poc_server_http.py run_http_mcp.py ./
EXPOSE 8000
CMD ["python3", "run_http_mcp.py"]
Build it, run it, and expose it with ngrok for remote access:
docker build -t mcp-server .
docker run -p 8000:8000 -e API_TOKEN=${API_TOKEN} mcp-server
ngrok http 8000
ngrok creates a temporary public URL that tunnels traffic to your local machine - so http://localhost:8000 becomes something like https://abc123.ngrok.io that anyone on the internet can reach. Super handy for quick demos and testing with remote tools.
⚠️ A word of caution: ngrok exposes your local service to the public internet. In our case, that meant our internal API (with a real token baked in) was temporarily reachable by anyone with the URL. For a quick test of the PoC among colleagues, we accepted the risk, but don't leave ngrok running overnight with credentials. And for anything beyond that, you'd want proper authentication, rate limiting, and a real deployment.
That was enough to connect Langdock and Claude to our MCP server remotely. Non-technical colleagues could now ask "show me the last 5 invoices" and get real data back.
In this phase, the API token needed by the accounting API was baked into the running MCP server, so everyone testing the remote server was getting data from the same test account (those who could try the STDIO version or start the HTTP server locally could pass their own token at docker run). More on this below.
Throughout all of this, the MCP Inspector was invaluable. Whether you're building your own MCP server or just trying to understand one you're using, the Inspector lets you see exactly what's going on under the hood.
npx @modelcontextprotocol/inspector
This opens a browser UI at http://localhost:6274 where you point it at your running server (e.g. http://localhost:8000/mcp) and explore what it exposes.
Essentially the MCP UI Inspector is similar to using Postman to test your APIs. You see exactly what the AI sees: tool names, descriptions, parameter schemas, and responses. If something doesn't work in Claude or Kiro, the Inspector tells you whether the problem is your server or the client.
We used it constantly - to validate tools, debug response formats, and understand what was actually happening under the hood. For local development, the workflow was simple: run the server in Docker, point the Inspector at http://localhost:8000/mcp, and iterate.
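Under the hood, what the Inspector (and any MCP client) sends are JSON-RPC 2.0 messages such as tools/list and tools/call. A minimal sketch of how those request bodies are shaped, assuming the get_contacts tool from earlier; jsonrpc_request is a hypothetical helper, and a real client also performs an initialize handshake that is left out here.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request body for an MCP method."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Ask the server which tools it offers
list_tools = jsonrpc_request("tools/list")

# Call one tool by name with its arguments
call_tool = jsonrpc_request(
    "tools/call", {"name": "get_contacts", "arguments": {"limit": 5}}
)

print(json.dumps(call_tool, indent=2))
```

Seeing the raw shape makes it easier to tell whether a failure comes from the request the client built or from the response the server returned.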
With the Dockerized version working locally, the question became: where do we deploy it?
We had a working container. Now we needed it accessible to multiple users - colleagues testing with Langdock, AI tools connecting remotely, potentially customers down the line. ngrok was fine for a demo, not for anything persistent.
We considered a few options:
AWS Lambda: serverless, pay-per-invocation, no infra to manage. But interacting with an MCP server can mean a conversation involving multiple tool calls. Every new request could spin up a (potentially cold) Lambda, adding latency. Costs could also spike quickly with many short-lived invocations.
EKS (our existing cluster): we already run workloads on EKS, so deploying another pod would have been trivial. But it raised questions for the long run: all users hitting the same pod means shared state. Could user A's conversation context leak into user B's session? Even without explicit state, things like connection pools, cached tokens, or in-memory variables could bleed across requests. We'd need to think seriously about multi-tenancy, resource isolation, and session boundaries before putting this anywhere near real users.
Just a few months earlier, AWS had launched AgentCore Runtime: a managed service specifically designed to host MCP servers and AI agents. Serverless, auto-scaling, built-in Cognito auth, CloudWatch observability, MCP protocol support out of the box.
We went with AgentCore. The whole point of this PoC was to explore - not just proving "it works" but surfacing the hard questions early. EKS would have taken 30 minutes and given us a green checkmark. But it would have also given Product and management a false sense of confidence: everything's sorted, production is around the corner. In reality, we'd have kicked authentication, multi-tenancy, cost, and operational concerns down the road; exactly the kind of surprises that blow up timelines later.
A PoC that only validates the happy path isn't a PoC.
We wanted to learn how AWS AgentCore Runtime would solve our problem: what it costs, how auth works, where it breaks, and what's still missing.
The deployment config (.bedrock_agentcore.yaml) looked like this:
default_agent: accounting_server_http
agents:
  accounting_server_http:
    deployment_type: container
    platform: linux/arm64
    container_runtime: docker
    aws:
      region: eu-central-1
      network_configuration:
        network_mode: PUBLIC
      protocol_configuration:
        server_protocol: MCP
      observability:
        enabled: true
We got it deployed. But despite the dedicated AWS MCP Server for AgentCore and even the Kiro Power, the experience was not so smooth.
Documentation was a moving target. AgentCore was (and still is) evolving fast. The docs and the actual CLI behavior didn't always match. AI assistants hallucinated AgentCore commands that didn't exist or had changed.
Update: almost 6 months after the PoC, I tried the AgentCore Kiro Power again while writing this post, to refresh my memory and validate some of the things we had done, and I must say it has significantly improved.
So it was deployed. But did it work? How did we invoke it?
AgentCore supports two auth methods: either AWS SigV4 (IAM credentials) or JWT Bearer tokens (via a Cognito authorizer). If you configure the JWT authorizer, SigV4 is disabled. We learned this the hard way.
First deployment: no auth. With authorizer_configuration: null, agentcore invoke worked with just your AWS credentials (SigV4):
authorizer_configuration: null
agentcore invoke '{"method": "tools/list"}'
Good enough for "does this thing even start?", but it didn't bring our PoC much further: non-technical people, like anyone else without CLI access and AWS credentials, still couldn't use the MCP server.
Adding Cognito. Our main app already relies on Amazon Cognito to manage user access, so the natural approach was to use it to authorize invocations of AgentCore Runtime as well.
For this PoC we did not want to connect any existing pool, so we scripted a basic Cognito setup:
COGNITO_USERNAME=testuser COGNITO_PASSWORD=MyPass123! ./utils/setup_cognito.sh
The script outputs the values you need:
export POOL_ID="eu-central-1_abCDEFGHGij"
export CLIENT_ID="6c054s3sajofv7e8u4209bkvmf"
export BEARER_TOKEN="eyJraWQiOi..."
export DISCOVERY_URL="https://cognito-idp.eu-central-1.amazonaws.com/eu-central-1_abCDEFGHGij/.well-known/openid-configuration"
Then we reconfigured AgentCore to require JWT auth:
authorizer_configuration:
  customJWTAuthorizer:
    discoveryUrl: https://cognito-idp.eu-central-1.amazonaws.com/eu-central-1_abCDEFGHGij/.well-known/openid-configuration
    allowedClients:
      - 6c054s3sajofv7e8u4209bkvmf
The customJWTAuthorizer tells AgentCore: "before letting any request through, validate the Bearer token in the Authorization header." Here's how it works:
**discoveryUrl** - Every Cognito pool exposes a discovery URL at https://cognito-idp.{region}.amazonaws.com/{pool_id}/.well-known/openid-configuration. AgentCore fetches this to find the public keys used to verify JWT signatures.
**allowedClients** - When you create an "App Client" in your User Pool, Cognito gives you a client ID. Only tokens issued for this specific client are accepted.
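To make the two settings more concrete, here is a rough sketch of the checks involved. It is an illustration under assumptions, not AgentCore's implementation: it builds the discovery URL from region and pool ID, and compares a token's client_id claim (assumed to be where Cognito access tokens carry the app client ID) against an allow-list. It deliberately skips signature verification, which the real authorizer performs with the public keys from the discovery document. All function names and the demo token are hypothetical.

```python
import base64
import json


def discovery_url(region: str, pool_id: str) -> str:
    """Build the OpenID discovery URL of a Cognito user pool."""
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{pool_id}/.well-known/openid-configuration"
    )


def unverified_claims(jwt_token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def client_is_allowed(jwt_token: str, allowed_clients: list[str]) -> bool:
    # Assumption: the app client ID lives in the `client_id` claim.
    return unverified_claims(jwt_token).get("client_id") in allowed_clients


def _fake_token(claims: dict) -> str:
    """Build an unsigned, JWT-shaped string for the demo below."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none'})}.{b64(claims)}.signature"


token = _fake_token({"client_id": "example-client-id", "token_use": "access"})
print(discovery_url("eu-central-1", "eu-central-1_EXAMPLE"))
print(client_is_allowed(token, ["example-client-id"]))
```

Never use an unverified decode like this to make real authorization decisions; it is only handy for inspecting what a token claims to be.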
We applied this configuration via the CLI:
agentcore configure \
--entrypoint accounting_server_http.py \
--protocol MCP \
--deployment-type container \
--region eu-central-1 \
--disable-memory \
--authorizer-config "{\"customJWTAuthorizer\":{\"discoveryUrl\":\"$DISCOVERY_URL\",\"allowedClients\":[\"$CLIENT_ID\"]}}"
After redeploying, the endpoint required a Bearer token. Trying again with SigV4 now failed with: "The agent is configured for a different authorization method than what was used in your request."
agentcore invoke '{"method": "tools/list"}' \
--bearer-token "$BEARER_TOKEN" \
--session-id "test-session-123456789012345678901234567890"
The --session-id is just a string you make up - any random identifier of 33+ characters. There's nothing to fetch from AWS. You generate it yourself (e.g., a UUID). AgentCore uses it to track a conversation: each unique session ID gets its own isolated microVM. Reuse the same ID across multiple invocations to maintain context; use a new one to start fresh. It's also what drives billing: the session stays alive (and billable) until the idle timeout expires.
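Generating such an ID takes a couple of lines. A minimal sketch, assuming the 33-character minimum mentioned above; new_session_id and the poc prefix are hypothetical names.

```python
import uuid

MIN_SESSION_ID_LENGTH = 33  # minimum length mentioned above


def new_session_id(prefix: str = "poc") -> str:
    """Return a fresh session ID; a UUID string alone is already 36 characters."""
    session_id = f"{prefix}-{uuid.uuid4()}"
    assert len(session_id) >= MIN_SESSION_ID_LENGTH
    return session_id


session_id = new_session_id()
print(session_id, len(session_id))
```

Keep the value around for as long as the conversation should share context, and generate a new one to start fresh.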
Connecting Langdock with the real endpoint. With Cognito in place, we configured Langdock to point at the AgentCore URL with the Bearer token:
{
  "name": "Accounting MCP server",
  "url": "https://bedrock-agentcore.eu-central-1.amazonaws.com/runtimes/arn%3Aaws%3Abedrock-agentcore%3Aeu-central-1%3A257174212998%3Aruntime%2Faccounting_server_http-nfZ30p98Ct/mcp?qualifier=DEFAULT",
  "transport": "streamable-http",
  "authentication": {
    "type": "bearer",
    "token": "eyJraWQiOi..."
  }
}
And it worked. Non-technical colleagues could configure the MCP server in Langdock and chat with the AI to get real accounting data back. The PoC was validated end-to-end.
But it immediately surfaced the next problem.
In our PoC, we (the developers) generated the token by running a shell script, then pasted it into Langdock's config. That's not a customer workflow - it's a developer hack.
export BEARER_TOKEN=$(aws cognito-idp initiate-auth \
--client-id "$CLIENT_ID" \
--auth-flow USER_PASSWORD_AUTH \
--auth-parameters USERNAME=$COGNITO_USERNAME,PASSWORD=$COGNITO_PASSWORD \
--region eu-central-1 | jq -r '.AuthenticationResult.AccessToken')
The token expires every hour. When it did, someone had to re-run the script and paste the new token. Non-technical people can't do that - they don't have AWS CLI access, they don't know what a Bearer token is, and they shouldn't have to.
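For a developer-side script, the hourly re-run could at least be automated with a small cache that fetches a new token shortly before the old one expires. This is a sketch under assumptions, not something we built in the PoC: CachedToken is a hypothetical helper, the one-hour lifetime is the one mentioned above, and the fetch function is a placeholder for whatever actually obtains the token (for example a wrapper around the CLI call shown earlier).

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Cache a bearer token and fetch a new one shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], str],
        lifetime_seconds: float = 3600,  # "expires every hour"
        leeway_seconds: float = 60,      # refresh a bit early
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._lifetime = lifetime_seconds
        self._leeway = leeway_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime
        return self._token


def fake_fetch() -> str:
    # Placeholder: a real implementation would ask Cognito for a token here.
    return "eyJ-placeholder-token"


token_source = CachedToken(fake_fetch)
print(token_source.get() == token_source.get())  # second call hits the cache
```

This helps developers and scripts, but it does nothing for a business user pasting a token into a chat tool, which is the real gap.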
On top of that, the MCP server itself still needs our API token to call the downstream accounting API, currently baked into the container as an env var. So you have two auth layers: Cognito for accessing AgentCore, and the API token for the upstream system. For a real product, we'd need to connect our actual Cognito pools (the ones our main app uses for customer login) so that the same login mechanism gives users a Bearer token for AgentCore and the credentials to invoke the underlying API.
Think of it this way: AgentCore's customJWTAuthorizer is like putting a lock on a door. We proved the lock works. But we haven't built the key-dispensing machine for customers yet. In the PoC, we were the locksmith - we made the key ourselves with CLI tools.
We initially thought AgentCore Identity might be the answer, but it's not. It is designed for the agent accessing external services on behalf of users (e.g., your agent calling GitHub with the user's OAuth token). It solves a different problem: agent-to-service auth, not user-to-agent auth.
Unfortunately, we don't know the right answer yet. When we were working on the PoC it was late December, Christmas was around the corner, and we'd proven enough to present the results internally.
When January came, a major organisational change completely shifted our priorities. The PoC was left in this state: working, demonstrated, but with the final authentication mystery unsolved. Maybe one day we'll pick it back up. For now, it remains an open question, although we do have a couple of options to evolve the PoC:
The app issues the token. Customer logs into your existing product (your CIAM). Your backend generates a JWT and returns it. You expose a "Get MCP Token" button in your app's settings page. Customer copies it into their MCP client config. Still manual copy-paste, but at least the customer doesn't need AWS access. Token refresh could be automated via your app's UI.
OAuth2 flow in the MCP client. The MCP client (Langdock, Claude, etc.) supports OAuth2 natively - clicking "connect" opens a browser, customer logs in via your login page, the token flows back automatically. No copy-paste at all. This is how the GitHub MCP server and Atlassian MCP server work. But it requires the MCP client to implement the OAuth redirect flow, and your identity provider to support it.
There's also the ugly URL problem. You can't exactly hand customers https://bedrock-agentcore.eu-central-1.amazonaws.com/runtimes/arn%3Aaws%3A.../mcp?qualifier=DEFAULT and call it a day. AWS published a solution for this: put CloudFront in front as a reverse proxy, attach a custom domain via Route 53 + ACM, and your customers see https://agent.yourcompany.com/ instead. CloudFront passes the Authorization header through, so the Cognito Bearer token still works. It doesn't solve the token-dispensing problem, but at least the endpoint looks like something you'd put in a product.
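The ugliness comes from the runtime ARN being URL-encoded into the path. A small sketch of how such an endpoint URL can be assembled, with the pattern inferred from the Langdock config shown earlier rather than from official documentation; runtime_mcp_url is a hypothetical helper and the ARN below is a made-up placeholder.

```python
from urllib.parse import quote


def runtime_mcp_url(region: str, runtime_arn: str, qualifier: str = "DEFAULT") -> str:
    """Assemble the MCP endpoint URL for a runtime ARN (pattern is an assumption)."""
    encoded_arn = quote(runtime_arn, safe="")  # also encodes ':' and '/'
    return (
        f"https://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{encoded_arn}/mcp?qualifier={qualifier}"
    )


# Placeholder ARN, not a real runtime
arn = "arn:aws:bedrock-agentcore:eu-central-1:123456789012:runtime/example_server-abc123"
print(runtime_mcp_url("eu-central-1", arn))
```

Passing safe="" matters: by default quote() leaves "/" untouched, which would split the ARN across path segments.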
While a colleague from the Data team and I were exploring AgentCore Runtime, another colleague tried a different approach with AgentCore Gateway.
The idea is compelling: instead of writing MCP tool wrappers by hand, you feed your OpenAPI spec to the Gateway and it generates MCP tools automatically.
agentcore gateway create-mcp-gateway \
--name accounting-gateway \
--region eu-central-1
agentcore gateway create-mcp-gateway-target \
--gateway-arn <arn> \
--target-type openApiSchema \
--target-payload file://our-openapi.yaml
No custom Python code. No FastMCP. The Gateway reads your API spec and exposes each endpoint as an MCP tool.
In practice, it didn't go smoothly either. Our OpenAPI spec was large and not perfectly RESTful - the Gateway choked on it. The solution was to extract a subset of endpoints manually.
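Extracting a subset does not have to be fully manual. A minimal sketch of the idea, assuming the spec has already been loaded into a dict (for example with a YAML parser); extract_subset is a hypothetical helper, the example paths other than /Contact are invented, and unused schemas under components are not pruned.

```python
import copy


def extract_subset(spec: dict, keep_paths: set) -> dict:
    """Return a copy of an OpenAPI spec containing only the selected paths."""
    subset = copy.deepcopy(spec)  # leave the original spec untouched
    subset["paths"] = {
        path: operations
        for path, operations in subset.get("paths", {}).items()
        if path in keep_paths
    }
    return subset


# Tiny made-up spec, only for illustration
full_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Accounting API (example)", "version": "1.0"},
    "paths": {
        "/Contact": {"get": {"operationId": "getContacts"}},
        "/Invoice": {"get": {"operationId": "getInvoices"}},
        "/Voucher": {"get": {"operationId": "getVouchers"}},
    },
}

small_spec = extract_subset(full_spec, {"/Contact", "/Invoice"})
print(sorted(small_spec["paths"]))
```

An allow-list of paths also doubles as documentation of which endpoints you are actually willing to expose as tools.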
He also built the full infrastructure with Terraform (Runtime, Gateway, Memory, IAM, ECR) - which gave us a reproducible setup but added complexity.
We instead chose the AgentCore CLI. I found the CLI quite cool for how it easily scaffolds projects and simplifies deployment, but I doubt I would rely on it for production - there I want my resources in Terraform and the deployment within our established pipelines.
Both approaches taught us something: the Runtime path gives you full control but requires writing and maintaining MCP server code. The Gateway path is faster if your API spec is clean (and small), but you lose flexibility.
Beyond authentication, two other things caught us off guard during the AgentCore experience.
Costs surprised us. For a PoC with ~19 sessions and 35 invocations, we spent about $0.50. That doesn't sound like much - until you understand how the billing works.
AgentCore Runtime charges based on active resource consumption per second (see the pricing page).
The key nuance: billing spans the entire session lifetime - from microVM boot to session termination. Sessions stay alive for the configured idle timeout (default 15 minutes) after the last request. So even if your tool call takes 2 seconds, the session (and its memory billing) continues for another 15 minutes waiting for the next request.
In our case: 19 sessions × 15 minutes idle each = nearly 5 hours of memory billing for what amounted to a few minutes of actual work.
Important update: The pricing model has evolved since our initial PoC. AWS now bills based on active consumption rather than pre-allocated resources - meaning I/O wait time (waiting for LLM responses, API calls) is free for CPU. This is a significant improvement over traditional compute pricing. Our early experience was with the previous billing model, so your mileage may vary. Always check the current pricing page.
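The idle-time arithmetic from our PoC (19 sessions, 15-minute idle timeout) is easy to reproduce. The sketch below only computes hours, as an upper bound that assumes every session sat idle for the full timeout; it contains no prices on purpose, since those change. idle_billing_hours is a hypothetical helper name.

```python
def idle_billing_hours(sessions: int, idle_timeout_minutes: float) -> float:
    """Upper bound on idle time kept alive across all sessions, in hours."""
    return sessions * idle_timeout_minutes / 60


hours = idle_billing_hours(sessions=19, idle_timeout_minutes=15)
print(hours)  # 4.75, i.e. "nearly 5 hours"
```

Multiply the result by the current memory rate from the pricing page to estimate the idle share of a bill, or shorten the idle timeout and watch the number drop.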
Logging was noisy. AgentCore automatically sends logs to CloudWatch, which sounds great, but we were a bit disappointed when we started looking at them. The log streams are cluttered with platform-level noise: health checks, container lifecycle events, internal routing messages, and recurring error messages from the infrastructure layer that have nothing to do with your code.
Our actual application logs - the logger.info("[MCP TOOL CALLED] get_contacts") lines we added - were buried in there. The only log we could reliably spot was the server startup message. Tool execution logs either didn't appear or were lost in the noise. Part of the problem was that MCP servers communicate via stdout (for stdio) or HTTP responses (for HTTP transport), so print() and basic logging output doesn't always end up where you expect in a managed container environment.
That might also be a lack of understanding on my side about how logging works in Python - I'm more of a TypeScript developer and I was vibecoding the application because Python is usually the choice for data teams.
Starting simple is the right call. A single Python file with FastMCP got us from zero to working PoC in an afternoon. Don't over-engineer the first version.
stdio → HTTP → Cloud is a natural progression. Each step builds on the previous one. Don't skip straight to cloud deployment.
The MCP Inspector is non-negotiable. Use it from day one. It saves hours of debugging.
AgentCore is powerful but young. Expect rough edges, especially around documentation, authentication, and cost visibility. For a PoC, local Docker is the way to go.
AI helped, but it also hallucinated. When working with new/fast-moving services like AgentCore, AI assistants often generated code based on outdated or incorrect documentation. Always verify against the actual CLI and docs.
The code was a mess - and that's fine. A PoC is supposed to be messy. The goal was to validate the idea, not ship production code. We proved that AI tools could query our accounting system through MCP. That was the win.
After this first round, we had a working prototype but a long list of open questions.
And then there's code quality. The prototype was functional but not maintainable: no tests, no structure, no error handling worth mentioning. That's perfectly normal for a PoC. But I've seen too many PoCs go straight to production. Everyone gets excited - especially Product Owners - because "it works! Why throw it away?" And before you know it, you're maintaining spaghetti code in production, afraid to touch anything because you don't know what will break.
I didn't want that to happen here. And since I was looking for an excuse to experiment with Spec-Driven Development with Kiro, I decided to rebuild the MCP server from scratch in a new repository - no reference to the old code, proper architecture from the start.
But that's the next post.