Build and Deploy a Remote MCP Server to GKE in 30 Minutes #
Integrating context from tools and data sources into LLMs can be challenging, which impacts the ease of development for AI agents. To address this challenge, Anthropic introduced the Model Context Protocol (MCP), which standardizes how applications provide context to these models. Developers often want to build an MCP server for their APIs to make them available to fellow developers, allowing them to use it as context in their own applications. Google Kubernetes Engine (GKE) provides a scalable, reliable, and secure environment to deploy these remote MCP servers.
This guide shows the straightforward process of setting up a secure remote MCP server on GKE.
MCP transports #
The Model Context Protocol follows a client-server architecture. It initially only supported running the server locally using the `stdio` transport. The protocol has since evolved and now supports remote access transports, specifically Streamable HTTP.
With Streamable HTTP, the server operates as an independent process that can handle multiple client connections. This transport uses HTTP POST and GET requests. The server must provide a single HTTP endpoint path that supports both POST and GET methods, such as `https://example.com/mcp`. You can learn more about the different transports in the official documentation.
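To make the single-endpoint idea concrete, here is a minimal sketch of the first message a client sends over Streamable HTTP: a JSON-RPC `initialize` call wrapped in an HTTP POST. It only builds the request and does not send it. The client name, the version string, and the protocol revision `2025-03-26` are assumptions chosen for illustration; in practice an MCP client library assembles this for you.

```python
import json
import urllib.request


def build_initialize_request(endpoint: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC `initialize` call as an HTTP POST."""
    message = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use the one your client supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_initialize_request("https://example.com/mcp")
    print(request.get_method(), request.full_url)
    print(json.loads(request.data)["method"])
```

Every later message, including tool calls, goes to the same `/mcp` path, which is why the Gateway configuration later in this guide only needs one route.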
Benefits of running an MCP server on GKE #
Running an MCP server remotely on GKE provides several architecture benefits:
- Scalability: GKE Autopilot is built to handle highly variable traffic. Because the example server's tools keep no data between calls, GKE can add replicas to absorb spikes in demand, while session affinity at the load balancer keeps each client's session on a single pod.
- Centralized access: Teams can share access to a centralized MCP server, allowing developers to connect from local machines, agents, or pipelines instead of running redundant local servers. Updates to the central server immediately benefit everyone.
- Enhanced security: The Kubernetes Gateway API combined with SSL certificates provides an easy way to enforce encrypted traffic, so clients reach the MCP server only over HTTPS.
Prerequisites #
Before starting, ensure the following tools are installed:
- Python 3.10 or higher
- uv (for package and project management, see the [installation documentation](https://docs.astral.sh/uv/getting-started/installation/))
- Google Cloud SDK (`gcloud`)
- `kubectl` command-line tool
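If you want to confirm the prerequisites before starting, a short script can check that each tool is on your `PATH`. This is a minimal sketch that uses only the standard library. The executable names in `REQUIRED_TOOLS` are assumptions (for example, Python may be installed as `python` rather than `python3` on your system), and the script does not check versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names; adjust them to match your system.
REQUIRED_TOOLS = ("python3", "uv", "gcloud", "kubectl")


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All prerequisite tools were found.")
```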
Installation #
Prepare environment variables
```
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
```
Create a folder, `mcp-on-gke`, to store the code for the server and deployment.

```
mkdir mcp-on-gke && cd mcp-on-gke
```

Now configure the Google Cloud credentials and set the active project.
```
gcloud auth login
gcloud config set project $PROJECT_ID
```

Start creating the GKE Autopilot cluster in the background. This process takes a few minutes, so starting it now allows the cluster to provision while you complete the rest of the setup. Make sure to use an Autopilot version that ensures Cost-Optimized Compute (CCOP) is enabled for fast autoscaling.
```
gcloud container clusters create-auto mcp-cluster \
  --region $REGION \
  --release-channel rapid \
  --async
```
Use `uv` to create a project, which will generate a `pyproject.toml` file.

```
uv init
```

Next, create the additional files needed: `server.py` for the MCP server code, `test_mcp_server.py` for testing, and a `Dockerfile` for the container deployment.
Math MCP server #
Large language models are excellent at non-deterministic tasks, such as generating text, summarizing ideas, and reasoning about concepts. However, they can be unreliable for deterministic tasks like math operations. To solve this, developers can create tools that provide valuable context. Using FastMCP, a framework for building MCP servers in Python, it is possible to create a simple math server with two tools: add and subtract.
First, add FastMCP as a dependency.
```
# asyncio is part of the Python standard library, so only FastMCP needs installing
uv add fastmcp
```

Copy the following code into `server.py` to create the server.
```python
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse
import asyncio
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp_port = 3000

# Initialize the FastMCP server
server = FastMCP(
    "Math Server",
)

@server.tool()
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

@server.tool()
def subtract(a: int, b: int) -> int:
    """Subtract the second number from the first."""
    return a - b

@server.custom_route("/healthz", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
    """Simple health check endpoint that returns a 200 OK response"""
    return PlainTextResponse("OK")

if __name__ == "__main__":
    logger.info(f"Starting MCP server on port {mcp_port}")
    # Could also use the 'sse' transport. host="0.0.0.0" is required so the
    # server is reachable from outside the container.
    asyncio.run(
        server.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=mcp_port,
        )
    )
```
This example uses the `streamable-http` transport, which is recommended for remote servers. The script encapsulates the logic needed to run a scalable MCP endpoint.
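To illustrate what happens when a client invokes one of these tools, the sketch below models the exchange with plain Python: a JSON-RPC `tools/call` message names the tool and its arguments, and the server looks up the registered function and wraps the result in a response. The `TOOLS` registry, the `tool` decorator, and `handle_tools_call` are hypothetical stand-ins written for this example. They are not FastMCP's implementation, which also handles validation, sessions, and errors.

```python
from typing import Any, Callable, Dict

# Hypothetical registry standing in for what a framework keeps internally.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name, similar in spirit to a tool decorator."""
    TOOLS[func.__name__] = func
    return func


@tool
def add(a: int, b: int) -> int:
    return a + b


@tool
def subtract(a: int, b: int) -> int:
    return a - b


def handle_tools_call(message: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a JSON-RPC `tools/call` request to the matching function."""
    params = message["params"]
    result = TOOLS[params["name"]](**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": message["id"],
        "result": {
            "content": [{"type": "text", "text": str(result)}],
            "isError": False,
        },
    }


if __name__ == "__main__":
    request = {
        "jsonrpc": "2.0",
        "id": 7,
        "method": "tools/call",
        "params": {"name": "add", "arguments": {"a": 5, "b": 3}},
    }
    print(handle_tools_call(request))
```

Because the arithmetic runs in ordinary code, the answer is deterministic; the model only has to choose the tool and supply the arguments.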
Testing the MCP server locally #
Create the `test_mcp_server.py` script to connect to the MCP server and exercise its tools. This is useful for testing the server before deploying it to GKE.

```python
from fastmcp import Client
import asyncio

# Connect to the MCP server running locally (plain HTTP, no TLS yet)
client = Client("http://localhost:3000/mcp")

async def test_remote_server():
    async with client:
        # Basic server interaction
        await client.ping()

        # List available operations
        tools = await client.list_tools()
        print(f"Available tools: {tools} \n")

        # Execute add operation
        result = await client.call_tool("add", {"a": 5, "b": 3})
        print(f"Result of addition: {result} \n")

        # Execute subtract operation
        result = await client.call_tool("subtract", {"a": 5, "b": 3})
        print(f"Result of subtraction: {result} \n")

if __name__ == "__main__":
    asyncio.run(test_remote_server())
```
Run the MCP server locally to test the connection:
```
uv run server.py
```

Then execute the test script in a new terminal to verify the connection.

```
uv run test_mcp_server.py
```

The output should list the available tools and the results of invoking the `add` and `subtract` tools, confirming that the MCP server is functional.
Building the container image #
To speed up the deployment process, build the container image while the cluster is still being created.
First, prepare the `Dockerfile`:

```
FROM python:3.10-slim
COPY --from=ghcr.io/astral-sh/uv:0.4.15 /uv /bin/uv
WORKDIR /app
COPY pyproject.toml .
COPY server.py .
RUN uv sync
CMD ["uv", "run", "server.py"]
```
Now, set up the Artifact Registry and build the container image.
Set up Artifact Registry #
```
gcloud artifacts repositories create mcp-repo \
  --repository-format=docker \
  --location=$REGION
```
Build and push the image in parallel #
```
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
```

Once the image build is complete, verify that the cluster is ready and retrieve the credentials. If the cluster's status is not `RUNNING`, wait for it to be ready.
```
gcloud container clusters list
gcloud container clusters get-credentials mcp-cluster --region $REGION
```
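Rather than rerunning the list command by hand, you could poll the cluster status until it reports `RUNNING`. The following is a minimal sketch under stated assumptions: it shells out to `gcloud container clusters describe` with a `value(status)` format, and the retry count and delay are arbitrary values chosen for illustration. The helper names are hypothetical.

```python
import subprocess
import time
from typing import Callable


def cluster_status(name: str = "mcp-cluster", region: str = "us-central1") -> str:
    """Ask gcloud for the cluster status, for example PROVISIONING or RUNNING."""
    completed = subprocess.run(
        [
            "gcloud", "container", "clusters", "describe", name,
            "--region", region,
            "--format", "value(status)",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def wait_until_running(
    get_status: Callable[[], str] = cluster_status,
    attempts: int = 30,
    delay_seconds: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is RUNNING; return False if attempts run out."""
    for _ in range(attempts):
        if get_status() == "RUNNING":
            return True
        sleep(delay_seconds)
    return False


if __name__ == "__main__":
    print("Cluster ready" if wait_until_running() else "Timed out waiting for cluster")
```

The status getter and the sleep function are parameters so the loop can be exercised without a real cluster.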
Deploying to GKE with Gateway API and SSL #
The next step involves deploying the server workloads and exposing them securely using the Kubernetes Gateway API rather than the legacy Ingress. With this setup, external traffic reaches the server over HTTPS, encrypted with SSL certificates.
Create a `deployment.yaml` file to define the Kubernetes Deployment and Service. Replace the `$REGION` and `$PROJECT_ID` placeholders with your actual region and project ID.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-service
spec:
  selector:
    app: mcp-server
  ports:
  - port: 80
    targetPort: 3000
```

Apply this configuration to the cluster:
```
kubectl apply -f deployment.yaml
```

Check that the pods are up and running:
```
kubectl get pods
```

To confirm that the remote MCP server is accessible, try to reach it through a port-forward:
```
kubectl port-forward svc/mcp-service 8080:80
```

Run the test script to verify the connection. Make sure to edit the MCP server URL in the test script to `http://localhost:8080/mcp`.
```
uv run test_mcp_server.py
```

Now let's secure the connection. To do so, we'll use a Google-managed SSL certificate and attach it to a Gateway API resource. First, reserve a static IP address for your load balancer:
```
gcloud compute addresses create mcp-server-ip --global
export MCP_SERVER_IP=$(gcloud compute addresses describe mcp-server-ip --global --format="value(address)")
echo "Your IP: $MCP_SERVER_IP"
```
Point your domain's DNS `A` record at `$MCP_SERVER_IP`. Example: `mcp.yourdomain.com`.
Create a Google-managed certificate. Replace `mcp.yourdomain.com` with your actual domain.
```
gcloud compute ssl-certificates create mcp-cert --domains mcp.yourdomain.com --global
```
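A Google-managed certificate generally finishes provisioning only once the domain resolves to the load balancer's address, so it can help to confirm the DNS record first. The sketch below checks whether a hostname resolves to an expected IPv4 address using the standard library. The function names are hypothetical, the address `203.0.113.10` is a documentation placeholder, and results can lag behind a recent DNS change because of caching.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses that the hostname currently resolves to."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def dns_points_to(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> bool:
    """True if the hostname resolves to the expected address, False otherwise."""
    try:
        return expected_ip in resolver(hostname)
    except OSError:
        # Resolution failures (for example, an unknown host) count as "not yet".
        return False


if __name__ == "__main__":
    # Replace both values with your own domain and the reserved static IP.
    print(dns_points_to("mcp.yourdomain.com", "203.0.113.10"))
```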
Create a `gateway.yaml` file to provision the load balancer and configure Transport Layer Security (TLS) termination.
```
# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: mcp-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: mcp-cert
  addresses:
  - type: NamedAddress
    value: mcp-server-ip
---
# HTTPRoute: forward traffic to the MCP Server
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
spec:
  parentRefs:
  - name: mcp-gateway
  hostnames:
  - "mcp.yourdomain.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /mcp
    backendRefs:
    - name: mcp-service
      port: 80
---
# The GCPBackendPolicy configures session affinity and other backend settings.
# MCP sessions over Streamable HTTP are stateful, so we enable session affinity.
# This ensures that requests from the same client are sent to the same backend.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: mcp-backend-policy
spec:
  default:
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
---
# The HealthCheckPolicy configures custom health probes for the MCP Server.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: mcp-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 3000
        requestPath: /healthz
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
```
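The idea behind `CLIENT_IP` session affinity can be pictured with a toy model: derive a stable value from the client's address and use it to choose a backend, so repeated requests from one client land on the same pod. The sketch below is only an illustration of that idea. The `pick_backend` function and the pod names are hypothetical, and this is not how the Google Cloud load balancer selects backends internally.

```python
import hashlib
from typing import Sequence


def pick_backend(client_ip: str, backends: Sequence[str]) -> str:
    """Toy model: map a client IP to one backend in a stable, repeatable way."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    pods = ["mcp-server-a", "mcp-server-b"]  # hypothetical pod names
    first = pick_backend("203.0.113.10", pods)
    second = pick_backend("203.0.113.10", pods)
    print(first == second)  # the same client maps to the same pod each time
```

Without affinity, a follow-up request could reach a replica that has never seen the client's session.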
Deploying this configuration creates the infrastructure required to route external traffic securely to the MCP server.
```
kubectl apply -f gateway.yaml
```

Wait a few minutes for the load balancer to become active and the certificate to provision. You can check the status using `kubectl get gateway mcp-gateway`.
Then try to reach the remote MCP server by running the test script. Make sure to edit the MCP server URL in the test script to `https://mcp.yourdomain.com/mcp`.
```
uv run test_mcp_server.py
```
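This guide edits the URL in the test script three times: for the local server, the port-forward, and the public domain. One possible alternative is to read the URL from an environment variable. The sketch below assumes a variable named `MCP_SERVER_URL`, a name invented for this example, and falls back to the local address when it is unset.

```python
import os
from typing import Mapping

# Local default used earlier in this guide.
DEFAULT_URL = "http://localhost:3000/mcp"


def resolve_server_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the MCP server URL from MCP_SERVER_URL (hypothetical), or the local default."""
    return env.get("MCP_SERVER_URL", DEFAULT_URL)


if __name__ == "__main__":
    print(f"Testing against {resolve_server_url()}")
```

In the test script, the result could be passed to the client as `Client(resolve_server_url())`, and the target selected per run, for example `MCP_SERVER_URL=https://mcp.yourdomain.com/mcp uv run test_mcp_server.py`.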
Cleanup #
```
kubectl delete -f deployment.yaml
kubectl delete -f gateway.yaml
gcloud compute addresses delete mcp-server-ip --global
gcloud compute ssl-certificates delete mcp-cert --global
gcloud artifacts repositories delete mcp-repo --location=$REGION
gcloud container clusters delete mcp-cluster --region $REGION
```
Continue reading
Deploying Model Context Protocol servers to Kubernetes enables new use cases for integrated agents and AI workflows. To dive deeper into these capabilities, explore the following resources: