# Build and Deploy a Remote MCP Server to GKE in 30 Minutes

> Source: <https://cloud.google.com/blog/topics/developers-practitioners/build-and-deploy-a-remote-mcp-server-to-gke-in-30-minutes/>
> Published: 2026-06-17 00:00:00+00:00

Integrating context from tools and data sources into LLMs can be challenging, which impacts the ease of development for AI agents. To address this challenge, Anthropic introduced the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction), which standardizes how applications provide context to these models. Developers often want to build an MCP server for their APIs to make them available to fellow developers, allowing them to use it as context in their own applications. Google Kubernetes Engine (GKE) provides a scalable, reliable, and secure environment to deploy these remote MCP servers.

This guide shows the straightforward process of setting up a secure remote MCP server on GKE.

## MCP transports

The Model Context Protocol follows a client-server architecture. It initially only supported running the server locally using the `stdio` transport. The protocol has since evolved and now supports remote access transports, specifically [Streamable HTTP](https://modelcontextprotocol.io/specification/latest/basic/transports#streamable-http).

With Streamable HTTP, the server operates as an independent process that can handle multiple client connections. This transport uses HTTP POST and GET requests. The server must provide a single HTTP endpoint path that supports both POST and GET methods, such as `https://example.com/mcp`. You can learn more about the different transports in the [official documentation](https://modelcontextprotocol.io/docs/concepts/architecture#transport-layer).
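To make the single-endpoint design concrete, the following sketch builds the first message a client would POST to such an endpoint. It uses only the Python standard library and sends nothing over the network. The endpoint URL, client name, and protocol version string are illustrative assumptions, so check the specification for the values your client should use.

```python
import json

# Illustrative values only; adjust them to your deployment and client.
MCP_ENDPOINT = "https://example.com/mcp"
PROTOCOL_VERSION = "2025-03-26"  # assumed spec revision; verify against the spec


def build_initialize_request(request_id: int = 1) -> tuple[dict, bytes]:
    """Return the HTTP headers and JSON-RPC body for an MCP initialize call."""
    headers = {
        "Content-Type": "application/json",
        # The client accepts either a single JSON response or a
        # Server-Sent Events stream from the same endpoint.
        "Accept": "application/json, text/event-stream",
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return headers, json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    headers, payload = build_initialize_request()
    print(f"POST {MCP_ENDPOINT}")
    for name, value in headers.items():
        print(f"{name}: {value}")
    print(payload.decode("utf-8"))
```

In practice a client library such as FastMCP's `Client` handles this handshake, so the sketch is mainly useful for reading network traces.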

## Benefits of running an MCP server on GKE

Running an MCP server remotely on GKE provides several architecture benefits:

- **Scalability:** GKE Autopilot is built to handle highly variable traffic and can scale the MCP server horizontally to absorb spikes in demand. Because Streamable HTTP sessions can hold state on a specific server instance, the Gateway configuration later in this guide enables session affinity so that a client keeps reaching the same backend.
- **Centralized access:** Teams can share access to a centralized MCP server, allowing developers to connect from local machines, agents, or pipelines instead of running redundant local servers. Updates to the central server immediately benefit everyone.
- **Enhanced security:** The Kubernetes Gateway API combined with SSL certificates provides an easy way to enforce encrypted traffic. Clients can reach the MCP server only over HTTPS, which protects data in transit.

## Prerequisites

Before starting, ensure the following tools are installed:

- Python 3.10 or higher
- uv (for package and project management; see the [installation documentation](https://docs.astral.sh/uv/getting-started/installation/))
- Google Cloud SDK (`gcloud`)
- `kubectl` command-line tool

## Installation

Prepare the environment variables:

```bash
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
```

Create a folder, `mcp-on-gke`, to store the code for the server and deployment.

```bash
mkdir mcp-on-gke && cd mcp-on-gke
```

Now configure the Google Cloud credentials and set the active project.

```bash
gcloud auth login
gcloud config set project $PROJECT_ID
```

Start creating the GKE Autopilot cluster in the background. This process takes a few minutes, so starting it now allows the cluster to provision while you complete the rest of the setup. Make sure to use an Autopilot version in which [Cost-Optimized Compute (CCOP)](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-compute-classes) is enabled for fast autoscaling.

```bash
gcloud container clusters create-auto mcp-cluster \
  --region $REGION \
  --release-channel rapid \
  --async
```

Use `uv` to create a project, which will generate a `pyproject.toml` file.

```bash
uv init
```

Next, create the additional files needed: `server.py` for the MCP server code, `test_mcp_server.py` for testing, and a `Dockerfile` for the container deployment.

## Math MCP server

Large language models are excellent at non-deterministic tasks, such as generating text, summarizing ideas, and reasoning about concepts. However, they can be unreliable for deterministic tasks like math operations. To solve this, developers can create tools that provide valuable context. Using [FastMCP](https://gofastmcp.com/getting-started/welcome), a framework for building MCP servers in Python, it is possible to create a simple math server with two tools: add and subtract.

First, add FastMCP as a dependency.

```bash
uv add fastmcp
```

Copy the following code into `server.py` to create the server.

```python
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse
import asyncio
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp_port = 3000

# Initialize the FastMCP server
server = FastMCP(
    "Math Server",
)


@server.tool()
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


@server.tool()
def subtract(a: int, b: int) -> int:
    """Subtract the second number from the first."""
    return a - b


@server.custom_route("/healthz", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
    """Simple health check endpoint that returns a 200 OK response."""
    return PlainTextResponse("OK")


if __name__ == "__main__":
    logger.info(f"MCP server starting on port {mcp_port}")
    # Could also use the 'sse' transport. host="0.0.0.0" is required so the
    # server is reachable from outside the container.
    asyncio.run(
        server.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=mcp_port,
        )
    )
```

This example uses the `streamable-http` transport, which is recommended for remote servers. The script encapsulates the logic needed to run a scalable MCP endpoint.
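Conceptually, each decorated function becomes a named tool that clients invoke through a JSON-RPC `tools/call` request. The following standard-library sketch imitates that dispatch step for the two math tools. It is a simplified illustration rather than FastMCP's actual implementation, and the `TOOLS` registry and `handle_tools_call` helper are hypothetical names.

```python
import json
from typing import Callable


def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract the second number from the first."""
    return a - b


# Hypothetical registry standing in for what @server.tool() records.
TOOLS: dict[str, Callable[..., int]] = {"add": add, "subtract": subtract}


def handle_tools_call(raw_request: str) -> dict:
    """Dispatch one JSON-RPC tools/call request to the matching function."""
    request = json.loads(raw_request)
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"},
        }
    result = tool(**params.get("arguments", {}))
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        # Tool output is returned as a list of content items.
        "result": {"content": [{"type": "text", "text": str(result)}], "isError": False},
    }


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "add", "arguments": {"a": 5, "b": 3}},
    }
    print(handle_tools_call(json.dumps(call)))
```

The framework additionally derives an input schema from the type hints and validates arguments, which this sketch leaves out.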

## Testing the MCP server locally

Create the `test_mcp_server.py` script to connect to the MCP server and exercise its tools. This is useful for testing the MCP server before deploying it to GKE.

```python
from fastmcp import Client
import asyncio

# Connect to the MCP server running locally (plain HTTP, no TLS yet)
client = Client("http://localhost:3000/mcp")


async def test_remote_server():
    async with client:
        # Basic server interaction
        await client.ping()

        # List available operations
        tools = await client.list_tools()
        print(f"Available tools: {tools} \n")

        # Execute add operation
        result = await client.call_tool("add", {"a": 5, "b": 3})
        print(f"Result of addition: {result} \n")

        # Execute subtract operation
        result = await client.call_tool("subtract", {"a": 5, "b": 3})
        print(f"Result of subtraction: {result} \n")


if __name__ == "__main__":
    asyncio.run(test_remote_server())
```

Run the MCP server locally to test the connection:

```bash
uv run server.py
```

Then execute the test script in a new terminal to verify the connection.

```bash
uv run test_mcp_server.py
```

The output should list the available tools and show the results of invoking the `add` and `subtract` tools, confirming that the MCP server is functional.
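If the test script runs before the server has finished starting, the connection fails. A small readiness check against the `/healthz` route defined in `server.py` can avoid that. The sketch below is one possible approach using only the standard library; the `wait_for_health` helper and its default timings are assumptions, not part of FastMCP.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll url until it answers HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or a non-2xx answer: not ready yet, retry.
            pass
        time.sleep(interval_s)
    return False
```

For the local server in this guide, the call would be `wait_for_health("http://localhost:3000/healthz")`, which returns `True` once the server responds.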

## Building the container image

To speed up the deployment process, build the container image while the cluster is still being created.

First, prepare the `Dockerfile`:

```dockerfile
FROM python:3.10-slim
COPY --from=ghcr.io/astral-sh/uv:0.4.15 /uv /bin/uv
WORKDIR /app
COPY pyproject.toml .
COPY server.py .
RUN uv sync
CMD ["uv", "run", "server.py"]
```

Now, set up the Artifact Registry and build the container image.

## Set up Artifact Registry

```bash
gcloud artifacts repositories create mcp-repo \
  --repository-format=docker \
  --location=$REGION
```

## Build and push the image in parallel

```bash
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
```

Once the image build is complete, verify that the cluster is ready and retrieve the credentials. If the cluster status is not `RUNNING`, wait until it is before fetching the credentials.

```bash
gcloud container clusters list
gcloud container clusters get-credentials mcp-cluster --region $REGION
```
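Waiting for the `RUNNING` status can also be scripted. The following is a minimal sketch that assumes `gcloud` is on the `PATH` and authenticated; the helper names, attempt count, and delay are hypothetical choices. The status source is passed in as a callable so the polling loop can be exercised without touching Google Cloud.

```python
import subprocess
import time
from typing import Callable


def gcloud_cluster_status(name: str = "mcp-cluster", region: str = "us-central1") -> str:
    """Ask gcloud for the cluster status string, for example RUNNING."""
    completed = subprocess.run(
        [
            "gcloud", "container", "clusters", "describe", name,
            "--region", region,
            "--format", "value(status)",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def wait_until_running(
    get_status: Callable[[], str],
    attempts: int = 40,
    delay_s: float = 15.0,
) -> bool:
    """Call get_status up to `attempts` times until it reports RUNNING."""
    for _ in range(attempts):
        if get_status() == "RUNNING":
            return True
        time.sleep(delay_s)
    return False
```

Calling `wait_until_running(gcloud_cluster_status)` blocks until the cluster is ready or the attempts run out, after which `get-credentials` can be run.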

## Deploying to GKE with Gateway API and SSL

The next step involves deploying the server workloads and exposing them securely using the [Kubernetes Gateway API](https://cloud.google.com/kubernetes-engine/docs/how-to/gatewayclass-capabilities) rather than the legacy Ingress. This guarantees secure, encrypted traffic via SSL certificates.

Create a `deployment.yaml` file to define the Kubernetes Deployment and Service. Replace the placeholders with your actual project ID and region.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /healthz
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-service
spec:
  selector:
    app: mcp-server
  ports:
  - port: 80
    targetPort: 3000
```
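`kubectl` does not expand shell variables inside a manifest, so the `$REGION` and `$PROJECT_ID` placeholders have to be filled in before applying it. One way to do that, sketched here with Python's `string.Template`, is to render the file from the same values used earlier; the `render_manifest` helper and the `my-project` project ID are illustrative assumptions.

```python
from string import Template


def render_manifest(manifest_text: str, project_id: str, region: str) -> str:
    """Fill the $PROJECT_ID and $REGION placeholders in a manifest."""
    # safe_substitute leaves any other $NAME untouched instead of raising.
    return Template(manifest_text).safe_substitute(
        PROJECT_ID=project_id,
        REGION=region,
    )


if __name__ == "__main__":
    line = "image: $REGION-docker.pkg.dev/$PROJECT_ID/mcp-repo/math-mcp-server:latest"
    print(render_manifest(line, project_id="my-project", region="us-central1"))
    # prints: image: us-central1-docker.pkg.dev/my-project/mcp-repo/math-mcp-server:latest
```

Editing the image line by hand works just as well for a single deployment; rendering is mainly useful when the same manifest is reused across projects.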

Apply this configuration to the cluster:

```bash
kubectl apply -f deployment.yaml
```

Check that the pods are up and running:

```bash
kubectl get pods
```

To confirm that the remote MCP server is accessible, reach it through a port-forward:

```bash
kubectl port-forward svc/mcp-service 8080:80
```

Run the test script to verify the connection. First edit the MCP server URL in the test script to `http://localhost:8080/mcp`.

```bash
uv run test_mcp_server.py
```
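Because the test script is pointed at several URLs over the course of this guide, it may be more convenient to read the URL from the environment than to edit the file each time. This is a small sketch of that idea; the `MCP_SERVER_URL` variable name and the `resolve_server_url` helper are assumptions introduced for illustration.

```python
import os
from typing import Mapping

# Local default matching the port used by server.py.
DEFAULT_MCP_URL = "http://localhost:3000/mcp"


def resolve_server_url(env: Mapping[str, str] = os.environ) -> str:
    """Return MCP_SERVER_URL if it is set and non-empty, else the default."""
    return env.get("MCP_SERVER_URL") or DEFAULT_MCP_URL


if __name__ == "__main__":
    print(f"Testing against {resolve_server_url()}")
```

The test script could then create its client with `Client(resolve_server_url())` and be run as `MCP_SERVER_URL=http://localhost:8080/mcp uv run test_mcp_server.py`.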

Now let's secure the connection. To do so, we'll use a Google-managed SSL certificate and attach it to a Gateway API resource. First, reserve a static IP address for your load balancer:

```bash
gcloud compute addresses create mcp-server-ip --global
export MCP_SERVER_IP=$(gcloud compute addresses describe mcp-server-ip --global --format="value(address)")
echo "Your IP: $MCP_SERVER_IP"
```

Point the DNS `A` record for your domain (for example, `mcp.yourdomain.com`) at `$MCP_SERVER_IP`.
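A Google-managed certificate can only finish provisioning once the domain resolves to the load balancer's IP address, so it is worth confirming the DNS record has propagated. The following sketch performs that check with the standard library; `dns_points_to` is a hypothetical helper, and the resolver is injectable so the logic can be tried without real DNS.

```python
import socket
from typing import Callable


def dns_points_to(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if hostname currently resolves to expected_ip (IPv4)."""
    try:
        return resolve(hostname) == expected_ip
    except socket.gaierror:
        # The name does not resolve yet.
        return False
```

With the values from this guide, the call would be `dns_points_to("mcp.yourdomain.com", os.environ["MCP_SERVER_IP"])` after importing `os`. Note that `socket.gethostbyname` returns a single address, so a domain with several `A` records may need a fuller check.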

Create a Google-managed certificate. Replace `mcp.yourdomain.com` with your actual domain.

```bash
gcloud compute ssl-certificates create mcp-cert --domains mcp.yourdomain.com --global
```

Create a `gateway.yaml` file to provision the load balancer and configure Transport Layer Security (TLS) termination.

```yaml
# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: mcp-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: mcp-cert
  addresses:
  - type: NamedAddress
    value: mcp-server-ip
---
# HTTPRoute: forward traffic to the MCP Server
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
spec:
  parentRefs:
  - name: mcp-gateway
  hostnames:
  - "mcp.yourdomain.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /mcp
    backendRefs:
    - name: mcp-service
      port: 80
---
# The GCPBackendPolicy configures session affinity and other backend settings.
# Since MCP Servers are stateful we enable session affinity. This ensures that
# requests from the same client are sent to the same backend.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: mcp-backend-policy
spec:
  default:
    sessionAffinity:
      type: CLIENT_IP
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
---
# The HealthCheckPolicy configures custom health probes for the MCP Server.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: mcp-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 3000
        requestPath: /healthz
  targetRef:
    group: ""
    kind: Service
    name: mcp-service
```
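The `CLIENT_IP` session affinity setting matters because an MCP session established on one pod is not known to the other replica. The sketch below illustrates the general idea of affinity by mapping a client address deterministically to a backend. It is a conceptual model only, not how Google Cloud load balancing is implemented, and the pod names are made up.

```python
import hashlib


def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Map a client IP to one backend so repeat requests land on the same pod."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    pods = ["mcp-server-pod-a", "mcp-server-pod-b"]  # hypothetical pod names
    for ip in ["198.51.100.7", "198.51.100.7", "203.0.113.42"]:
        print(ip, "->", pick_backend(ip, pods))
```

The same address always maps to the same pod as long as the backend list is unchanged, which is the property a stateful MCP session relies on. Clients that share an IP address, for example behind a NAT, would also share a backend under this kind of scheme.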

Deploying this configuration creates the infrastructure required to route external traffic securely to the MCP server.

```bash
kubectl apply -f gateway.yaml
```

Wait a few minutes for the load balancer to become active and the certificate to provision. Developers can check the status using `kubectl get gateway mcp-gateway`.

Now reach the remote MCP server through the Gateway. Edit the MCP server URL in the test script to `https://mcp.yourdomain.com/mcp`, then run the script to verify the connection.

```bash
uv run test_mcp_server.py
```

## Cleanup

```bash
kubectl delete -f deployment.yaml
kubectl delete -f gateway.yaml
gcloud compute addresses delete mcp-server-ip --global
gcloud compute ssl-certificates delete mcp-cert --global
gcloud artifacts repositories delete mcp-repo --location=$REGION
gcloud container clusters delete mcp-cluster --region $REGION
```

## Continue reading

Deploying Model Context Protocol servers to Kubernetes enables new use cases for integrated agents and AI workflows. To dive deeper into these capabilities, explore the following resources:
