{"slug": "how-to-securely-connect-adk-agents-to-models-on-cloud-run", "title": "How to securely connect ADK agents to models on Cloud Run", "summary": "Google Cloud Run's built-in authentication and IAM policies protect model endpoints from unauthorized access, but the Agent Development Kit's LiteLLM connector requires manual handling of authenticated calls when connecting to models hosted on the service. Developers must acquire Google-signed OpenID tokens and inject them as bearer tokens in HTTP Authorization headers, with static headers only working for low-frequency deployments where token expiration is not an issue. The ADK framework automatically handles token fetching and refreshing for agent-to-agent and MCP server calls, but not for model connections through LiteLLM.", "body_md": "# How to Securely Connect ADK Agents to Models on Cloud Run\n\nThe Agent Development Kit (ADK) simplifies authentication for agents and tools, but authentication is more challenging when an agent uses the LiteLLM connector to access models hosted on Cloud Run. This guide explores how to acquire Google-signed OpenID (ID) tokens and inject them into the LiteLLM communication channel using ADK.\n\nGoogle Cloud Run provides a robust, built-in access control mechanism based on enforced authentication and IAM policies. 
When it is enabled, only calls made by authenticated accounts that have the Cloud Run *Invoker* role are accepted, protecting your service from unauthorized invocations.\n\nTo make an authenticated call, you have to implement the following steps:\n\n- Acquire credentials either by implementing a sign-in process or using [Application Default Credentials](https://docs.cloud.google.com/docs/authentication/application-default-credentials)\n- [Fetch an ID token](https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/) from a user or [workload identity](https://docs.cloud.google.com/iam/docs/workload-identities)\n- Use the identity token as a bearer token in the [HTTP Authorization header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Authorization) when you call the Cloud Run endpoint\n\nAgent Development Kit (ADK) greatly simplifies implementation of these steps for you. When your agent calls another agent or an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro) server that runs on Cloud Run, the framework handles discovering the application’s default credentials, fetching the token, and injecting the token into each call between agents and calls from agents to remote MCP tools. The framework also refreshes the token when the current one expires. As a developer, you don’t need to implement anything beyond configuring your remote MCP server or agent objects.\n\nThe situation is different when you configure an ADK agent to use a model that is deployed in Cloud Run. ADK provides a [LiteLLM connector](https://adk.dev/agents/models/litellm/) that allows an agent to use non-Gemini models hosted at remote endpoints. However, you will need to take care of making authenticated calls to the model yourself. 
How can you do that?\n\n## Method 1: Static header\n\nThe LiteLLM connector uses the `litellm` Python package to call remote endpoints exposing Ollama, vLLM and other LLM engines. The package supports passing custom HTTP headers in calls to LiteLLM APIs like `acompletion` using the `extra_headers` parameter, which is set to a map of header names and values.\n\n``` python\nfrom google.adk.agents import Agent\nfrom google.adk.models import LiteLlm\nimport google.auth\nimport google.auth.transport.requests\nfrom google.oauth2 import id_token\n\n# obtains a Google-signed ID token for the given audience (Cloud Run service URL).\naud = \"https://model-123456789012.us-central1.run.app\"\ncreds, _ = google.auth.default()\nauth_req = google.auth.transport.requests.Request()\ncreds.refresh(auth_req)\nif hasattr(creds, \"id_token\") and creds.id_token:\n   token = creds.id_token\nelse:\n   token = id_token.fetch_id_token(auth_req, aud)\n\n# set up the model\nmodel = LiteLlm(\n  model=\"ollama_chat/gemma3:270m\",\n  api_base=aud,\n  extra_headers={\n        \"Authorization\": f\"Bearer {token}\",\n  },\n)\n\nagent = Agent(\n    name=\"content_builder\",\n    model=model,\n    instruction=\"agent system instructions\",\n)\n```\n\nThis approach works as long as the `token` is valid. Once the fetched token expires, calls to the model fail with the “HTTP 401 Unauthorized” status code because Cloud Run rejects calls made with an expired token. This method fits agents deployed on Cloud Run with a [scale to zero configuration](https://docs.cloud.google.com/run/docs/about-instance-autoscaling) and a low invocation frequency. In this deployment pattern the agent is restarted frequently, and each restart fetches a new authentication token.\n\n## Method 2: Dynamic token injection\n\nWhen the agent is expected to run continuously, the token has to be refreshed upon receiving an authentication error. 
To implement this, you would need to extend the LiteLLMClient class of ADK:\n\n``` python\nimport os\nfrom typing import Any, Optional, Union\nfrom fastapi import HTTPException\nfrom google.adk.models.lite_llm import LiteLLMClient\nimport google.auth\nimport google.auth.transport.requests\nfrom google.oauth2 import id_token\nfrom litellm.exceptions import AuthenticationError\nfrom litellm import CustomStreamWrapper, ModelResponse\n\n_creds, _ = google.auth.default()\n\ndef _get_auth_token(aud: str) -> Optional[str]:\n  \"\"\"\n  Obtains a Google-signed ID token for the given audience (Cloud Run service URL).\n  \"\"\"\n    \n  try:\n    auth_req = google.auth.transport.requests.Request()\n    # support using user credentials for local development\n    _creds.refresh(auth_req)\n    if hasattr(_creds, \"id_token\") and _creds.id_token:\n      return _creds.id_token\n    # fetch token for service account credentials\n    fetched_id_token = id_token.fetch_id_token(auth_req, aud)\n    return fetched_id_token\n  except Exception as e:\n    print(f\"Error obtaining ID token for {aud}: {e}\")\n    return None\n\nclass LiteLLMClientEx(LiteLLMClient):\n  \"\"\"\n  Overrides the LiteLLMClient to inject a bearer token into the request headers.\n  \"\"\"\n\n  def __init__(self, audience: str, **data: Any) -> None:\n    self.token: Optional[str] = None\n    self.aud = audience\n    super().__init__(**data)\n\n  async def acompletion(\n      self, model, messages, tools, **kwargs\n  ) -> Union[ModelResponse, CustomStreamWrapper]:\n    \"\"\"\n    Updating headers to inject bearer token.\n    \"\"\"\n\n    self.token = self.token or _get_auth_token(self.aud)\n    if \"extra_headers\" not in kwargs or kwargs[\"extra_headers\"] is None:\n      kwargs[\"extra_headers\"] = {}\n    kwargs[\"extra_headers\"][\"Authorization\"] = f\"Bearer {self.token}\"\n\n    # Ensure we call the parent class acompletion to preserve ADK internal logic\n    try:\n      return await 
super().acompletion(\n          model=model, messages=messages, tools=tools, **kwargs\n      )\n    except (AuthenticationError, HTTPException) as e:\n      if isinstance(e, HTTPException) and e.status_code != 401:\n        raise\n      # refresh the token\n      self.token = _get_auth_token(self.aud)\n      kwargs[\"extra_headers\"][\"Authorization\"] = f\"Bearer {self.token}\"\n      return await super().acompletion(\n          model=model, messages=messages, tools=tools, **kwargs\n      )\n```\n\nThen use the extended `LiteLLMClientEx` when initializing the `LiteLlm` connector:\n\n``` python\nmodel = LiteLlm(\n  model=\"ollama_chat/gemma3:270m\",\n  api_base=\"https://model-123456789012.us-central1.run.app\",\n  llm_client=LiteLLMClientEx(\n                audience=\"https://model-123456789012.us-central1.run.app\"),\n)\n```\n\nNote that the exception handler in the `LiteLLMClientEx` code snippet that handles the HTTP 401 status error is simplified for use in this post. Cloud Run returns HTTP 401 Unauthorized in multiple scenarios, including a mismatched token audience, a wrong token type, a malformed token and more. The recommended way to distinguish between an expired token and other scenarios is to examine the `WWW-Authenticate` header for the error description.\n\n## Method 3: Using litellm-proxy\n\nIf you cannot modify the agent code or Method 2 does not work for you for other reasons, you can use a [litellm-proxy configuration](https://docs.litellm.ai/docs/proxy/custom_auth) to automatically populate the `Authorization` header of the proxied requests. 
The litellm proxy can be [deployed as a sidecar container](https://docs.cloud.google.com/run/docs/deploying#sidecars) alongside your agent, offloading the authentication logic from your primary agent code.\n\nOnce configured, you use the local endpoint with the proxy port in the `LiteLlm` connector configuration, with no additional modifications to the agent’s code.\n\n``` python\nmodel = LiteLlm(\n  model=\"ollama_chat/gemma3:270m\",\n  api_base=\"http://0.0.0.0:4000\",\n)\n```\n\n## What’s next\n\nWrapping up, there are three methods for injecting an ID token into the LiteLLM connector’s communication with a model hosted on Cloud Run. Choosing the best one for you is simple.\n\n**Do you expect your agent instance to run less than one hour before being terminated?** If you deploy your agent on Cloud Run with scaling down to zero and expect a low request frequency, your instance will be [shut down after being idle](https://docs.cloud.google.com/run/docs/about-instance-autoscaling#idle-instance). In this scenario you can use **Method 1**. It is simple, and the fetched ID token will not expire before the instance is shut down.\n\n**Do you need to centralize authentication logic for calling your model’s endpoint outside of agents?** If you plan to run more than one agent that uses the same model, or there is a decision not to add the model authentication logic to the agent, you should use **Method 3**: deploy the LiteLLM proxy as a sidecar to each agent or as a standalone service, and configure the agents to use the proxy’s endpoint to access the model. 
One subtle nuance of running the proxy as a standalone Cloud Run service is that you will need to implement authentication logic to call that service, which is effectively the same as implementing Method 2.\n\nIf you answered “No” to both of the above questions, your choice should be **Method 2**.\n\nHere are a few useful references that you can use to move forward with implementing authentication:\n\n- Take the codelabs [Building a Multi-Agent System using Gemini and Gemma models](https://codelabs.developers.google.com/next26/multi-agent-system?hl=en#0) and [Create a Cloud Run service with a sidecar](https://codelabs.developers.google.com/codelabs/create-cloud-run-service-sidebar-localhost-volume-mount#0)\n- Explore the Method 2 implementation in the [Multi-Agent Course Creator with Self-Hosted LLM on GPU](https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/agents/multi-agent-system/agents/content_builder) GitHub repo.\n- Read the blog post [Securely Call Cloud Run Service From Anywhere](https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/)", "url": "https://wpnews.pro/news/how-to-securely-connect-adk-agents-to-models-on-cloud-run", "canonical_source": "https://leoy.blog/posts/securely-connect-agents-to-models/", "published_at": "2026-05-29 21:23:30+00:00", "updated_at": "2026-05-29 21:46:52.119778+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "large-language-models", "artificial-intelligence"], "entities": ["Agent Development Kit", "LiteLLM", "Google Cloud Run", "IAM", "MCP", "Model Context Protocol", "Application Default Credentials", "OpenID"], "alternates": {"html": "https://wpnews.pro/news/how-to-securely-connect-adk-agents-to-models-on-cloud-run", "markdown": "https://wpnews.pro/news/how-to-securely-connect-adk-agents-to-models-on-cloud-run.md", "text": "https://wpnews.pro/news/how-to-securely-connect-adk-agents-to-models-on-cloud-run.txt", "jsonld": 
"https://wpnews.pro/news/how-to-securely-connect-adk-agents-to-models-on-cloud-run.jsonld"}}