How to securely connect ADK agents to models on Cloud Run

Google Cloud Run's built-in authentication and IAM policies protect model endpoints from unauthorized access, but the Agent Development Kit's LiteLLM connector requires manual handling of authenticated calls when connecting to models hosted on the service. Developers must acquire Google-signed OpenID Connect ID tokens and inject them as bearer tokens in the HTTP Authorization header; a static header works only for low-frequency deployments where token expiration is not an issue. The ADK framework automatically handles token fetching and refreshing for agent-to-agent and MCP server calls, but not for model connections through LiteLLM.

The Agent Development Kit (ADK) simplifies authentication for agents and tools, but authentication is more challenging with the LiteLLM connector when accessing models hosted on Cloud Run. This guide explores how to acquire Google-signed OpenID Connect ID tokens and inject them into the LiteLLM communication channel using ADK.

Google Cloud Run provides a robust, built-in access control mechanism based on enforced authentication and IAM policies. When it is enabled, only calls made by authenticated accounts that have the Cloud Run Invoker role are accepted, protecting your service from unauthorized invocations.
To make an authenticated call, you have to implement the following steps:

- Acquire credentials, either by implementing a sign-in process or by using Application Default Credentials (https://docs.cloud.google.com/docs/authentication/application-default-credentials)
- Fetch an ID token (https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/) from a user or workload identity (https://docs.cloud.google.com/iam/docs/workload-identities)
- Use the ID token as a bearer token in the HTTP Authorization header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Authorization) when you call the Cloud Run endpoint

ADK greatly simplifies the implementation of these steps. When your agent calls another agent or an MCP (Model Context Protocol, https://modelcontextprotocol.io/docs/getting-started/intro) server that runs on Cloud Run, the framework discovers the Application Default Credentials, fetches the token, and injects it into each call between agents and each call from an agent to a remote MCP tool. The framework also refreshes the token when the current one expires. As a developer, you don’t need to implement anything beyond configuring your remote MCP server or agent objects.

The situation is different when you configure an ADK agent to use a model that is deployed on Cloud Run. ADK provides a LiteLLM connector (https://adk.dev/agents/models/litellm/) that allows an agent to use non-Gemini models hosted at remote endpoints. However, you have to take care of making authenticated calls to the model yourself. How can you do that?

Method 1: Static header

The LiteLLM connector uses the litellm Python package to call remote endpoints exposing Ollama, vLLM, and other LLM engines. The package supports passing custom HTTP headers in calls to LiteLLM APIs such as acompletion through the extra_headers parameter, which takes a map of header names to values.
```python
from google.adk.agents import Agent
from google.adk.models import LiteLlm
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token

# Obtain a Google-signed ID token for the given audience (Cloud Run service URL).
aud = "https://model-123456789012.us-central1.run.app"
creds, _ = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
if hasattr(creds, "id_token") and creds.id_token:
    # User credentials (local development) carry an ID token after refresh.
    token = creds.id_token
else:
    # Service account credentials: fetch an ID token for the audience.
    token = id_token.fetch_id_token(auth_req, aud)

# Set up the model with a static Authorization header.
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base=aud,
    extra_headers={
        "Authorization": f"Bearer {token}",
    },
)

agent = Agent(
    name="content_builder",
    model=model,
    instruction="agent system instructions",
)
```

This approach works as long as the token is valid. Once the fetched token expires, calls to the model fail with the “HTTP 401 Unauthorized” status code because Cloud Run blocks calls made with an expired token. This method fits agents that are deployed on Cloud Run with a scale-to-zero configuration (https://docs.cloud.google.com/run/docs/about-instance-autoscaling) and have a low invocation frequency. In this deployment pattern the agent is restarted frequently, and each restart fetches a new authentication token.

Method 2: Dynamic token injection

When the agent is expected to run continuously, the token has to be refreshed upon receiving an error.
To implement this, you need to extend ADK's LiteLLMClient class:

```python
from typing import Any, Optional, Union

from fastapi import HTTPException
from google.adk.models.lite_llm import LiteLLMClient
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token
from litellm.exceptions import AuthenticationError
from litellm import CustomStreamWrapper, ModelResponse

creds, _ = google.auth.default()


def get_auth_token(aud: str) -> Optional[str]:
    """
    Obtains a Google-signed ID token for the given audience (Cloud Run service URL).
    """
    try:
        auth_req = google.auth.transport.requests.Request()
        # Support using user credentials for local development.
        creds.refresh(auth_req)
        if hasattr(creds, "id_token") and creds.id_token:
            return creds.id_token
        # Fetch a token for service account credentials.
        fetched_id_token = id_token.fetch_id_token(auth_req, aud)
        return fetched_id_token
    except Exception as e:
        print(f"Error obtaining ID token for {aud}: {e}")
        return None


class LiteLLMClientEx(LiteLLMClient):
    """
    Overrides the LiteLLMClient to inject a bearer token into the request headers.
    """

    def __init__(self, audience: str, **data: Any) -> None:
        self.token: Optional[str] = None
        self.aud = audience
        super().__init__(**data)

    async def acompletion(
        self, model, messages, tools, **kwargs
    ) -> Union[ModelResponse, CustomStreamWrapper]:
        """
        Updates the headers to inject the bearer token.
        """
        # Reuse the cached token; fetch one only on the first call.
        self.token = self.token or get_auth_token(self.aud)
        if "extra_headers" not in kwargs or kwargs["extra_headers"] is None:
            kwargs["extra_headers"] = {}
        kwargs["extra_headers"]["Authorization"] = f"Bearer {self.token}"
        # Call the parent class acompletion to preserve ADK internal logic.
        try:
            return await super().acompletion(
                model=model, messages=messages, tools=tools, **kwargs
            )
        except (AuthenticationError, HTTPException) as e:
            # Re-raise HTTP errors other than 401.
            if isinstance(e, HTTPException) and e.status_code != 401:
                raise
            # Refresh the token and retry once.
            self.token = get_auth_token(self.aud)
            kwargs["extra_headers"]["Authorization"] = f"Bearer {self.token}"
            return await super().acompletion(
                model=model, messages=messages, tools=tools, **kwargs
            )
```

Then use the extended LiteLLMClientEx when initializing the LiteLLM connector:

```python
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base="https://model-123456789012.us-central1.run.app",
    llm_client=LiteLLMClientEx(
        audience="https://model-123456789012.us-central1.run.app"
    ),
)
```

Note that the exception handler in the LiteLLMClientEx snippet, which handles the HTTP 401 status error, is simplified for use in this post. Cloud Run returns HTTP 401 Unauthorized in multiple scenarios, including a mismatched token audience, a wrong token type, a malformed token, and more. The recommended way to distinguish an expired token from the other scenarios is to examine the WWW-Authenticate header for the error description.

Method 3: Using litellm-proxy

If you cannot modify the agent code, or Method 2 does not work for you for other reasons, you can use a litellm-proxy configuration (https://docs.litellm.ai/docs/proxy/custom_auth) to automatically populate the Authorization header of the proxied requests. The LiteLLM proxy can be deployed as a sidecar container (https://docs.cloud.google.com/run/docs/deploying#sidecars) alongside your agent, offloading the authentication logic from your primary agent code.
Once configured, use the local endpoint with the proxy port in the LiteLLM connector configuration; no additional modifications to the agent’s code are needed.

```python
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base="http://0.0.0.0:4000",
)
```

What’s next

To wrap up, there are three methods for injecting an ID token into the LiteLLM connector's communication with a model hosted on Cloud Run. Choosing the best one for you comes down to two questions.

Do you expect your agent instance to run less than one hour before being terminated? If you deploy your agent on Cloud Run with autoscaling down to zero and expect a low request frequency, your instance will be shut down after being idle (https://docs.cloud.google.com/run/docs/about-instance-autoscaling#idle-instance). In this scenario you can use Method 1. It is simple, and the fetched ID token should not expire before the instance is shut down.

Do you need to centralize the authentication logic for calling your model’s endpoint outside of agents? If you plan to run more than one agent that uses the same model, or there is a decision not to add the model authentication logic to the agent, you should use Method 3: deploy the LiteLLM proxy as a sidecar to each agent or as a standalone service, and configure the agents to use the proxy’s endpoint to access the model. One nuance of running the proxy as a standalone Cloud Run service is that you will need to implement authentication logic to call that service, which is effectively the same as implementing Method 2.

If you answered “No” to both of the above questions, your choice should be Method 2.
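If you go with Method 2, the simplified 401 handling mentioned earlier can be tightened by looking at the WWW-Authenticate header. The following is a minimal sketch, not a definitive implementation: the helper names `parse_bearer_challenge` and `looks_like_expired_token` are hypothetical, the parsing follows the general `Bearer key="value"` challenge format from RFC 6750, and the check for the word "expired" in `error_description` is an assumption that you should verify against the responses your Cloud Run service actually returns.

```python
import re
from typing import Dict

# Matches key="value" pairs inside a Bearer challenge (RFC 6750, section 3).
_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header_value: str) -> Dict[str, str]:
    """Parse a WWW-Authenticate header value into a dict of its Bearer parameters.

    Returns an empty dict when the scheme is not Bearer.
    """
    scheme, _, rest = header_value.strip().partition(" ")
    if scheme.lower() != "bearer":
        return {}
    return {key.lower(): value for key, value in _PARAM_RE.findall(rest)}


def looks_like_expired_token(header_value: str) -> bool:
    """Heuristic: True when the challenge reports an invalid token described as expired.

    ASSUMPTION: the word "expired" appears in error_description. This wording
    is not confirmed by the post; check it against real responses.
    """
    params = parse_bearer_challenge(header_value)
    return (
        params.get("error") == "invalid_token"
        and "expired" in params.get("error_description", "").lower()
    )


if __name__ == "__main__":
    # Hypothetical header value, used for illustration only.
    sample = 'Bearer error="invalid_token", error_description="The access token expired"'
    print(parse_bearer_challenge(sample))
    print(looks_like_expired_token(sample))
```

With a check like this, the retry in LiteLLMClientEx could be limited to the expired-token case, while audience mismatches and malformed tokens are re-raised immediately. How you obtain the header from the caught exception depends on the exception type and the library version, so that part is left out of the sketch.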
Here are a few useful references that you can use to move forward with implementing authentication:

- Take the codelabs Building a Multi-Agent System using Gemini and Gemma models (https://codelabs.developers.google.com/next26/multi-agent-system?hl=en#0) and Create a Cloud Run service with a sidecar (https://codelabs.developers.google.com/codelabs/create-cloud-run-service-sidebar-localhost-volume-mount#0)
- Explore the Method 2 implementation in the Multi-Agent Course Creator with Self-Hosted LLM on GPU (https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/agents/multi-agent-system/agents/content_builder) GitHub repo
- Read the blog post Securely Call Cloud Run Service From Anywhere (https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/)