# How to securely connect ADK agents to models on Cloud Run

> Source: <https://leoy.blog/posts/securely-connect-agents-to-models/>
> Published: 2026-05-29 21:23:30+00:00

The Agent Development Kit (ADK) simplifies authentication for agents and tools, but authentication becomes more challenging when an agent uses the LiteLLM connector to access models hosted on Cloud Run. This guide explains how to acquire Google-signed OpenID Connect (OIDC) ID tokens and inject them into the LiteLLM calls that an ADK agent makes.

Google Cloud Run provides a robust, built-in access control mechanism based on enforced authentication and IAM policies. When it is enabled, Cloud Run accepts only calls made by authenticated principals that have the Cloud Run *Invoker* role (`roles/run.invoker`), protecting your service from unauthorized invocations.

To make an authenticated call, you need to:

- Acquire credentials, either by implementing a sign-in process or by using [Application Default Credentials](https://docs.cloud.google.com/docs/authentication/application-default-credentials).
- [Fetch an ID token](https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/) for a user or a [workload identity](https://docs.cloud.google.com/iam/docs/workload-identities).
- Use the ID token as a bearer token in the [HTTP Authorization header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Authorization) when you call the Cloud Run endpoint.
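
Outside of ADK, the last two steps can be sketched with the standard library alone. The helper below, `build_authenticated_request`, is hypothetical and not part of any Google or ADK API: it accepts any token-fetching callable so that it can be exercised without network access, and it uses the Cloud Run service URL as the token audience.

```python
import urllib.request
from typing import Callable


def build_authenticated_request(
    url: str, get_id_token: Callable[[str], str]
) -> urllib.request.Request:
    """Builds a request to a Cloud Run URL with an ID token as the bearer token.

    Hypothetical helper: `get_id_token` receives the audience (here, the
    Cloud Run service URL) and must return a Google-signed ID token.
    """
    token = get_id_token(url)
    # Cloud Run expects the ID token in the standard Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
```

In a real deployment you could pass `lambda aud: id_token.fetch_id_token(google.auth.transport.requests.Request(), aud)` (from the `google-auth` package) as `get_id_token`, then send the request with `urllib.request.urlopen(req)`.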

ADK greatly simplifies these steps. When your agent calls another agent or an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro) server that runs on Cloud Run, the framework discovers the Application Default Credentials, fetches the token, and injects it into each call between agents and into each call from an agent to a remote MCP tool. The framework also refreshes the token when it expires. As a developer, you only need to configure your remote MCP server or agent objects.

The situation is different when you configure an ADK agent to use a model that is deployed on Cloud Run. ADK provides a [LiteLLM connector](https://adk.dev/agents/models/litellm/) that allows an agent to use non-Gemini models hosted at remote endpoints. However, you have to authenticate the calls to the model yourself. How can you do that?

## Method 1: Static header

The LiteLLM connector uses the `litellm` Python package to call remote endpoints that expose Ollama, vLLM, and other LLM engines. The package supports passing custom HTTP headers in calls to LiteLLM APIs such as `acompletion` through the `extra_headers` parameter, which takes a map of header names to values.

``` python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token

# Obtain a Google-signed ID token for the given audience (Cloud Run service URL).
aud = "https://model-123456789012.us-central1.run.app"
creds, _ = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
if getattr(creds, "id_token", None):
    # User credentials (local development) carry an ID token after refresh.
    token = creds.id_token
else:
    # Service account credentials: fetch an ID token for the audience.
    token = id_token.fetch_id_token(auth_req, aud)

# Set up the model with a static Authorization header.
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base=aud,
    extra_headers={
        "Authorization": f"Bearer {token}",
    },
)

agent = Agent(
    name="content_builder",
    model=model,
    instruction="agent system instructions",
)
```

This approach works as long as the `token` is valid. Google-signed ID tokens expire after one hour; after that, calls to the model fail with the “HTTP 401 Unauthorized” status code because Cloud Run rejects the expired token. This method suits agents deployed on Cloud Run with a [scale to zero configuration](https://docs.cloud.google.com/run/docs/about-instance-autoscaling) and a low invocation frequency. In this deployment pattern, agent instances are restarted frequently, and each new instance fetches a fresh token at startup.
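
To check how long a fetched token remains valid, you can read the `exp` claim (a Unix timestamp) from the token's payload. The sketch below decodes the JWT payload with the standard library only. It does **not** verify the signature, so use it only to inspect tokens you fetched yourself, never to authenticate incoming requests. The function name `seconds_until_expiry` is hypothetical.

```python
import base64
import json
import time
from typing import Optional


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Returns the number of seconds left before the JWT's `exp` claim.

    The signature is NOT verified: this only inspects a token you fetched.
    A negative result means the token has already expired.
    """
    # A JWT is header.payload.signature; the payload holds the claims.
    payload_b64 = token.split(".")[1]
    # JWT segments use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current
```

For a freshly fetched Google-signed ID token, the result should be at most about 3600 seconds. An agent could use this value to refresh the token proactively, a few minutes before it expires, rather than waiting for an HTTP 401 response.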

## Method 2: Dynamic token injection

When the agent is expected to run continuously, the token has to be refreshed, for example when a call fails with an authentication error. To implement this, extend ADK's `LiteLLMClient` class:

``` python
import logging
from typing import Any, Optional, Union

from fastapi import HTTPException
from google.adk.models.lite_llm import LiteLLMClient
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token
from litellm import CustomStreamWrapper, ModelResponse
from litellm.exceptions import AuthenticationError

logger = logging.getLogger(__name__)
_creds, _ = google.auth.default()


def _get_auth_token(aud: str) -> Optional[str]:
  """Obtains a Google-signed ID token for the given audience (Cloud Run service URL)."""
  try:
    auth_req = google.auth.transport.requests.Request()
    # Support user credentials for local development.
    _creds.refresh(auth_req)
    if getattr(_creds, "id_token", None):
      return _creds.id_token
    # Fetch a token for service account credentials.
    return id_token.fetch_id_token(auth_req, aud)
  except Exception as e:
    logger.error("Error obtaining ID token for %s: %s", aud, e)
    return None


class LiteLLMClientEx(LiteLLMClient):
  """Overrides LiteLLMClient to inject a bearer token into the request headers."""

  def __init__(self, audience: str, **data: Any) -> None:
    self.token: Optional[str] = None
    self.aud = audience
    super().__init__(**data)

  def _set_auth_header(self, kwargs: dict) -> None:
    # Copy the headers so that a dict shared by the caller is never mutated.
    headers = dict(kwargs.get("extra_headers") or {})
    headers["Authorization"] = f"Bearer {self.token}"
    kwargs["extra_headers"] = headers

  async def acompletion(
      self, model, messages, tools, **kwargs
  ) -> Union[ModelResponse, CustomStreamWrapper]:
    """Injects the bearer token and retries once with a fresh token on HTTP 401."""
    self.token = self.token or _get_auth_token(self.aud)
    self._set_auth_header(kwargs)

    # Call the parent class acompletion to preserve ADK internal logic.
    try:
      return await super().acompletion(
          model=model, messages=messages, tools=tools, **kwargs
      )
    except (AuthenticationError, HTTPException) as e:
      if isinstance(e, HTTPException) and e.status_code != 401:
        raise
      # Assume the token expired: refresh it and retry once.
      self.token = _get_auth_token(self.aud)
      self._set_auth_header(kwargs)
      return await super().acompletion(
          model=model, messages=messages, tools=tools, **kwargs
      )
```

Then use the extended `LiteLLMClientEx` when initializing the `LiteLlm` connector:

``` python
from google.adk.models.lite_llm import LiteLlm

CLOUD_RUN_URL = "https://model-123456789012.us-central1.run.app"

model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base=CLOUD_RUN_URL,
    # The token audience is the Cloud Run service URL.
    llm_client=LiteLLMClientEx(audience=CLOUD_RUN_URL),
)
```

Note that the exception handler in the `LiteLLMClientEx` code snippet is simplified for this post: it treats every HTTP 401 error as an expired token. Cloud Run returns HTTP 401 Unauthorized in multiple scenarios, including a mismatched token audience, a wrong token type, or a malformed token. The recommended way to distinguish an expired token from the other scenarios is to examine the error description in the `WWW-Authenticate` response header.
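
A minimal sketch of that check follows. It parses the `error` and `error_description` parameters that RFC 6750 defines for the `Bearer` challenge in a `WWW-Authenticate` header. The exact wording Cloud Run puts into `error_description` is an assumption here: the `"expired"` substring test is illustrative only, so inspect real responses from your service before relying on it. How you reach the response headers from a LiteLLM exception may vary between LiteLLM versions, so the hypothetical functions below take the raw header value.

```python
import re
from typing import Dict

# Matches key="quoted value" or key=token pairs in a challenge.
_PARAM_RE = re.compile(r'(\w+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s,]+))')


def parse_bearer_challenge(header_value: str) -> Dict[str, str]:
    """Parses the auth-params of a `Bearer` challenge (RFC 6750, section 3)."""
    scheme, _, params = header_value.strip().partition(" ")
    if scheme.lower() != "bearer":
        return {}
    result = {}
    for key, quoted, bare in _PARAM_RE.findall(params):
        # Unescape backslash-escaped characters inside quoted values.
        value = re.sub(r"\\(.)", r"\1", quoted) if quoted else bare
        result[key.lower()] = value
    return result


def is_expired_token_error(header_value: str) -> bool:
    """Heuristic (assumption): an invalid_token error that mentions expiry."""
    params = parse_bearer_challenge(header_value)
    return (
        params.get("error") == "invalid_token"
        and "expired" in params.get("error_description", "").lower()
    )
```

In `LiteLLMClientEx`, a check like `is_expired_token_error(...)` could gate the retry so that only expired tokens trigger a refresh, while other HTTP 401 causes are re-raised immediately.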

## Method 3: Using litellm-proxy

If you cannot modify the agent code, or Method 2 does not work for you for other reasons, you can use a [litellm-proxy configuration](https://docs.litellm.ai/docs/proxy/custom_auth) to automatically populate the `Authorization` header of the proxied requests. The LiteLLM proxy can be [deployed as a sidecar container](https://docs.cloud.google.com/run/docs/deploying#sidecars) alongside your agent, offloading the authentication logic from your primary agent code.

Once the proxy is configured, set the proxy's local endpoint and port as `api_base` in the `LiteLlm` connector configuration. No other changes to the agent's code are needed.

``` python
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    # Sidecar containers in the same Cloud Run instance share localhost.
    api_base="http://localhost:4000",
)
```

## What’s next

To wrap up, there are three ways to inject an ID token into the calls that the LiteLLM connector makes to a model hosted on Cloud Run. Choosing the right one comes down to two questions.

**Do you expect your agent instance to run less than one hour before being terminated?** If you deploy your agent on Cloud Run with scale to zero and expect a low request frequency, your instance will be [shut down after being idle](https://docs.cloud.google.com/run/docs/about-instance-autoscaling#idle-instance). In this scenario you can use **Method 1**. It is simple, and as long as each instance lives for less than an hour, the ID token fetched at startup does not expire before the instance shuts down.

**Do you need to centralize authentication logic for calling your model’s endpoint outside of agents?** If you plan to run more than one agent that uses the same model, or you decide not to add model authentication logic to the agent, use **Method 3**: deploy the LiteLLM proxy as a sidecar to each agent or as a standalone service, and configure the agents to use the proxy’s endpoint to access the model. One subtle nuance of running the proxy as a standalone Cloud Run service is that the agents then need authentication logic to call that service, which is effectively the same as implementing Method 2.

If you answered “No” to both of the above questions, your choice should be **Method 2**.

Here are a few useful references that you can use to move forward with implementing authentication:

- Take the codelabs [Building a Multi-Agent System using Gemini and Gemma models](https://codelabs.developers.google.com/next26/multi-agent-system?hl=en#0) and [Create a Cloud Run service with a sidecar](https://codelabs.developers.google.com/codelabs/create-cloud-run-service-sidebar-localhost-volume-mount#0).
- Explore the Method 2 implementation in the [Multi-Agent Course Creator with Self-Hosted LLM on GPU](https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/agents/multi-agent-system/agents/content_builder) GitHub repo.
- Read the blog post [Securely Call Cloud Run Service From Anywhere](https://leoy.blog/posts/securely-call-cloud-run-service-from-anywhere/).
