# Deploy a Production-Ready NVIDIA AI-Q Blueprint on Oracle Cloud Infrastructure

> Source: <https://developer.nvidia.com/blog/deploy-a-production-ready-nvidia-ai-q-blueprint-on-oracle-cloud-infrastructure/>
> Published: 2026-06-26 19:00:45+00:00

AI agents have changed a lot in the last two years. The first agents could only answer one question at a time. Then came multi-turn chat, where the model could keep some context across a session. Today, we have long-horizon agents: systems that plan many steps, split work between sub-agents, keep context across a long task, and run tools in a safe sandbox.

The [NVIDIA AI-Q Blueprint](https://build.nvidia.com/nvidia/aiq) is an open source reference for this kind of agent. It is built on [LangChain Deep Agents](https://docs.langchain.com/oss/python/integrations/providers/nvidia) and the [NVIDIA NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit). You can use it for quick cited answers, or for longer research reports with sources.

This post shows you how to deploy AI-Q 2.0 on Oracle Cloud Infrastructure (OCI) using Terraform to create the OCI resources and Helm to install the workloads on OKE. By the end, you will have a working AI-Q endpoint in your own OCI tenancy, and one command to take it all down when you are done.

**Who this is for:** Developers and platform engineers who are comfortable with Kubernetes, Terraform, and the shell, and who want to run AI-Q on OCI rather than on a laptop.

**What you’ll learn:** How AI-Q’s multi-agent architecture maps to OCI services, plus the exact commands to provision, deploy, and open the blueprint from start to finish.

More background on the multi-agent architecture (the intent router, shallow research agent, deep agent, planning sub-agent, and researcher sub-agent) is on the [AI-Q product page](https://build.nvidia.com/nvidia/aiq) and in the [NeMo Agent Toolkit docs](https://github.com/NVIDIA/NeMo-Agent-Toolkit).

**Prerequisites**

Make sure you have:

- **OCI tenancy access** with a compartment you can deploy into, and enough service limits for:
  - OKE: One enhanced cluster and one node pool
  - Block Volume: At least 10 GB (dynamically provisioned by the OKE CSI driver for the in-cluster PostgreSQL)
  - Load Balancer: One flexible
  - Vault: One vault plus secrets
- **API keys:**
  - NGC API key from [build.nvidia.com](https://build.nvidia.com/), format `nvapi-…`, used both as the NVIDIA inference key and to authenticate to the NGC container registry (`nvcr.io`).
  - Tavily API key from [tavily.com](https://tavily.com/), format `tvly-…`
- **Local tools:** `terraform` 1.5 or later, `kubectl` 1.28 or later, `helm` 3.x or later, and the `oci` CLI set up with your API signing key
- **Some basic knowledge** of Kubernetes, Helm charts, Terraform, and the shell. LangChain or NeMo Agent Toolkit experience is nice to have, but not required.
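Before provisioning anything, it can help to confirm that the local tools are on your `PATH`. The following is a minimal sketch and not part of the blueprint: `missing_tools` is a hypothetical helper name, and it only checks that each command exists, not which version is installed.

```shell
# Hypothetical helper: report which required CLIs are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero
# if at least one tool is missing.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as `missing_tools terraform kubectl helm oci`. To compare against the minimum versions listed above, `terraform version`, `kubectl version --client`, `helm version --short`, and `oci --version` print what is installed.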

## Architecture overview

AI-Q uses a multi-agent design. An **intent router** reads each user query and sends it to the right workflow.

The blueprint is built to be extensible. Every layer (models, tools, RAG backends, sub-agents, evaluators) can be swapped through YAML config or through the NeMo Agent Toolkit plugin system. We will use that extensibility in Parts 2 and 3 of this series.

## OCI deployment architecture

The deployment uses **Terraform** for the OCI resources and **Helm** for the Kubernetes workloads. This gives a clean split between infrastructure and application, and one `terraform destroy` is enough to remove everything later.

| Resource | Terraform module | Purpose |
|---|---|---|
| VCN, subnets, gateways, NSGs | `network` | Network isolation with public and OKE subnets |
| OKE cluster + node pool | `oke` | Kubernetes runtime (Enhanced cluster, VCN-native CNI) |
| OCI Load Balancer | `loadbalancer` | Public HTTP ingress on port 80, forwarding to NodePort 30080 |
| OCI Vault + secrets | `vault` | AES-256 encrypted storage for API keys and credentials |

*Table 1. OCI resources created by the Terraform modules in `deploy/terraform`*

The Helm chart installs three workloads on OKE:

- **Backend** (`aiq-backend`): A FastAPI-based agent server that runs the AI-Q workflow.
- **Frontend** (`aiq-frontend`): A Next.js web UI exposed over NodePort 30080.
- **PostgreSQL** (`aiq-postgres`): An in-cluster database for the job store, checkpoints, and summaries.

## Deployment steps

```
git clone https://github.com/oracle-samples/ai-q.git
cd ai-q/oke-samples/aiq-2.0
```

Total time: around 20 to 25 minutes. The full reference is in [aiq-2.0/README.md](https://github.com/oracle-samples/ai-q/blob/main/oke-samples/aiq-2.0/README.md).

**Step 1. Configure Terraform variables**

Copy the example file and edit it with your tenancy details:

```
cd deploy/terraform
cp terraform.tfvars.example terraform.tfvars
```

At minimum, set these variables in `terraform.tfvars`:

- `tenancy_ocid`, `compartment_id`, `region` (for example `us-chicago-1`)
- `user_ocid`, `fingerprint`, `private_key_path` (same values as your `~/.oci/config`)
- `db_admin_password`, used to bootstrap the in-cluster PostgreSQL, stored in OCI Vault.
- `nvidia_api_key`, your NVIDIA NGC key from [build.nvidia.com](https://build.nvidia.com/). Used for inference and to pull container images from `nvcr.io`.
- `tavily_api_key`, your Tavily key from [tavily.com](https://tavily.com/), for web search.
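A quick way to catch a forgotten variable before running Terraform is to scan the file for each required name. This is a sketch under stated assumptions: `missing_tfvars` is a hypothetical helper, it assumes one `name = value` assignment per line, and it does not detect values that are present but empty.

```shell
# Hypothetical helper: print the required variable names that have no
# assignment in a tfvars file. $1 is the file path; the remaining
# arguments are the variable names to look for.
missing_tfvars() {
  local file="$1" name
  shift
  for name in "$@"; do
    # Matches lines such as:  region = "us-chicago-1"
    # Commented-out lines (starting with #) do not match.
    if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file"; then
      echo "$name"
    fi
  done
}
```

For the variables listed above, the call would be `missing_tfvars terraform.tfvars tenancy_ocid compartment_id region user_ocid fingerprint private_key_path db_admin_password nvidia_api_key tavily_api_key`. Empty output means every name has an assignment.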

**Step 2. Create the infrastructure**

Initialize the providers, check the plan, and apply:

```
terraform init
terraform plan
terraform apply
```

This takes about 10 to 15 minutes. Terraform creates the VCN, OKE cluster, Load Balancer, and the Vault with the NGC and Tavily API keys encrypted at rest.

**Check:** `terraform output` should show values for `oke_cluster_id` and `lb_public_ip`. If either is empty, run `terraform apply` again. The apply is safe to repeat.

Capture the two values you’ll need in the next step:

```
export OKE_CLUSTER_ID="$(terraform output -raw oke_cluster_id)"
export LB_PUBLIC_IP="$(terraform output -raw lb_public_ip)"
```

**Step 3. Install AI-Q from the NGC Helm chart**

The chart and container images are published on NGC, so there’s nothing to build locally. We point `kubectl` at the new OKE cluster, create the secrets the chart consumes, then `helm pull` and `helm install`.

**3a. Configure kubectl for the OKE cluster**

```
# Configure kubectl for the OKE cluster.
# --region must be the region you set in terraform.tfvars
# (us-chicago-1 is the example value from Step 1).
oci ce cluster create-kubeconfig \
  --cluster-id "$OKE_CLUSTER_ID" \
  --file ~/.kube/config \
  --region us-chicago-1 \
  --token-version 2.0.0 \
  --kube-endpoint PUBLIC_ENDPOINT

# Sanity check: every node should report STATUS Ready.
kubectl get nodes
```

**3b. Export the API keys**

Reuse the same NGC and Tavily keys you put in `terraform.tfvars`. The NGC key does double duty: it’s both the inference key and the `nvcr.io` pull credential.

```
export NGC_API_KEY="nvapi-..."         # from build.nvidia.com
export TAVILY_API_KEY="tvly-..."       # from tavily.com
export DB_USER_PASSWORD="<same value as db_admin_password in Step 1>"
```
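Because a mistyped key only surfaces later as `ImagePullBackOff` or failed queries, a cheap format check at this point can save a redeploy. The sketch below relies only on the `nvapi-` and `tvly-` prefixes given in the prerequisites; `has_prefix` is a hypothetical helper, and a matching prefix does not prove the key is valid.

```shell
# Hypothetical helper: succeed when the value in $1 starts with the
# prefix in $2, fail otherwise (including for an empty value).
has_prefix() {
  case "$1" in
    "$2"*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example use (not executed here):
#   has_prefix "$NGC_API_KEY" "nvapi-"   || echo "NGC_API_KEY looks wrong" >&2
#   has_prefix "$TAVILY_API_KEY" "tvly-" || echo "TAVILY_API_KEY looks wrong" >&2
```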

**3c. Create the namespace and secrets**

```
kubectl create namespace ns-aiq --dry-run=client -o yaml | kubectl apply -f -

# Application credentials (NVIDIA inference key, Tavily search key, Postgres user)
kubectl create secret generic aiq-credentials -n ns-aiq \
  --from-literal=NVIDIA_API_KEY="$NGC_API_KEY" \
  --from-literal=TAVILY_API_KEY="$TAVILY_API_KEY" \
  --from-literal=DB_USER_NAME="aiq" \
  --from-literal=DB_USER_PASSWORD="$DB_USER_PASSWORD"

# Image-pull secret for nvcr.io (NGC container registry)
kubectl create secret docker-registry ngc-secret -n ns-aiq \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"
```

**3d. Pull and install the chart from NGC**

```
cd ../helm     # from deploy/terraform to deploy/helm

helm pull https://helm.ngc.nvidia.com/nvidia/blueprint/charts/aiq2-web-2.0.0.tgz \
  --username='$oauthtoken' \
  --password="$NGC_API_KEY"

helm upgrade --install aiq aiq2-web-2.0.0.tgz \
  -n ns-aiq \
  --wait --timeout 10m \
  -f values-oci-ngc.yaml
```

The OCI overlay (`values-oci-ngc.yaml`) is intentionally tiny. It only pins the frontend service to NodePort 30080 (the port the OCI Load Balancer health-checks) and names the `ngc-secret` image-pull secret. Image repositories, the Postgres init SQL, and the dynamically provisioned 10 Gi Block Volume PVC all come from the chart’s own defaults.

**Check:** `kubectl get pods -n ns-aiq` should show `aiq-backend`, `aiq-frontend`, and `aiq-postgres` pods in `Running` state after 3 to 5 minutes.
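Instead of re-running `kubectl get pods` by hand, you can block until the rollouts finish. This is a sketch, not part of the blueprint: `wait_for_deployments` is a hypothetical helper, and it assumes the backend and frontend are Deployments named `aiq-backend` and `aiq-frontend` (the names used by the restart command in the troubleshooting section). Whether PostgreSQL runs as a Deployment depends on the chart, so it is left out here.

```shell
# Hypothetical helper: wait for each named Deployment in a namespace to
# finish rolling out. $1 is the namespace; the remaining arguments are
# Deployment names. Stops at the first one that is not ready in time.
wait_for_deployments() {
  local ns="$1" name
  shift
  for name in "$@"; do
    kubectl rollout status "deployment/$name" -n "$ns" --timeout=5m || return 1
  done
}
```

A typical call would be `wait_for_deployments ns-aiq aiq-backend aiq-frontend`. A non-zero exit means at least one rollout did not complete within five minutes.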

**Step 4. Open AI-Q**

The LB IP is already in your shell from Step 2:

```
echo "http://$LB_PUBLIC_IP"
```

If you opened a new shell since then, re-export it from Terraform:

```
cd ../terraform

export LB_PUBLIC_IP="$(terraform output -raw lb_public_ip)"

echo "http://$LB_PUBLIC_IP"
```

Open `http://<lb_public_ip>` in your browser. You should see the AI-Q frontend.
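If you prefer to verify from the shell first, a small polling loop can confirm that the Load Balancer is serving the frontend. This is a sketch under assumptions: `wait_for_http_ok` is a hypothetical helper, and it assumes the frontend answers the root path with HTTP 200. If your deployment redirects instead, adjust the expected status.

```shell
# Hypothetical helper: poll a URL until it answers with HTTP 200.
# $1 = URL, $2 = max attempts (default 10), $3 = seconds between attempts (default 5)
wait_for_http_ok() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress, -o discards the body, -w prints only the status code
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "no HTTP 200 from $url after $attempts attempts (last status: $code)" >&2
  return 1
}
```

With the variable exported in Step 2, the call would be `wait_for_http_ok "http://$LB_PUBLIC_IP"`.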

Try a simple question first, for example, “What is the NeMo Agent Toolkit?”, to confirm the routing works. Then try a deeper one, for example, “Compare the top three open-source deep-research agents by benchmark score and cost”, to see the deep agent in action.

### Troubleshooting

- **`terraform apply` fails on OKE creation with a quota error.** Check the service limits for your compartment for “Cluster count” and “Node count”, and ask for more quota if needed.
- **Pods stuck in `ImagePullBackOff`.** Check that the image-pull secret was created (`kubectl get secret -n ns-aiq`) and that your `NGC_API_KEY` was correct when you ran the `kubectl create secret docker-registry ngc-secret` command in Step 3c. To rotate, delete the secret and re-create it, then `kubectl rollout restart deployment -n ns-aiq aiq-backend aiq-frontend`.
- **`postgres` pod stays in `Pending` for more than 2 minutes.** The Block Volume PVC didn’t get dynamically provisioned. Run `kubectl describe pvc -n ns-aiq`. Typical causes are the OKE CSI driver not running, the default StorageClass missing, or insufficient Block Volume quota. Check the storage class with `kubectl get sc` and your compartment’s Block Volume service limit.
- **Load Balancer IP comes back as `null`.** OCI can take a minute or two after Terraform returns to finish provisioning the LB. Run `terraform refresh` and then `terraform output lb_public_ip` again.
- **Frontend loads but queries return `500`.** Look at `kubectl logs -n ns-aiq deploy/aiq-backend`. The most common cause is a wrong or missing `NVIDIA_API_KEY` or `TAVILY_API_KEY` in the `aiq-credentials` secret you created in Step 3c.

### Learn more

You now have a working AI-Q 2.0 deployment on OCI, and one command (`terraform destroy`) to remove it cleanly when you are done. A few things to keep in mind as you go further:

- **Cost:** The OKE node pool and the Load Balancer keep costing you while they run. Destroy the stack between experiments, or scale the node pool down to zero.
- **Secrets:** Terraform stores the NGC and Tavily keys in OCI Vault at provision time (for audit and disaster recovery), but the running pods read them from the `aiq-credentials` Kubernetes secret you created in Step 3c. To rotate, delete and re-create that secret with the new values, then `kubectl rollout restart deployment -n ns-aiq aiq-backend`. Editing `terraform.tfvars` alone won’t reach the pods.
- **Extensibility:** Everything you just deployed is driven by YAML and by the NeMo Agent Toolkit plugin system. Swapping an LLM, adding a sub-agent, or plugging a new RAG backend is a configuration change, not a rewrite.
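The secret rotation described above can be wrapped in one function. This is a sketch rather than the blueprint's own procedure: `rotate_aiq_credentials` is a hypothetical name, and instead of deleting and re-creating the secret it swaps in the `--dry-run=client -o yaml | kubectl apply -f -` pattern that Step 3c uses for the namespace, which updates the secret in place. It assumes `NGC_API_KEY`, `TAVILY_API_KEY`, and `DB_USER_PASSWORD` are exported with the new values.

```shell
# Hypothetical helper: update the aiq-credentials secret in place from the
# current environment, then restart the backend so pods read the new values.
rotate_aiq_credentials() {
  local ns="ns-aiq"
  # Render the secret locally and apply it, so an existing secret is updated
  kubectl create secret generic aiq-credentials -n "$ns" \
    --from-literal=NVIDIA_API_KEY="$NGC_API_KEY" \
    --from-literal=TAVILY_API_KEY="$TAVILY_API_KEY" \
    --from-literal=DB_USER_NAME="aiq" \
    --from-literal=DB_USER_PASSWORD="$DB_USER_PASSWORD" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  # Running pods keep the old values until they are restarted
  kubectl rollout restart deployment -n "$ns" aiq-backend
}
```

If the NGC key changes, the `ngc-secret` image-pull secret from Step 3c needs the same treatment, since it is a separate Kubernetes secret.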

Clone the [AI-Q in OCI repo](https://github.com/oracle-samples/ai-q.git), then share what you built and the problem it solved on the [NVIDIA Developer Forum](https://forums.developer.nvidia.com/).
