Deploy a Production-Ready NVIDIA AI-Q Blueprint on Oracle Cloud Infrastructure

AI agents have changed a lot in the last two years. The first ones could only answer one question at a time. Then came multi-turn chat, where the model could keep some context across a session. Today, we have long-horizon agents: systems that plan many steps, split work between sub-agents, keep context across a long task, and run tools in a safe sandbox.

The NVIDIA AI-Q Blueprint is an open-source reference for this kind of agent. It is built on LangChain Deep Agents and the NVIDIA NeMo Agent Toolkit. You can use it for quick cited answers, or for longer research reports with sources.

This post shows you how to deploy AI-Q 2.0 on Oracle Cloud Infrastructure (OCI) using Terraform to create the OCI resources and Helm to install the workloads on OKE. By the end, you will have a working AI-Q endpoint in your own OCI tenancy, and one command to take it all down when you are done.

Who this is for: Developers and platform engineers who are comfortable with Kubernetes, Terraform, and the shell, and who want to run AI-Q on OCI rather than on a laptop.

What you’ll learn: How AI-Q’s multi-agent architecture maps to OCI services, plus the exact commands to provision, deploy, and open the blueprint from start to finish.

More background on the multi-agent architecture (the intent router, shallow research agent, deep agent, planning sub-agent, and researcher sub-agent) is on the AI-Q product page and in the NeMo Agent Toolkit docs.

Prerequisites

Make sure you have:

OCI tenancy access with a compartment you can deploy into, and enough service limits for:
- OKE: One enhanced cluster and one node pool
- Block Volume: At least 10 GB (dynamically provisioned by the OKE CSI driver for the in-cluster PostgreSQL)
- Load Balancer: One flexible
- Vault: One vault plus secrets
API keys:
- NGC API key from build.nvidia.com, format nvapi-…, used both as the NVIDIA inference key and to authenticate to the NGC container registry (nvcr.io).
- Tavily API key from tavily.com, format tvly-…
Local tools: terraform 1.5 or later, kubectl 1.28 or later, helm 3.x or later, and the oci CLI configured with your API signing key.
Some basic knowledge of Kubernetes, Helm charts, Terraform, and the shell. LangChain or NeMo Agent Toolkit experience is nice to have, but not required.
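
The tool and key requirements above can be checked before you start. The following is a minimal sketch, not part of the blueprint: the helper names missing_tools and has_prefix are made up for illustration, and the key prefixes are the ones listed in the prerequisites.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helpers, not shipped with AI-Q).

# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 if the value in $1 starts with the prefix in $2, 1 otherwise.
has_prefix() {
  case "$1" in
    "$2"*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example invocation (uncomment to run against your own environment):
# missing_tools terraform kubectl helm oci
# has_prefix "$NGC_API_KEY" "nvapi-"   || echo "NGC key does not start with nvapi-"
# has_prefix "$TAVILY_API_KEY" "tvly-" || echo "Tavily key does not start with tvly-"
```

An empty result from missing_tools means all four CLIs were found. The sketch does not check tool versions, so confirm those separately.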

Architecture overview

AI-Q uses a multi-agent design. An intent router reads each user query and sends it to the right workflow.

Diagram of the AI-Q multi-agent architecture. A user query goes into an intent router, which sends it to either a Shallow Research Agent or a Deep Agent. The Deep Agent has a Planning sub-agent and a Researcher sub-agent. They share a Filesystem layer (to-do lists, memory, file storage) and run skills like Data Analysis and Image Processing in isolated sandboxes. Data sources at the entry point include MCP, AI Data Platforms, Web Search, and user-uploaded documents.

Figure 1. The AI-Q multi-agent architecture. The intent router sends queries either to the Shallow Research Agent (fast, bounded tool-augmented search) or to the Deep Agent (a Planning sub-agent and a Researcher sub-agent that share a Filesystem layer and run skills in sandboxes).

The blueprint is built to be extensible. Every layer (models, tools, RAG backends, sub-agents, evaluators) can be swapped through YAML config or through the NeMo Agent Toolkit plugin system. We will use that extensibility in Parts 2 and 3 of this series.

OCI deployment architecture

The deployment uses Terraform for the OCI resources and Helm for the Kubernetes workloads. This gives a clean split between infrastructure and application, and one terraform destroy is enough to remove everything later.

Architecture diagram of the AI-Q deployment on OCI. A public OCI Load Balancer in front of an OKE cluster inside a VCN, with public and OKE subnets. The OKE cluster runs three workloads pulled from the NVIDIA NGC registry: an AI-Q backend (FastAPI), an AI-Q frontend (Next.js), and a PostgreSQL pod. OCI Vault stores the NGC and Tavily API keys at provision time.

Figure 2. The AI-Q deployment on OCI. Terraform creates the VCN, OKE cluster, Load Balancer, and Vault. Helm installs the AI-Q backend, frontend, and PostgreSQL workloads on OKE.

Resource	Terraform module	Purpose
VCN, subnets, gateways, NSGs	`network`	Network isolation with public and OKE subnets
OKE cluster + node pool	`oke`	Kubernetes runtime (Enhanced cluster, VCN-native CNI)
OCI Load Balancer	`loadbalancer`	Public HTTP ingress on port 80, forwarding to NodePort 30080
OCI Vault + secrets	`vault`	AES-256 encrypted storage for API keys and credentials

Table 1. OCI resources created by the Terraform modules in deploy/terraform.

The Helm chart installs three workloads on OKE:

Backend (aiq-backend): A FastAPI-based agent server that runs the AI-Q workflow.
Frontend (aiq-frontend): A Next.js web UI exposed over NodePort 30080.
PostgreSQL (aiq-postgres): An in-cluster database for the job store, checkpoints, and summaries.

Deployment steps

Start by cloning the sample repository and changing into the AI-Q 2.0 directory:

git clone https://github.com/oracle-samples/ai-q.git
cd ai-q/oke-samples/aiq-2.0

Total time: around 20 to 25 minutes. The full reference is in aiq-2.0/README.md.

Step 1. Configure Terraform variables

Copy the example file and edit it with your tenancy details:

cd deploy/terraform
cp terraform.tfvars.example terraform.tfvars

At minimum, set these variables in terraform.tfvars:

tenancy_ocid, compartment_id, region (for example us-chicago-1)
user_ocid, fingerprint, private_key_path (same values as your ~/.oci/config)
db_admin_password, which bootstraps the in-cluster PostgreSQL and is stored in OCI Vault.
nvidia_api_key, your NVIDIA NGC key from build.nvidia.com, used for inference and to pull container images from nvcr.io.
tavily_api_key, your Tavily key from tavily.com, for web search.
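
If you would rather not write the three secret values into a file on disk, Terraform also reads input variables from environment variables named TF_VAR_<name>. The sketch below assumes the variable names listed above and reuses the shell variable names from Step 3b. The function name export_tf_secrets is made up for illustration. Values in terraform.tfvars take precedence over environment variables, so leave these three out of the file if you use this approach.

```shell
#!/bin/sh
# Sketch: hand secret inputs to Terraform through TF_VAR_* environment
# variables. export_tf_secrets is a hypothetical helper name.

export_tf_secrets() {
  # Stop with a message if any source variable is unset or empty.
  : "${NGC_API_KEY:?set NGC_API_KEY first}"
  : "${TAVILY_API_KEY:?set TAVILY_API_KEY first}"
  : "${DB_USER_PASSWORD:?set DB_USER_PASSWORD first}"

  # Terraform maps TF_VAR_<name> to the input variable <name>.
  export TF_VAR_nvidia_api_key="$NGC_API_KEY"
  export TF_VAR_tavily_api_key="$TAVILY_API_KEY"
  export TF_VAR_db_admin_password="$DB_USER_PASSWORD"
}

# Example invocation, before terraform plan or terraform apply:
# export_tf_secrets
```

The exports only last for the current shell session, so run the function again in any new shell before you call terraform plan, apply, or destroy.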

Step 2. Create the infrastructure

Initialize the providers, check the plan, and apply:

terraform init
terraform plan
terraform apply

This takes about 10 to 15 minutes. Terraform creates the VCN, OKE cluster, Load Balancer, and the Vault with the NGC and Tavily API keys encrypted at rest.

Check: terraform output should show values for oke_cluster_id and lb_public_ip. If either is empty, run terraform apply again. The apply is idempotent, so it is safe to repeat.

Capture the two values you’ll need in the next step:

export OKE_CLUSTER_ID="$(terraform output -raw oke_cluster_id)"
export LB_PUBLIC_IP="$(terraform output -raw lb_public_ip)"
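
Later steps fail in confusing ways if either variable is empty, so it can help to stop early. This is a small sketch of such a guard. The function name require_set is hypothetical.

```shell
#!/bin/sh
# Sketch: fail fast when a variable needed by later steps is empty.
# require_set is a hypothetical helper name.

require_set() {
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "error: $name is empty; re-run terraform apply and export it again" >&2
      return 1
    fi
  done
}

# Example invocation:
# require_set OKE_CLUSTER_ID LB_PUBLIC_IP
```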

Step 3. Install AI-Q from the NGC Helm chart

The chart and container images are published on NGC, so there’s nothing to build locally. We point kubectl at the new OKE cluster, create the secrets the chart consumes, then pull the chart and install it with helm upgrade --install.

3a. Configure kubectl for the OKE cluster

# Write a kubeconfig entry for the new OKE cluster.
# --region must match the region you set in terraform.tfvars.
oci ce cluster create-kubeconfig \
  --cluster-id "$OKE_CLUSTER_ID" \
  --file ~/.kube/config \
  --region us-chicago-1 \
  --token-version 2.0.0 \
  --kube-endpoint PUBLIC_ENDPOINT

# Sanity check: every node should report Ready.

kubectl get nodes
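
Right after the cluster is created, nodes can still be joining. Instead of re-running kubectl get nodes by hand, you can block until they are ready with kubectl wait. The wrapper below is a sketch. The function name wait_for_nodes and the 300-second default are assumptions, not values from the blueprint.

```shell
#!/bin/sh
# Sketch: block until every node reports the Ready condition.
# wait_for_nodes is a hypothetical helper; the default timeout is a guess.

wait_for_nodes() {
  node_timeout="${1:-300s}"
  kubectl wait --for=condition=Ready nodes --all --timeout="$node_timeout"
}

# Example invocation:
# wait_for_nodes 300s
```

kubectl wait exits with a non-zero status if the timeout expires first, so the function can gate the rest of a script.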

3b. Export the API keys

Reuse the same NGC and Tavily keys you put in terraform.tfvars. The NGC key does double duty. It’s both the inference key and the nvcr.io pull credential.

export NGC_API_KEY="nvapi-..."         # from build.nvidia.com
export TAVILY_API_KEY="tvly-..."       # from tavily.com
export DB_USER_PASSWORD="<same value as db_admin_password in Step 1>"

3c. Create the namespace and secrets

kubectl create namespace ns-aiq --dry-run=client -o yaml | kubectl apply -f -

# Application credentials (NVIDIA inference, Tavily web search, Postgres user)
kubectl create secret generic aiq-credentials -n ns-aiq \
  --from-literal=NVIDIA_API_KEY="$NGC_API_KEY" \
  --from-literal=TAVILY_API_KEY="$TAVILY_API_KEY" \
  --from-literal=DB_USER_NAME="aiq" \
  --from-literal=DB_USER_PASSWORD="$DB_USER_PASSWORD"

# Image-pull secret for nvcr.io (NGC container registry)
kubectl create secret docker-registry ngc-secret -n ns-aiq \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"
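
The namespace command above is safe to re-run because it pipes a client-side dry run into kubectl apply. The two kubectl create secret commands are not: a second run fails because the secret already exists. The same dry-run pattern can be applied to the generic secret, as sketched below. The function name apply_generic_secret is made up for illustration.

```shell
#!/bin/sh
# Sketch: create or update a generic secret idempotently.
# apply_generic_secret is a hypothetical helper name.
#   $1 = secret name, $2 = namespace, remaining args = KEY=VALUE pairs

apply_generic_secret() {
  secret_name="$1"
  secret_ns="$2"
  shift 2

  # Rewrite each KEY=VALUE argument as --from-literal=KEY=VALUE.
  for pair in "$@"; do
    set -- "$@" "--from-literal=$pair"
    shift
  done

  # Render the manifest locally, then let apply create or update it.
  kubectl create secret generic "$secret_name" -n "$secret_ns" "$@" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example invocation with the values from Step 3b:
# apply_generic_secret aiq-credentials ns-aiq \
#   NVIDIA_API_KEY="$NGC_API_KEY" \
#   TAVILY_API_KEY="$TAVILY_API_KEY" \
#   DB_USER_NAME="aiq" \
#   DB_USER_PASSWORD="$DB_USER_PASSWORD"
```

Pods only pick up the new values after a restart, so follow an update with kubectl rollout restart as described in the troubleshooting section.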

3d. Pull and install the chart from NGC

cd ../helm     # from deploy/terraform to deploy/helm

helm pull https://helm.ngc.nvidia.com/nvidia/blueprint/charts/aiq2-web-2.0.0.tgz \
  --username='$oauthtoken' \
  --password="$NGC_API_KEY"

helm upgrade --install aiq aiq2-web-2.0.0.tgz \
  -n ns-aiq \
  --wait --timeout 10m \
  -f values-oci-ngc.yaml

The OCI overlay (values-oci-ngc.yaml) is intentionally tiny. It only pins the frontend service to NodePort 30080 (the port the OCI Load Balancer health-checks) and names the ngc-secret image-pull secret. Image repositories, the Postgres init SQL, and the dynamically provisioned 10 Gi Block Volume PVC all come from the chart’s own defaults.

Check: kubectl get pods -n ns-aiq should show aiq-backend, aiq-frontend, and aiq-postgres pods in Running state after 3 to 5 minutes.
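
To wait for the workloads instead of polling kubectl get pods, kubectl rollout status can block on each deployment in turn. The sketch below assumes the backend and frontend are Deployments named aiq-backend and aiq-frontend, as the troubleshooting commands in this post suggest. PostgreSQL is left out because the post does not say which workload kind the chart uses for it. The function name wait_for_rollouts is hypothetical.

```shell
#!/bin/sh
# Sketch: wait for a list of deployments to finish rolling out.
# wait_for_rollouts is a hypothetical helper name.
#   $1 = namespace, remaining args = deployment names

wait_for_rollouts() {
  rollout_ns="$1"
  shift
  for deployment in "$@"; do
    kubectl rollout status "deployment/$deployment" \
      -n "$rollout_ns" --timeout=5m || return 1
  done
}

# Example invocation:
# wait_for_rollouts ns-aiq aiq-backend aiq-frontend
```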

Step 4. Open AI-Q

The LB IP is already in your shell from Step 2:

echo "http://$LB_PUBLIC_IP"

If you opened a new shell since then, re-export it from Terraform:

cd ../terraform

export LB_PUBLIC_IP="$(terraform output -raw lb_public_ip)"

echo "http://$LB_PUBLIC_IP"

Open http://<lb_public_ip> in your browser. You should see the AI-Q frontend.
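
The Load Balancer can take a short while to mark its backends healthy, so the first request may fail even when the pods are running. The sketch below polls the endpoint with curl until it returns a 2xx or 3xx status. The function name wait_for_http, the attempt count, and the delay are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: poll a URL until it answers with a 2xx or 3xx status code.
# wait_for_http is a hypothetical helper name.
#   $1 = URL, $2 = max attempts (default 10), $3 = delay in seconds (default 15)

wait_for_http() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-15}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; curl reports 000 if it cannot connect.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
    case "$code" in
      2??|3??) echo "ready: HTTP $code"; return 0 ;;
    esac
    echo "attempt $i/$attempts: HTTP ${code:-none}, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example invocation:
# wait_for_http "http://$LB_PUBLIC_IP"
```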

Try a simple question first, for example, “What is the NeMo Agent Toolkit?”, to confirm the routing works. Then try a deeper one, for example, “Compare the top three open-source deep-research agents by benchmark score and cost”, to see the deep agent in action.

Troubleshooting

terraform apply fails on OKE creation with a quota error. Check the service limits for your compartment for “Cluster count” and “Node count”, and ask for more quota if needed.
Pods stuck in ImagePullBackOff. Check that the image-pull secret was created (kubectl get secret -n ns-aiq) and that your NGC_API_KEY was correct when you ran the kubectl create secret docker-registry ngc-secret command in Step 3c. To rotate, delete the secret and re-create it, then kubectl rollout restart deployment -n ns-aiq aiq-backend aiq-frontend.
postgres pod stays in Pending for more than 2 minutes. The Block Volume PVC didn’t get dynamically provisioned. Run kubectl describe pvc -n ns-aiq. Typical causes are the OKE CSI driver not running, the default StorageClass missing, or insufficient Block Volume quota. Check the storage class with kubectl get sc and your compartment’s Block Volume service limit.
Load Balancer IP comes back as null. OCI can take a minute or two after Terraform to finish the LB. Run terraform apply -refresh-only (the replacement for the deprecated terraform refresh) and then terraform output lb_public_ip again.
Frontend loads but queries return 500. Look at kubectl logs -n ns-aiq deploy/aiq-backend. The most common cause is a wrong or missing NVIDIA_API_KEY or TAVILY_API_KEY in the aiq-credentials secret you created in Step 3c.
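
When you suspect a wrong key in aiq-credentials, you can inspect just the first few characters of the stored value instead of printing the whole secret. The sketch below decodes one key and keeps only its prefix. The function name secret_key_prefix is hypothetical, and base64 -d is the GNU spelling of the decode flag (some older macOS releases expect -D instead).

```shell
#!/bin/sh
# Sketch: print only the leading characters of one key in a secret.
# secret_key_prefix is a hypothetical helper name.
#   $1 = namespace, $2 = secret name, $3 = key, $4 = character count (default 6)

secret_key_prefix() {
  kubectl get secret "$2" -n "$1" -o "jsonpath={.data.$3}" \
    | base64 -d \
    | cut -c "1-${4:-6}"
}

# Example invocation; a healthy secret should print nvapi- and tvly-:
# secret_key_prefix ns-aiq aiq-credentials NVIDIA_API_KEY 6
# secret_key_prefix ns-aiq aiq-credentials TAVILY_API_KEY 5
```

An empty result means the key is missing from the secret or was stored as an empty string.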

Learn more

You now have a working AI-Q 2.0 deployment on OCI, and one command (terraform destroy) to remove it cleanly when you are done. A few things to keep in mind as you go further:

Cost: The OKE node pool and the Load Balancer continue to accrue charges for as long as they run. Destroy the stack between experiments, or scale the node pool down to zero.
Secrets: Terraform stores the NGC and Tavily keys in OCI Vault at provision time (for audit and disaster recovery), but the running pods read them from the aiq-credentials Kubernetes secret you created in Step 3c. To rotate, delete and re-create that secret with the new values, then kubectl rollout restart deployment -n ns-aiq aiq-backend. Editing terraform.tfvars alone won’t reach the pods.
Extensibility: Everything you just deployed is driven by YAML and by the NeMo Agent Toolkit plugin system. Swapping an LLM, adding a sub-agent, or plugging a new RAG backend is a configuration change, not a rewrite.
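
On the cost point, one detail is worth planning for. The PostgreSQL Block Volume is provisioned by the cluster's CSI driver rather than by Terraform, so it is not tracked in Terraform state. A more conservative teardown removes the Helm release and its PVCs before destroying the infrastructure. The sketch below shows that order. The function name teardown_aiq is hypothetical, and whether the chart's PVC would otherwise be left behind is an assumption to verify in your tenancy.

```shell
#!/bin/sh
# Sketch: tear down the workloads and their volumes before the infrastructure.
# teardown_aiq is a hypothetical helper name.
#   $1 = namespace, $2 = Helm release, $3 = Terraform directory

teardown_aiq() {
  teardown_ns="${1:-ns-aiq}"
  release="${2:-aiq}"
  tf_dir="${3:-deploy/terraform}"

  # Remove the workloads first so nothing is still using the volumes.
  helm uninstall "$release" -n "$teardown_ns" || return 1

  # Delete remaining PVCs so the CSI driver can release the Block Volumes.
  kubectl delete pvc --all -n "$teardown_ns" --wait=true || return 1

  # Finally remove the OCI resources that Terraform created.
  terraform -chdir="$tf_dir" destroy
}

# Example invocation from the aiq-2.0 directory:
# teardown_aiq ns-aiq aiq deploy/terraform
```

terraform destroy still prompts for confirmation, so the function stays interactive at its last step.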

Clone the AI-Q in OCI repo, then share what you built and the problem it solved on the NVIDIA Developer Forum.