{"slug": "cutting-idle-agent-costs-by-90-with-agent-substrate", "title": "Cutting Idle Agent Costs by 90% with Agent Substrate", "summary": "Agent Substrate, a new platform for running AI agents, reduces idle agent costs by 90% compared to traditional Kubernetes Pod deployments. By using an actor model with checkpoint/restore and gVisor, it enables efficient resource sharing across multiple agents per worker, significantly lowering hardware requirements. The project provides a demo and CLI tool for users to test the cost savings.", "body_md": "Cost is everything. In just about every agentic conversation, the three things that come up for enterprises implementing AI workloads are:\n\nand as AI continues to throw everyone for a loop when it comes to cost management (e.g., Uber running out of its yearly token budget in one quarter), the ability to shrink resource usage (such as hardware) will be crucial moving forward.\n\nIn this blog post, you will learn how to cut costs by 90% using Agent Substrate in comparison to Agents running in k8s Deployments/Pods.\n\nAgents need a place to run. The \"place to run\" needs to be a platform that's easily managed and orchestrated, and that can cluster resources. Resources like CPU, GPU, and memory need to be able to scale and expand. Without this, it's a matter of manually managing the servers that Agents run on and the clients that interact with those servers.\n\nThat's why so many organizations choose Kubernetes to run agentic workloads.\n\nRunning one Agent per Pod, however, can get costly very quickly in terms of hardware (GPU, CPU, memory) and performance (can your cluster scale up and down quickly enough as Agents come up and go down per use?).\n\nThe tests in this blog post show:\n\nAnd the comparison will be 50 always-on Pods versus 50 Actors across 5-7 Workers (Pods). 
With 50 Agents each running in its own Pod on one side, and the same 50 Agents running as Actors at 5-10 Actors per Worker (Pod) on the other, you can already imagine the hardware resource savings that can be accomplished.\n\nRight now, the majority of organizations start off with the \"one Agent per Pod\" approach, as that's the fastest way to show value and get up and running. For the future, however, Agents in Actors via Agent Substrate will be how organizations deploy when they care about efficiency, optimization, and managing cost.\n\nLet's dive in from a hands-on perspective.\n\nTo follow along in a hands-on fashion, you will need:\n\n`kubectl-ate` installed\n\nYou can install Agent Substrate and `kubectl-ate` from the Agent Substrate repo.\n\nWithin the Agent Substrate repo, you will see a file in the `hack` directory called `ate-dev-env.sh.example`. Make a copy of the file:\n\n```\ncp hack/ate-dev-env.sh.example .ate-dev-env.sh\n```\n\nThen, edit the file with your cluster and account information. For example, if you are using GCP and deploy a GKE cluster, the `.ate-dev-env.sh` file will look like the below:\n\n```\nPROJECT_ID=<your-project-id>\nPROJECT_NUMBER=<your-project-number>\nGCE_REGION=<bucket-region>\nCLUSTER_LOCATION=<cluster-zone-or-region>\nCLUSTER_NAME=<your-gke-cluster>\nBUCKET_NAME=<your-snapshot-bucket>\nKO_DOCKER_REPO=gcr.io/<your-project-id>/ate-images\nKUBECTL_CONTEXT=<your-kube-context>\n```\n\nOnce you've filled in your `.ate-dev-env.sh` file, you can source it and install the counter demo, which lives in the Agent Substrate repo.\n\n```\nsource .ate-dev-env.sh\n./hack/install-ate.sh --deploy-demo-counter\n```\n\nYou can then install the `kubectl-ate` CLI to interact with Actors, Workers, and Templates.\n\n```\ngo install ./cmd/kubectl-ate\n\nexport PATH=\"$(go env GOPATH)/bin:$PATH\"\n```\n\nVerify that Substrate is up and operational:\n\n```\nkubectl get workerpools.ate.dev counter -n ate-demo-counter\nkubectl get pods -n ate-demo-counter\nkubectl ate get 
workers\n```\n\nIf your GKE cluster is regional or has Worker Nodes spread across zones, pin the demo `WorkerPool` to one zone before creating benchmark actors. Agent Substrate uses `checkpoint/restore`, and gVisor restores can fail if a snapshot created on one underlying CPU platform is restored on a node with a different CPU feature set.\n\nChoose the zone where you want the counter workers to run:\n\n```\nexport SUBSTRATE_WORKER_ZONE=us-east1-d\n```\n\nThen patch the counter `WorkerPool` to schedule Workers in that zone.\n\n```\nkubectl patch workerpools.ate.dev counter \\\n  -n ate-demo-counter \\\n  --type=merge \\\n  -p \"{\\\"spec\\\":{\\\"template\\\":{\\\"nodeSelector\\\":{\\\"topology.kubernetes.io/zone\\\":\\\"${SUBSTRATE_WORKER_ZONE}\\\"}}}}\"\n```\n\nWith the installation and configuration complete, you can now start setting up the benchmark environment and tests.\n\nThere are several environment variables below. `ACTOR_COUNT` is the number of logical counter agents to test in both scenarios. `BENCHMARK_NAMESPACE` is the namespace for the always-on Kubernetes baseline workloads and the in-cluster benchmark client Pod. `BASELINE_PREFIX` is the name prefix for Kubernetes baseline Deployments and Services. `SUBSTRATE_PREFIX` is the name prefix for Substrate actors created by the benchmark. `TEMPLATE_REF` is the Substrate actor template reference in `<namespace>/<name>` format. The counter demo creates `ate-demo-counter/counter`. `SUBSTRATE_ROUTER_URL` is the in-cluster URL for `atenet-router`; the benchmark client sends Substrate actor traffic through this service. `BASELINE_CPU_REQUEST` is the CPU request assigned to each always-on Kubernetes baseline Pod and is used to make baseline resource consumption explicit. `BASELINE_MEMORY_REQUEST` is the memory request assigned to each always-on Kubernetes baseline Pod and is used to make baseline resource consumption explicit. 
`BASELINE_RESULTS_FILE` is the local TSV file for Kubernetes baseline latency results. `SUBSTRATE_RESULTS_FILE` and `SUMMARY_FILE` are the files that hold the Substrate results and the comparison summary.\n\n```\nexport ACTOR_COUNT=50\nexport BENCHMARK_NAMESPACE=cost-comparison\nexport BASELINE_PREFIX=k8s-counter\nexport SUBSTRATE_PREFIX=substrate-counter\nexport TEMPLATE_REF=ate-demo-counter/counter\nexport SUBSTRATE_ROUTER_URL=http://atenet-router.ate-system.svc:80\n\nexport BASELINE_CPU_REQUEST=50m\nexport BASELINE_MEMORY_REQUEST=64Mi\nexport SUBSTRATE_WORKER_CPU_REQUEST=50m\nexport SUBSTRATE_WORKER_MEMORY_REQUEST=64Mi\n\nexport BASELINE_RESULTS_FILE=baseline-kubernetes-results.tsv\nexport SUBSTRATE_RESULTS_FILE=substrate-results.tsv\nexport SUMMARY_FILE=cost-comparison-summary.txt\n```\n\nRead the counter image from the `ActorTemplate`. This keeps the Kubernetes baseline on the same counter server image used by the Substrate demo.\n\n```\nexport COUNTER_IMAGE=$(kubectl get actortemplates.ate.dev counter \\\n  -n ate-demo-counter \\\n  -o jsonpath='{.spec.containers[0].image}')\n\nprintf \"Counter image: %s\\n\" \"$COUNTER_IMAGE\"\ncase \"$COUNTER_IMAGE\" in\n  ko://*)\n    printf \"Counter image was not resolved: %s\\n\" \"$COUNTER_IMAGE\"\n    exit 1\n    ;;\nesac\nexport WORKER_REPLICAS=$(kubectl get workerpools.ate.dev counter \\\n  -n ate-demo-counter \\\n  -o jsonpath='{.spec.replicas}')\n\nprintf \"Logical agents: %s\\nSubstrate workers: %s\\n\" \\\n  \"$ACTOR_COUNT\" \"$WORKER_REPLICAS\"\n```\n\nIn this section, you will deploy the Kubernetes Pods that will be running the counter demo.\n\n```\nkubectl create namespace \"$BENCHMARK_NAMESPACE\"\n```\n\nCreate one `Deployment` and `Service` object per logical counter Agent.\n\n```\nfor i in $(seq 1 \"$ACTOR_COUNT\"); do\n  name=$(printf \"%s-%03d\" \"$BASELINE_PREFIX\" \"$i\")\n\n  kubectl apply -f - <<EOF\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: ${name}\n  namespace: ${BENCHMARK_NAMESPACE}\n  labels:\n    app.kubernetes.io/name: counter\n    
app.kubernetes.io/part-of: cost-comparison\n    cost-comparison/model: always-on-kubernetes\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app.kubernetes.io/name: counter\n      app.kubernetes.io/instance: ${name}\n  template:\n    metadata:\n      labels:\n        app.kubernetes.io/name: counter\n        app.kubernetes.io/instance: ${name}\n        app.kubernetes.io/part-of: cost-comparison\n        cost-comparison/model: always-on-kubernetes\n    spec:\n      containers:\n      - name: counter\n        image: ${COUNTER_IMAGE}\n        command:\n        - /ko-app/counter\n        ports:\n        - containerPort: 80\n        resources:\n          requests:\n            cpu: ${BASELINE_CPU_REQUEST}\n            memory: ${BASELINE_MEMORY_REQUEST}\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: ${name}\n  namespace: ${BENCHMARK_NAMESPACE}\n  labels:\n    app.kubernetes.io/name: counter\n    app.kubernetes.io/part-of: cost-comparison\n    cost-comparison/model: always-on-kubernetes\nspec:\n  selector:\n    app.kubernetes.io/name: counter\n    app.kubernetes.io/instance: ${name}\n  ports:\n  - name: http\n    port: 80\n    targetPort: 80\nEOF\ndone\nkubectl get deployments -n \"$BENCHMARK_NAMESPACE\" \\\n  -l cost-comparison/model=always-on-kubernetes\n\nkubectl get pods -n \"$BENCHMARK_NAMESPACE\" \\\n  -l cost-comparison/model=always-on-kubernetes\n```\n\nYou'll see an output with 50 objects like in the example below:\n\n```\nNAME                               READY   STATUS    RESTARTS   AGE\nk8s-counter-001-9f9f44464-h526d    1/1     Running   0          128m\nk8s-counter-002-9678fb86f-bwpwm    1/1     Running   0          128m\nk8s-counter-003-54bccfd7db-k5wgz   1/1     Running   0          128m\nk8s-counter-004-5957785959-59k24   1/1     Running   0          128m\nk8s-counter-005-5df4559dd4-pgm4v   1/1     Running   0          128m\nk8s-counter-006-5597c6cd9-nj9mm    1/1     Running   0          128m\nk8s-counter-007-8d5c5bb74-j2n6p    1/1     
Running   0          128m\nk8s-counter-008-ff9898c5f-hc7hw    1/1     Running   0          128m\nk8s-counter-009-77bc4cf8bd-rk2sq   1/1     Running   0          128m\nk8s-counter-010-578684f8c8-d7xg8   1/1     Running   0          128m\nk8s-counter-011-8697447c4f-5jz9p   1/1     Running   0          128m\nk8s-counter-012-5d7bc67f4d-hhhkh   1/1     Running   0          128m\nxxxxx\nxxxxx\nxxxxx\nxxxxx\nxxxxx\n```\n\nIn this configuration, you can create one Actor per logical counter Agent.\n\n```\nfor i in $(seq 1 \"$ACTOR_COUNT\"); do\n  actor=$(printf \"%s-%03d\" \"$SUBSTRATE_PREFIX\" \"$i\")\n  kubectl ate create actor \"$actor\" --template \"$TEMPLATE_REF\" || true\ndone\n```\n\nWhen you run the above, `kubectl ate get actors` will show 50 Actors deployed. However, you will only see 7 Workers.\n\nThat's the optimization and efficiency right there: the same deployment and configuration as the Kubernetes section above, except that with Agent Substrate you can run the same workloads in 7 Pods (Workers) instead of 50.\n\nYou’ll see 50 Agent Substrate Actors and a smaller set of Workers (Pods). The Actors are logical workloads, but they are not actively running while they are in `STATUS_SUSPENDED`. By default, Actors stay in a \"suspended\" state until they are used, which is what makes them so efficient. When traffic arrives for an Actor, Agent Substrate assigns that Actor to an available Worker, resumes it, serves the request, and can suspend it again afterward. 
This is the efficiency model: many idle actors can exist without each requiring its own always-on Kubernetes Pod.\n\nThe benchmark client runs inside the cluster so both paths avoid local port-forward overhead.\n\nbenchmark-client = temporary in-cluster curl Pod.\n\nWhy it exists:\n\n```\nkubectl delete pod benchmark-client \\\n  -n \"$BENCHMARK_NAMESPACE\" \\\n  --ignore-not-found\n\nkubectl run benchmark-client \\\n  -n \"$BENCHMARK_NAMESPACE\" \\\n  --image=curlimages/curl:8.10.1 \\\n  --restart=Never \\\n  --command -- sleep 3600\n\nkubectl wait --for=condition=Ready pod/benchmark-client \\\n  -n \"$BENCHMARK_NAMESPACE\" \\\n  --timeout=2m\n```\n\nEach baseline agent receives two requests:\n\nBecause these are always-on Pods, both requests should be served by already running Kubernetes workloads.\n\n```\nkubectl exec -n \"$BENCHMARK_NAMESPACE\" benchmark-client -- sh -c '\nset -eu\nactor_count=\"$1\"\nprefix=\"$2\"\nnamespace=\"$3\"\n\nprintf \"agent\\tfirst_seconds\\twarm_seconds\\n\"\n\nfor i in $(seq 1 \"$actor_count\"); do\n  name=$(printf \"%s-%03d\" \"$prefix\" \"$i\")\n  url=\"http://${name}.${namespace}.svc.cluster.local\"\n\n  first_seconds=$(curl -sS -o /dev/null -w \"%{time_total}\" -X POST \"$url\")\n  warm_seconds=$(curl -sS -o /dev/null -w \"%{time_total}\" -X POST \"$url\")\n\n  printf \"%s\\t%s\\t%s\\n\" \"$name\" \"$first_seconds\" \"$warm_seconds\"\ndone\n' sh \"$ACTOR_COUNT\" \"$BASELINE_PREFIX\" \"$BENCHMARK_NAMESPACE\" > \"$BASELINE_RESULTS_FILE\"\n```\n\nInspect the baseline results:\n\n```\ncolumn -t -s $'\\t' \"$BASELINE_RESULTS_FILE\"\n```\n\nYou'll see an output similar to the below for 50 counters.\n\n```\nagent            first_seconds  warm_seconds\nk8s-counter-001  0.023404       0.003911\nk8s-counter-002  0.023275       0.005233\nk8s-counter-003  0.015850       0.003773\nk8s-counter-004  0.017657       0.005033\nk8s-counter-005  0.014946       0.004443\nk8s-counter-006  0.015616       0.004212\nk8s-counter-007  0.016875       
0.004261\nk8s-counter-008  0.014731       0.004317\nk8s-counter-009  0.017053       0.004707\nk8s-counter-010  0.013013       0.003273\nk8s-counter-011  0.014281       0.004552\nk8s-counter-012  0.018644       0.003734\nxxxxx\nxxxxx\n```\n\nEach Substrate actor receives two requests:\n\nAfter each actor is measured, the Actor is suspended so the worker can serve the next Actor.\n\n```\nprintf \"actor\\twake_seconds\\twarm_seconds\\n\" > \"$SUBSTRATE_RESULTS_FILE\"\n\nfor i in $(seq 1 \"$ACTOR_COUNT\"); do\n  actor=$(printf \"%s-%03d\" \"$SUBSTRATE_PREFIX\" \"$i\")\n  actor_host=\"${actor}.actors.resources.substrate.ate.dev\"\n\n  result=$(kubectl exec -n \"$BENCHMARK_NAMESPACE\" benchmark-client -- sh -c '\nset -eu\nrouter_url=\"$1\"\nactor_host=\"$2\"\n\nwake_seconds=$(curl -sS -o /dev/null -w \"%{time_total}\" \\\n  -X POST \\\n  -H \"Host: ${actor_host}\" \\\n  \"$router_url\")\n\nwarm_seconds=$(curl -sS -o /dev/null -w \"%{time_total}\" \\\n  -X POST \\\n  -H \"Host: ${actor_host}\" \\\n  \"$router_url\")\n\nprintf \"%s\\t%s\" \"$wake_seconds\" \"$warm_seconds\"\n' sh \"$SUBSTRATE_ROUTER_URL\" \"$actor_host\")\n\n  printf \"%s\\t%s\\n\" \"$actor\" \"$result\" >> \"$SUBSTRATE_RESULTS_FILE\"\n  kubectl ate suspend actor \"$actor\" >/dev/null\ndone\n```\n\nInspect the Substrate results:\n\n```\ncolumn -t -s $'\\t' \"$SUBSTRATE_RESULTS_FILE\"\n```\n\nNow it's time to measure the results and see if Substrate really saves resources and helps optimize workloads running in k8s.\n\n```\nexport BASELINE_RUNNING_PODS=$(kubectl get pods \\\n  -n \"$BENCHMARK_NAMESPACE\" \\\n  -l cost-comparison/model=always-on-kubernetes \\\n  --field-selector=status.phase=Running \\\n  --no-headers | wc -l | tr -d ' ')\n\nexport SUBSTRATE_WORKLOAD_PODS=$(kubectl get pods \\\n  -n ate-demo-counter \\\n  --field-selector=status.phase=Running \\\n  --no-headers | wc -l | tr -d ' ')\n```\n\nThe results for k8s always on:\n\n```\nprintf \"baseline_cpu_request_per_pod=%s\\n\" 
\"$BASELINE_CPU_REQUEST\"\nprintf \"baseline_memory_request_per_pod=%s\\n\" \"$BASELINE_MEMORY_REQUEST\"\n\nbaseline_cpu_request_per_pod=50m\nbaseline_memory_request_per_pod=64Mi\n```\n\nThe results for Substrate:\n\n```\nbaseline_cpu_request_per_pod=50m\nbaseline_memory_request_per_pod=64Mi\nsubstrate_worker_cpu_request_per_pod=50m\nsubstrate_worker_memory_request_per_pod=64Mi\nbaseline_total_cpu_request_millicores=2500\nbaseline_total_memory_request_mib=3200\nsubstrate_total_cpu_request_millicores=250\nsubstrate_total_memory_request_mib=320\n```\n\nThe above shows the requested-capacity savings:\n\nKubernetes baseline: 50 Pods * (50m CPU, 64Mi memory) = 2500m CPU, 3200Mi memory\n\nSubstrate: 5 Pods * (50m CPU, 64Mi memory) = 250m CPU, 320Mi memory\n\nSubstrate requests one tenth of the baseline (250m of 2500m CPU, 320Mi of 3200Mi memory), which is a **90%** reduction!\n\nYou can also capture actual CPU and memory usage if `metrics-server` is installed:\n\n```\nkubectl top pods -n \"$BENCHMARK_NAMESPACE\" \\\n  -l cost-comparison/model=always-on-kubernetes || true\n\nkubectl top pods -n ate-demo-counter || true\n```\n\nNotice how few resources the handful of Substrate worker Pods (the second command's output) use compared with the always-on k8s workloads (50 Pods, 50 Agents).\n\nThe Kubernetes baseline is running one always-on Pod per logical workload. With `ACTOR_COUNT=50`, that means 50 `k8s-counter-*` Pods are running even when they are mostly idle. Each baseline Pod has explicit CPU and memory requests, so the always-on capacity grows linearly with the number of logical workloads.\n\nThe Substrate side is running the same logical workload count as actors, but only the worker pool stays hot. In this run, the counter WorkerPool has 5 `counter-deployment-*` Pods. 
Each worker Pod has explicit CPU and memory requests, so the requested always-on capacity grows with worker count, not Actor count.\n\nThe main difference is the always-on footprint:\n\n```\nKubernetes baseline: 50 running workload Pods\nAgent Substrate:     5 running worker Pods for 50 logical actors\n```\n\nThe result shows the core optimization: 50 idle logical workloads do not require 50 always-on workload Pods when they run as Substrate actors.\n\n", "url": "https://wpnews.pro/news/cutting-idle-agent-costs-by-90-with-agent-substrate", "canonical_source": "https://dev.to/thenjdevopsguy/cutting-idle-agent-costs-by-90-with-agent-substrate-18en", "published_at": "2026-06-30 12:40:06+00:00", "updated_at": "2026-06-30 12:48:41.954695+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "developer-tools"], "entities": ["Agent Substrate", "Kubernetes", "gVisor", "GCP", "GKE"], "alternates": {"html": "https://wpnews.pro/news/cutting-idle-agent-costs-by-90-with-agent-substrate", "markdown": "https://wpnews.pro/news/cutting-idle-agent-costs-by-90-with-agent-substrate.md", "text": "https://wpnews.pro/news/cutting-idle-agent-costs-by-90-with-agent-substrate.txt", "jsonld": "https://wpnews.pro/news/cutting-idle-agent-costs-by-90-with-agent-substrate.jsonld"}}