Source: dev.to

Auditing Kubernetes Manifests With AI: A Practical Workflow

The author describes a practical workflow for auditing Kubernetes manifests with AI assistants. By scoping prompts to specific categories like probes and graceful shutdown, and providing full resource bundles instead of isolated YAML files, the author gets actionable, fixable findings rather than generic noise. The workflow also includes checking each AI suggestion against the official Kubernetes documentation to catch common failure modes.

4 min read · published Jun 16, 2026

A senior K8s engineer I work with audits manifests faster than I read them. He's seen so many patterns that "missing readinessProbe on a Deployment that takes 45 seconds to start" jumps off the page. Most of us don't have that pattern library memorized — and increasingly, we don't need to. AI assistants have read more Kubernetes manifests than any human ever will.

The catch: a generic "review this YAML" prompt produces generic noise. You need to direct the model toward the categories of issues that actually matter in your environment.

Mistake 1: Asking for "a security review." You'll get a bullet list of every possible concern, ranked alphabetically, with no signal about which matter. You'll skim, dismiss, and learn nothing.

Mistake 2: Pasting one manifest. Real Kubernetes problems live in the interaction between resources — a Deployment's readiness probe and a Service's selector, a NetworkPolicy and the actual app traffic. One YAML in isolation hides most of the bugs.
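To make the interaction point concrete, here is a toy Python check for one cross-resource bug that no single file reveals: a Service whose selector or targetPort doesn't line up with the Deployment's pod template. It assumes both manifests are already parsed into dicts, and the function name is hypothetical. It only illustrates the class of bug the model can reason about once it sees both resources together.

```python
def service_gaps(service: dict, deployment: dict) -> list[str]:
    """Cross-check a Service against a Deployment's pod template (illustrative only)."""
    template = deployment["spec"]["template"]
    labels = template["metadata"].get("labels", {})
    selector = service["spec"].get("selector", {})
    gaps = []

    # Every selector key/value must appear in the pod labels,
    # otherwise the Service selects none of this Deployment's pods.
    unmatched = {k: v for k, v in selector.items() if labels.get(k) != v}
    if unmatched:
        gaps.append(f"selector {unmatched} matches no pod template label")

    declared = {
        p["containerPort"]
        for c in template["spec"].get("containers", [])
        for p in c.get("ports", [])
    }
    for port in service["spec"].get("ports", []):
        # targetPort defaults to port; named (string) ports are skipped in this sketch.
        target = port.get("targetPort", port["port"])
        if isinstance(target, int) and target not in declared:
            # containerPort is informational, so this is a hint rather than a hard error.
            gaps.append(f"targetPort {target} is not a declared containerPort")
    return gaps
```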

The fix for both is the same: give the model a bounded scope and enough context to reason about interactions.

Pre-decide what you're checking for, and use a different prompt for each dimension, such as probes and graceful shutdown, security context, network policies, or resource limits.

Mixing dimensions in one review produces wishy-washy output. Pick one, get a clean answer, move on.
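One way to enforce "pick one" is to keep a small registry of dimensions and generate each prompt from it, so the "ignore everything else" clause always names the other reviews. A minimal Python sketch; the dimension names and focus text are assumptions based on the categories this article mentions, not a standard taxonomy.

```python
# Hypothetical registry: one review dimension per prompt.
# Names and focus text are illustrative, not a standard taxonomy.
DIMENSIONS = {
    "lifecycle": "probes (readiness, liveness, startup), terminationGracePeriodSeconds, preStop hooks",
    "security": "securityContext (runAsNonRoot, capabilities, readOnlyRootFilesystem)",
    "network": "NetworkPolicy ingress and egress rules versus the app's real traffic",
    "resources": "resource requests and limits",
}


def scoped_prompt(dimension: str) -> str:
    """Build a prompt that covers exactly one dimension and excludes the rest."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension {dimension!r}; pick one of {sorted(DIMENSIONS)}")
    others = [name for name in DIMENSIONS if name != dimension]
    return (
        f"Focus only on: {DIMENSIONS[dimension]}. "
        f"Ignore everything else ({', '.join(others)}); those are separate reviews."
    )


print(scoped_prompt("lifecycle"))
```

Raising on an unknown name, rather than falling back to "review everything", is the point: the helper refuses to produce a mixed-dimension prompt.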

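For a workload review, paste the full bundle: the Deployment together with the resources it interacts with, such as its Service and NetworkPolicy.

For a typical workload that bundle is usually under 500 lines of YAML, well within any model's context window. The model can now reason about interactions, not just isolated fields.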
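If the manifests live in separate files, a few lines of Python can stitch them into one multi-document YAML bundle and warn when it passes the rough 500-line mark. A minimal sketch; the function names and the file names in the usage line are hypothetical.

```python
from pathlib import Path

LINE_BUDGET = 500  # the article's rule of thumb, not a hard model limit


def build_bundle(texts: list[str]) -> str:
    """Join YAML documents with '---' separators into one multi-document string."""
    # Drop a leading '---' from each document so separators are not doubled.
    docs = [t.strip().removeprefix("---").strip() for t in texts]
    return "\n---\n".join(d for d in docs if d) + "\n"


def bundle_from_files(paths: list[str]) -> str:
    """Read manifest files and warn if the resulting bundle is large."""
    bundle = build_bundle([Path(p).read_text() for p in paths])
    line_count = bundle.count("\n")
    if line_count > LINE_BUDGET:
        print(f"warning: bundle is {line_count} lines; consider splitting the review")
    return bundle
```

Usage: bundle_from_files(["deployment.yaml", "service.yaml", "networkpolicy.yaml"]), then paste the result below the review prompt.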

The big difference between "tell me about this YAML" and a useful review is the instruction format. Compare:

Review this Kubernetes manifest.

versus:

You are reviewing a production Deployment + Service + NetworkPolicy bundle. For each finding, give: (1) severity (critical/high/medium/low), (2) the exact field path that's wrong, (3) one sentence on why it matters, (4) the corrected YAML snippet. Focus only on probes, lifecycle, and graceful shutdown. Ignore documentation/comments.

The first prompt produces an essay. The second produces a list of fixable issues.
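Because the second prompt fixes the shape of each finding, the answer can be handled as data. A minimal Python sketch, assuming you have already split the model's reply into the four requested fields (that parsing step depends on how the reply is formatted and is not shown): one dataclass per finding, rejection of anything outside the four allowed severities, and critical-first ordering. The names are hypothetical.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ("critical", "high", "medium", "low")


@dataclass
class Finding:
    severity: str    # one of SEVERITY_ORDER
    field_path: str  # e.g. "spec.template.spec.containers[0].livenessProbe"
    why: str         # one sentence on why it matters
    fix_yaml: str    # corrected snippet, still to be checked against the docs


def triage(findings: list[Finding]) -> list[Finding]:
    """Reject unknown severities, then sort critical -> low (stable within a level)."""
    for f in findings:
        if f.severity not in SEVERITY_ORDER:
            raise ValueError(f"{f.field_path}: unexpected severity {f.severity!r}")
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
```

A reply that invents its own severity scale fails loudly here instead of being silently mis-ranked.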

Verification is where most reviews go wrong. The model is right most of the time, but it is wrong some of the time, often in ways that look correct.

Common AI failure modes in K8s review:

- Suggesting spec.template.spec.terminationGracePeriod (the field is terminationGracePeriodSeconds)
- Suggesting a policy/v1beta1 PodDisruptionBudget (removed in 1.25)
- Claiming failureThreshold defaults to 1 when it's 3
- Recommending runAsNonRoot: true for a workload that legitimately needs root

For every "fix" the model suggests, glance at the official K8s docs for that field. This adds 30 seconds per finding and catches the wrong ones. Without this step, you will apply changes that break things.
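Some of these failure modes can be caught mechanically before you open the docs. A minimal Python sketch that checks a model-suggested snippet (already parsed into a dict) against small hand-maintained tables. The tables cover the examples above plus a few well-known removals; treat them as an assumption to extend from the official deprecated API migration guide, not as a complete list. All names are hypothetical.

```python
from typing import Optional

# Hand-maintained and deliberately incomplete; extend from the official docs.
REMOVED_APIS = {
    ("policy/v1beta1", "PodDisruptionBudget"): "1.25",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("extensions/v1beta1", "Deployment"): "1.16",
}

# Documented defaults for probe timing fields.
PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}

# Plausible misspellings -> the real field name (illustrative entry only).
FIELD_TYPOS = {"terminationGracePeriod": "terminationGracePeriodSeconds"}


def _keys(node):
    """Yield every mapping key in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from _keys(value)
    elif isinstance(node, list):
        for item in node:
            yield from _keys(item)


def check_suggestion(doc: dict) -> list[str]:
    """Flag removed API versions and known misspelled fields in a suggested manifest."""
    problems = []
    removed_in = REMOVED_APIS.get((doc.get("apiVersion"), doc.get("kind")))
    if removed_in:
        problems.append(f"{doc['apiVersion']} {doc['kind']} was removed in {removed_in}")
    for key in _keys(doc):
        if key in FIELD_TYPOS:
            problems.append(f"unknown field {key!r}; did you mean {FIELD_TYPOS[key]!r}?")
    return problems


def check_default_claim(field: str, claimed: int) -> Optional[str]:
    """Return a correction if the model misstated a probe default, else None."""
    actual = PROBE_DEFAULTS.get(field)
    if actual is not None and actual != claimed:
        return f"{field} defaults to {actual}, not {claimed}"
    return None
```

The last failure mode, runAsNonRoot: true on a workload that legitimately needs root, is a judgment call no lookup table will catch; that one still needs the docs and your knowledge of the app.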

Here's a Deployment I reviewed last week:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 2
  selector:
    matchLabels: { app: payments }
  template:
    metadata:
      labels: { app: payments }
    spec:
      containers:
      - name: app
        image: registry.example.com/payments:v3.1.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_URL
          value: postgres://payments-db:5432/payments
        resources:
          limits:
            cpu: "2"
            memory: "2Gi"
        readinessProbe:
          httpGet: { path: /healthz, port: 8080 }
          initialDelaySeconds: 5

I asked Claude to review for probes and graceful shutdown only. The findings:

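1. No requests, only limits. The model flagged this as BestEffort QoS, first to be evicted under pressure, and suggested setting requests equal to or below limits. Verify this one: when a container sets limits but no requests (and no LimitRange sets a default request), Kubernetes copies the limits into the requests, so this pod is actually Guaranteed, not BestEffort.
2. initialDelaySeconds: 5 on the readinessProbe may be too short for the app's real startup time; the model suggested a startupProbe with a longer threshold.
3. No livenessProbe, so a hung container is never restarted.
4. No terminationGracePeriodSeconds, so the pod falls back to the 30-second default.
5. No preStop hook. The model suggested a sleep 15 in preStop, so the pod keeps serving while it is being removed from the Service's endpoints instead of shutting down while traffic is still routed to it.

All five were real, and all five were fixable in two minutes of YAML editing. The model didn't tell me about anything irrelevant, because I scoped the prompt to "probes and graceful shutdown only."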

The big one — #5 — is something I've personally been bitten by twice. The model wouldn't have prioritized it without the directive prompt.
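Claims like the QoS class in the first finding are cheap to check against the documented rules before acting on them. A minimal Python sketch of those rules, assuming no LimitRange or other admission controller injects default requests and looking only at regular containers (init containers are ignored). parse_quantity handles plain numbers and the common suffixes, not exponent forms like 1e3.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes: binary, decimal SI, and milli.
_SUFFIXES = [
    ("Ki", Decimal(2) ** 10), ("Mi", Decimal(2) ** 20),
    ("Gi", Decimal(2) ** 30), ("Ti", Decimal(2) ** 40),
    ("k", Decimal(10) ** 3), ("M", Decimal(10) ** 6),
    ("G", Decimal(10) ** 9), ("T", Decimal(10) ** 12),
    ("m", Decimal("0.001")),
]


def parse_quantity(q) -> Decimal:
    """Parse quantities like '500m', 2, or '2Gi' (exponent forms are not handled)."""
    q = str(q)
    for suffix, factor in _SUFFIXES:
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def qos_class(containers: list) -> str:
    """Derive the pod QoS class from its regular containers' cpu and memory settings."""
    any_set = False
    guaranteed = True
    for c in containers:
        res = c.get("resources", {})
        limits = res.get("limits", {})
        # A missing request defaults to the limit, so start from limits and overlay requests.
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in limits or name in requests:
                any_set = True
            if name not in limits or name not in requests:
                guaranteed = False
            elif parse_quantity(limits[name]) != parse_quantity(requests[name]):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

For the payments container above, which sets cpu and memory limits but no requests, qos_class returns Guaranteed, because the limits stand in for the missing requests.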

What about policy engines like Kyverno? Yes, you should run those too. They catch consistent issues at admission time, but they don't catch issues that require judgment: "is 30 seconds enough graceful shutdown for this specific service?" Policy enforcement is a floor; AI review is a directed second opinion above that floor.

I run both. Kyverno catches "no securityContext at all" before it ever lands. AI review catches "readinessProbe path doesn't match what the app exposes" — something only a human (or an AI imitating one) would notice.
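On the mechanical side of that line, even a few lines of Python can flag fields that are simply absent before the AI review runs. A minimal sketch, assuming the Deployment has already been parsed into a dict (for example with a YAML library, not shown); the function name is hypothetical, and it only detects missing fields, never whether a value is right.

```python
def lifecycle_gaps(deployment: dict) -> list[str]:
    """List absent fields related to resources, probes, and graceful shutdown."""
    pod = deployment["spec"]["template"]["spec"]
    gaps = []
    if "terminationGracePeriodSeconds" not in pod:
        gaps.append("pod: terminationGracePeriodSeconds not set (defaults to 30s)")
    for c in pod.get("containers", []):
        name = c.get("name", "?")
        if "requests" not in c.get("resources", {}):
            gaps.append(f"{name}: no resources.requests")
        # A missing startupProbe only matters for slow starters; it is flagged for review anyway.
        for probe in ("readinessProbe", "livenessProbe", "startupProbe"):
            if probe not in c:
                gaps.append(f"{name}: no {probe}")
        if "preStop" not in c.get("lifecycle", {}):
            gaps.append(f"{name}: no lifecycle.preStop hook")
    return gaps
```

Run against the payments Deployment above, it reports five missing fields (terminationGracePeriodSeconds, requests, livenessProbe, startupProbe, preStop) that line up with the five findings. What it cannot say is whether initialDelaySeconds: 5 or a 30-second grace period is enough for this service.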

If you want a template, here's the one I use most:

You are reviewing a Kubernetes workload bundle for production readiness. Focus only on: probes (readiness, liveness, startup), terminationGracePeriodSeconds, preStop hooks, and rolling update strategy. For each finding produce: severity, exact field path, why it matters in one sentence, corrected YAML. Ignore everything else (security context, network policies, resource limits; those are separate reviews). The workload is [serves HTTP at /api on port 8080 / consumes from a queue / batch processor that runs N hours].

The bracketed context at the end is what makes the review accurate for your workload. Without it, the model assumes a generic web service.
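A small helper can refuse to send the template without that context. A minimal Python sketch; the function name and the choice to append the YAML bundle after the prompt are assumptions, not part of the original workflow.

```python
TEMPLATE = (
    "You are reviewing a Kubernetes workload bundle for production readiness. "
    "Focus only on: probes (readiness, liveness, startup), terminationGracePeriodSeconds, "
    "preStop hooks, and rolling update strategy. For each finding produce: severity, "
    "exact field path, why it matters in one sentence, corrected YAML. Ignore everything "
    "else (security context, network policies, resource limits; those are separate "
    "reviews). The workload is {workload}."
)


def readiness_prompt(workload: str, bundle: str) -> str:
    """Fill in the workload description and append the YAML bundle to review."""
    if not workload.strip():
        # Without a workload description the model falls back to "generic web service".
        raise ValueError("describe the workload before sending the prompt")
    return TEMPLATE.format(workload=workload.strip()) + "\n\n" + bundle
```

Usage: readiness_prompt("an HTTP service at /api on port 8080", bundle), where bundle is the multi-document YAML for the workload.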

For our full prompt library on Kubernetes review, see the Kubernetes & Helm category — especially kubernetes-yaml-security-review and kubernetes-resource-limits-tuning.

This article was originally published on DevOps AI ToolKit — practical AI workflows for cloud engineers.
