{"slug": "auditing-kubernetes-manifests-with-ai-a-practical-workflow", "title": "Auditing Kubernetes Manifests With AI: A Practical Workflow", "summary": "The author describes a practical workflow for auditing Kubernetes manifests with AI assistants. By scoping prompts to specific categories such as probes and graceful shutdown, and by providing full resource bundles instead of isolated YAML files, the author gets actionable, fixable findings rather than generic noise. The workflow also includes verifying each AI suggestion against the official Kubernetes documentation to catch common failure modes.", "body_md": "A senior K8s engineer I work with audits manifests faster than I can read them. He's seen so many patterns that \"missing readinessProbe on a Deployment that takes 45 seconds to start\" jumps off the page. Most of us don't have that pattern library memorized, and increasingly, we don't need to. AI assistants have read more Kubernetes manifests than any human ever will.\n\nThe catch: a generic \"review this YAML\" prompt produces generic noise. You need to steer the model toward the categories of issues that actually matter in your environment.\n\n**Mistake 1: Asking for \"a security review.\"** You'll get a bullet list of every possible concern, ranked alphabetically, with no signal about which ones matter. You'll skim, dismiss, and learn nothing.\n\n**Mistake 2: Pasting one manifest.** Real Kubernetes problems live in the interactions between resources: a Deployment's readiness probe and a Service's selector, a NetworkPolicy and the app's actual traffic. One YAML file in isolation hides most of the bugs.\n\nThe fix for both is the same: give the model a *bounded scope* and *enough context* to reason about interactions.\n\nDecide up front what you're checking *for*, and use a different prompt for each dimension:\n\n- Probes, lifecycle, and graceful shutdown\n- Security context\n- Network policies\n- Resource requests and limits\n\nMixing dimensions in one review produces wishy-washy output. 
Pick one, get a clean answer, move on.\n\nFor a workload review, paste the whole bundle: the Deployment plus the Service, NetworkPolicy, and any other resources that interact with it.\n\nFor YAML this is usually under 500 lines, well within a modern model's context window. The model can now reason about interactions, not just isolated fields.\n\nThe big difference between \"tell me about this YAML\" and a useful review is *the instruction format*. Compare:\n\n> Review this Kubernetes manifest.\n\nversus:\n\n> You are reviewing a production Deployment + Service + NetworkPolicy bundle. For each finding, give: (1) severity (critical/high/medium/low), (2) the exact field path that's wrong, (3) one sentence on why it matters, (4) the corrected YAML snippet. Focus only on probes, lifecycle, and graceful shutdown. Ignore documentation/comments.\n\nThe first prompt produces an essay. The second produces a list of fixable issues.\n\nVerification is where most reviews go wrong. The model is right *most of the time*. It's wrong some of the time, often in ways that look correct.\n\nCommon AI failure modes in K8s review:\n\n- Invented field names: `spec.template.spec.terminationGracePeriod` (it's `terminationGracePeriodSeconds`)\n- Removed API versions: `policy/v1beta1` for a `PodDisruptionBudget` (removed in 1.25; use `policy/v1`)\n- Wrong defaults: claiming `failureThreshold` defaults to 1 when it's 3\n- Blanket hardening: `runAsNonRoot: true` for a workload that legitimately needs root\n\nFor every \"fix\" the model suggests, check the official K8s docs for that field. This adds about 30 seconds per finding and catches the wrong ones. 
Without this step, you will apply changes that break things.\n\nHere's a Deployment I reviewed last week:\n\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: payments\nspec:\n  replicas: 2\n  selector:\n    matchLabels: { app: payments }\n  template:\n    metadata:\n      labels: { app: payments }\n    spec:\n      containers:\n      - name: app\n        image: registry.example.com/payments:v3.1.0\n        ports:\n        - containerPort: 8080\n        env:\n        - name: DB_URL\n          value: postgres://payments-db:5432/payments\n        resources:\n          limits:\n            cpu: \"2\"\n            memory: \"2Gi\"\n        readinessProbe:\n          httpGet: { path: /healthz, port: 8080 }\n          initialDelaySeconds: 5\n```\n\nI asked Claude to review for probes and graceful shutdown only. The findings:\n\n1. **No `requests`, only `limits`.** Kubernetes copies the limits into the requests, so every replica reserves a full 2 CPU and 2Gi at scheduling time whether it needs them or not. Set explicit requests equal to or below the limits.\n2. **Readiness probe with `initialDelaySeconds: 5`.** That leaves little margin if the app starts slowly. Add a `startupProbe` with a longer threshold.\n3. **No `livenessProbe`.** A hung process is never restarted. Add one.\n4. **No `terminationGracePeriodSeconds`.** The Pod silently gets the 30-second default, whether or not that matches the app's real shutdown time. Set it explicitly.\n5. **No `preStop` hook.** The container can receive SIGTERM while the Pod is still listed in the Service's endpoints, so requests keep arriving at a container that is already shutting down. Add a `sleep 15` preStop.\n\nAll five were real, and all five were fixable in two minutes of YAML editing. The model didn't flag anything irrelevant, because I had scoped the prompt to \"probes and graceful shutdown only.\"\n\nThe big one, #5, is something I've personally been bitten by twice. The model wouldn't have prioritized it without the directive prompt.\n\nWhat about policy engines such as Kyverno? Yes, you should run those too. They catch rule-based issues consistently at admission time. They don't catch issues that require *judgment*: \"is 30 seconds enough graceful shutdown for this specific service?\" Policy enforcement is a floor; AI review is a directed second opinion above that floor.\n\nI run both. Kyverno catches \"no securityContext at all\" before it ever lands. 
AI review catches \"readinessProbe path doesn't match what the app exposes,\" which only a human (or an AI imitating one) would notice.\n\nIf you want a template, here's the one I use most:\n\n> You are reviewing a Kubernetes workload bundle for production readiness. Focus only on: probes (readiness, liveness, startup), `terminationGracePeriodSeconds`, preStop hooks, and rolling update strategy. For each finding produce: severity, exact field path, why it matters in one sentence, corrected YAML. Ignore everything else (security context, network policies, resource limits; those are separate reviews). The workload [serves HTTP at /api on port 8080 / consumes from a queue / is a batch processor that runs N hours].\n\nThe bracketed context at the end is what makes the review accurate for *your* workload. Without it, the model assumes a generic web service.\n\nFor our full prompt library on Kubernetes review, see the [Kubernetes & Helm category](https://dev.to/categories/kubernetes-helm/), especially [kubernetes-yaml-security-review](https://dev.to/prompts/kubernetes-yaml-security-review/) and [kubernetes-resource-limits-tuning](https://dev.to/prompts/kubernetes-resource-limits-tuning/).\n\n*This article was originally published on DevOps AI ToolKit: practical AI workflows for cloud engineers.*", "url": "https://wpnews.pro/news/auditing-kubernetes-manifests-with-ai-a-practical-workflow", "canonical_source": "https://dev.to/devopsaitoolkit/auditing-kubernetes-manifests-with-ai-a-practical-workflow-4368", "published_at": "2026-06-16 04:31:15+00:00", "updated_at": "2026-06-16 04:47:21.612175+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools"], "entities": ["Kubernetes", "Claude"], "alternates": {"html": "https://wpnews.pro/news/auditing-kubernetes-manifests-with-ai-a-practical-workflow", "markdown": "https://wpnews.pro/news/auditing-kubernetes-manifests-with-ai-a-practical-workflow.md", "text": 
"https://wpnews.pro/news/auditing-kubernetes-manifests-with-ai-a-practical-workflow.txt", "jsonld": "https://wpnews.pro/news/auditing-kubernetes-manifests-with-ai-a-practical-workflow.jsonld"}}