{"slug": "the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other", "title": "The Disk-Pressure Incident That Taught Me to Always Set LimitRanges and Other Lessons from Mirroring EKS Locally.", "summary": "The article describes the process of mirroring a production EKS cluster locally on a Mac, focusing on integrating Vault for secret management. It explains how Vault's Kubernetes auth method allows pods to authenticate using service account JWTs, enabling secure retrieval of credentials without hard-coded secrets. The author emphasizes the importance of setting up LimitRanges and testing the full injection workflow locally to debug failures safely.", "body_md": "Part 6 of 7 — The Mac Kubernetes Lab: A Production-Mirror Setup from Scratch.\nPreviously in Part 5: We installed Istio with revision-based upgrades, MetalLB for LoadBalancer IPs, and practised traffic management with Gateways, VirtualServices, and fault injection. The cluster behaves. Now we wire up the last three pieces that turn it from “a working local cluster” into “a real mirror of our production EKS.”\nThe cluster works. Istio is running. MetalLB is handing out IPs. But it’s still missing three layers that make the production parity actually meaningful: Vault Kubernetes auth for secrets, Crossplane for infrastructure, and LimitRanges for resource defaults.\nThe LimitRange story is the most important of the three, so I’ll tell it properly when we get there. First, the auth layer.\nVault’s Kubernetes auth method lets pods authenticate by presenting their service account JWT. 
Vault validates the token against the Kubernetes API server and exchanges it for a Vault token with the appropriate policies attached.\nOn the production EKS clusters at work, this is how microservices retrieve database credentials, API keys, and TLS certificates: no hard-coded secrets, no secret sprawl, every issuance audit-logged in Vault.\nSetting it up locally means I can test the full injection workflow without a VPN, and debug failures on a cluster where the stakes are zero.\nWe deploy just the Vault agent injector in the lab cluster. It points to the external Vault VM rather than running its own Vault server:\n# 💻 Mac\nkubectx lab-cluster\nhelm repo add hashicorp https://helm.releases.hashicorp.com\nhelm repo update\n# Get the vault VM IP - does not persist across sessions\nexport VAULT_IP=$(orb run -m vault hostname -I | awk '{print $1}')\necho \"VAULT_IP=$VAULT_IP\"\nhelm install vault hashicorp/vault \\\n--namespace vault --create-namespace \\\n--set \"injector.externalVaultAddr=http://$VAULT_IP:8200\"\nkubectl get pods -n vault\n# vault-agent-injector-xxx 1/1 Running 0 30s\nRun this on the vault VM, pointing Vault at the lab cluster’s API server:\n# 🖥️ VM: vault\n# Re-export - always required, doesn't persist across sessions\nexport VAULT_ADDR='http://127.0.0.1:8200'\nexport VAULT_ROOT_TOKEN=$(grep 'Initial Root Token' ~/vault-init.txt | awk '{print $NF}')\n# If Vault is sealed after a reboot:\n# vault operator unseal $(grep 'Unseal Key 1' ~/vault-init.txt | awk '{print $NF}')\nvault login $VAULT_ROOT_TOKEN\n# Get CP_IP from the Mac terminal: orb run -m cp01 hostname -I | awk '{print $1}'\nexport CP_IP=<cp01-ip>\n# Regenerate the CA cert if /tmp was cleared after reboot\nvault read -field=certificate pki_k8s/issuer/default > /tmp/lab-ca.crt\n# Enable Kubernetes auth (safe to re-run - ignores \"already enabled\")\nvault auth enable -path=lab-k8s kubernetes 2>/dev/null || echo \"already enabled\"\n# Configure - point Vault at the lab cluster API 
server\nvault write auth/lab-k8s/config \\\nkubernetes_host=\"https://$CP_IP:6443\" \\\nkubernetes_ca_cert=@/tmp/lab-ca.crt\nvault read auth/lab-k8s/config\nCreate a simple role and test it from a pod:\n# 🖥️ VM: vault\n# Create a policy\nvault policy write read-secrets - <<EOF\npath \"secret/data/myapp/*\" {\ncapabilities = [\"read\"]\n}\nEOF\n# Create a K8s auth role\nvault write auth/lab-k8s/role/myapp \\\nbound_service_account_names=myapp \\\nbound_service_account_namespaces=default \\\npolicies=read-secrets \\\nttl=1h\n# Write a test secret\nvault secrets enable -path=secret kv-v2 2>/dev/null || true\nvault kv put secret/myapp/config db_password=\"supersecret\"\n# 💻 Mac — deploy a pod with Vault annotations\nkubectl apply -f - <<EOF\napiVersion: v1\nkind: ServiceAccount\nmetadata:\nname: myapp\nnamespace: default\n---\napiVersion: v1\nkind: Pod\nmetadata:\nname: vault-test\nnamespace: default\nannotations:\nvault.hashicorp.com/agent-inject: \"true\"\nvault.hashicorp.com/role: \"myapp\"\nvault.hashicorp.com/agent-inject-secret-config: \"secret/data/myapp/config\"\nspec:\nserviceAccountName: myapp\ncontainers:\n- name: app\nimage: busybox\ncommand: [\"sleep\", \"3600\"]\nEOF\n# Check the secret was injected\nkubectl exec vault-test -c app -- cat /vault/secrets/config\n# db_password: supersecret\nIf that last line returns the password, the whole chain works: service account JWT → Vault validation → Vault token → secret retrieval → file injection. Every link of the chain is what a real production app does.\nCrossplane turns a Kubernetes cluster into a universal control plane for cloud infrastructure. Instead of Terraform modules or CloudFormation stacks, you define infrastructure as Kubernetes custom resources, and Crossplane reconciles them continuously.\nI use it at work to provision AWS resources (EKS node groups, RDS, S3 buckets, IAM roles) and VMware Cloud Director resources through a custom provider. 
The lab version mirrors the AWS side of that.\n# 💻 Mac\nhelm repo add crossplane-stable https://charts.crossplane.io/stable\nhelm repo update\n# Composition Functions are enabled by default in recent versions.\n# The --enable-composition-functions flag was removed.\nhelm install crossplane crossplane-stable/crossplane \\\n--namespace crossplane-system --create-namespace\nkubectl get pods -n crossplane-system -w\n# NAME READY STATUS AGE\n# crossplane-xxx 1/1 Running 60s\n# crossplane-rbac-manager-xxx 1/1 Running 60s\nkubectl apply -f - <<EOF\napiVersion: pkg.crossplane.io/v1\nkind: Provider\nmetadata:\nname: provider-aws-ec2\nspec:\npackage: xpkg.upbound.io/upbound/provider-aws-ec2:latest\nEOF\nkubectl get pkg\nA bare-minimum ProviderConfig, enough to verify the install is working:\n# 💻 Mac\nkubectl apply -f - <<EOF\napiVersion: aws.upbound.io/v1beta1\nkind: ProviderConfig\nmetadata:\nname: default\nspec:\ncredentials:\nsource: Secret\nsecretRef:\nnamespace: crossplane-system\nname: aws-creds\nkey: creds\nEOF\nIn a real setup, you create an IRSA role (IAM Roles for Service Accounts) so the provider can authenticate and has permission to create and monitor resources. For local validation, the provider installs, and the compositions can be validated structurally without ever calling AWS.\nThe LimitRange story.\nThis is the one that came from a real incident at work.\nWe had repeated disk-pressure events in our production EKS cluster. Pods with no resource requests had crept into a few namespaces — someone deployed a YAML that omitted resources: entirely, and nobody caught it in review. The Kubernetes scheduler had no signal about their consumption, so nodes ended up overcommitted. Then ephemeral storage filled up, eviction kicked in, and a couple of unrelated pods went down with it. Total downtime measured in tens of minutes. Cause-and-effect chain that took a while to untangle.\nThe fix is one of the most boring features in Kubernetes: LimitRanges. 
They set default resource requests and limits at the namespace level. Any container that doesn’t specify its own requests gets the defaults applied automatically by the admission controller. The scheduler always has a signal. Overcommit becomes a deliberate choice, not an accident.\n# 💻 Mac\nkubectl apply -f - <<EOF\napiVersion: v1\nkind: LimitRange\nmetadata:\nname: default-limits\nnamespace: default\nspec:\nlimits:\n- default:\nmemory: 512Mi\ncpu: 500m\ndefaultRequest:\nmemory: 128Mi\ncpu: 100m\nmax:\nephemeral-storage: 2Gi\ntype: Container\nEOF\nApply this to every namespace that hosts workloads. In production, I now apply it as a post-provisioning step on every new namespace:\n# 💻 Mac — apply to multiple namespaces\n# Assumes limitrange.yaml is the manifest above without metadata.namespace;\n# kubectl rejects -n when it conflicts with a namespace set in the file\nfor ns in default vault crossplane-system istio-system; do\nkubectl apply -f limitrange.yaml -n \"$ns\"\ndone\nThe ephemeral-storage max is the part that specifically addresses the disk-pressure failure mode: it bounds how much scratch space a container can consume, which is what spirals when ephemeral storage runs unbounded.\nLet’s confirm the whole stack is up:\n# 💻 Mac\nkubectl get nodes -o wide\n# NAME STATUS ROLES VERSION\n# cp01 Ready control-plane v1.34.x\n# worker01 Ready <none> v1.34.x\n# worker02 Ready <none> v1.34.x\nkubectl get pods -A\n# Cilium/Calico, CoreDNS, istiod-1-26, MetalLB, Crossplane, Vault injector - all Running\nThe only meaningful differences are the CNI (because of OrbStack’s VM capabilities, as we covered in Part 4) and the LoadBalancer implementation. Everything else is identical in configuration. 
The mental model from this lab transfers directly to the production cluster, and vice versa.\nIn the final article: how to stop and start the lab without losing state, the CKS exam scenarios this cluster was purpose-built for, and the shell aliases that make the whole thing pleasant to live with.\n← Part 5: How I Practise Istio Upgrades Locally Before Touching Production EKS | Part 7: The Day 2 Reality of Running a Kubernetes Lab on Your Mac: Stop/Start, CKS Scenarios, and What I Learned Building It →\nI’m Noah Makau, a DevSecOps engineer based in Nairobi. I run a small DevOps consultancy and hold CKA, CKAD, and the AWS Solutions Architect Professional certifications, and I am currently preparing for CKS. I write about Kubernetes, Vault, Crossplane, and the day-to-day of running platforms that actually have to stay up.\nOriginally published at blog.arkilasystems.com", "url": "https://wpnews.pro/news/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other", "canonical_source": "https://dev.to/nkmakau/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other-lessons-from-18b8", "published_at": "2026-05-22 13:26:56+00:00", "updated_at": "2026-05-22 13:36:07.495715+00:00", "lang": "en", "topics": ["cloud-computing", "developer-tools", "open-source", "enterprise-software", "cybersecurity"], "entities": ["Istio", "MetalLB", "Vault", "Kubernetes", "EKS", "Hashicorp", "Mac", "VPN"], "alternates": {"html": "https://wpnews.pro/news/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other", "markdown": "https://wpnews.pro/news/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other.md", "text": "https://wpnews.pro/news/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other.txt", "jsonld": "https://wpnews.pro/news/the-disk-pressure-incident-that-taught-me-to-always-set-limitranges-and-other.jsonld"}}