{"slug": "meet-kueue-smart-job-queueing-for-kubernetes", "title": "🚦 Meet Kueue: Smart Job Queueing for Kubernetes 🧠⚙️", "summary": "Kueue, a Kubernetes-native job queueing system maintained as a kubernetes-sigs project, addresses the gap in Kubernetes' ability to manage when batch jobs start. It provides a control tower that queues jobs, enforces quota, and supports priority-based admission, fair sharing, and elastic jobs. The system is designed to prevent resource starvation and improve GPU utilization and cloud cost management.", "body_md": "Hey everyone 👋\n\nIf you run batch jobs, data pipelines, or any kind of AI and ML training on Kubernetes, you have probably hit this wall. Kubernetes is fantastic at deciding WHERE a pod should run, but it is surprisingly clueless about WHEN a job should start. 😅\n\nYou submit ten jobs, the cluster fills up, and the rest just sit there as Pending. No real queue, no priority, no fairness between teams. One noisy team can eat all your expensive nodes while everyone else waits. 🥲\n\nThat is exactly the gap Kueue fills, and today I want to walk you through it with a pile of hands on examples you can run on any cluster, even your homelab. 🏡\n\n👉 Key takeaway up front: Kueue is a job level manager that holds your jobs in a real queue and only admits them when there is enough quota to actually run them.\n\n🧪 Everything in this guide was tested against Kueue v0.18.1 using the v1beta2 API. 
I pinned every command and manifest to that version so you do not get surprised by API drift.\n\n✅ Why Kubernetes needs a queue\n\n✅ The building blocks in plain language\n\n✅ Installing Kueue\n\n✅ Setting up quota with a ResourceFlavor, a ClusterQueue, and a LocalQueue\n\n✅ Submitting a Job and watching it get queued and admitted\n\n✅ Priority based admission\n\n✅ Partial admission and elastic jobs\n\n✅ Multiple resource flavors for x86 and arm\n\n✅ Fair sharing between teams with cohorts\n\n✅ Dedicated quota with a shared fallback\n\n✅ Queueing a plain Pod\n\n✅ Why this matters a lot for GPUs and your cloud bill\n\nNative Kubernetes scheduling is pod centric. The scheduler looks at one pod at a time and tries to place it. That works great for long running services.\n\nBatch workloads are different. They have a beginning and an end, they often need a fixed chunk of capacity, and they compete with other teams for the same nodes.\n\nWithout a queueing layer you get:\n\n✅ Jobs that fail or stay Pending when resources are tight\n\n✅ No quota governance, so one team can starve the others\n\n✅ No admission priority, so a quick experiment can block production training\n\nKueue is a Kubernetes native job queueing system, maintained as a kubernetes-sigs project. It does not replace the scheduler. It sits in front of it. 🛂\n\nHere is the simple mental model. Think of the Kubernetes scheduler as the runway, and Kueue as the control tower deciding which flight is cleared for takeoff and when. ✈️\n\nWhen a job arrives, Kueue suspends it, creates a matching Workload object, checks if there is enough quota, and only then lets the pods be created. If there is no room, the job waits politely in the queue instead of failing.\n\nThere are four pieces you need to know, plus one bonus piece for teams.\n\n✅ ResourceFlavor 🍦\n\nDescribes a type of resource, usually tied to node labels. For example x86 nodes versus arm nodes, or GPU nodes versus CPU nodes. 
If you do not need to distinguish node types, you use one empty flavor.\n\n✅ ClusterQueue 🏦\n\nA cluster scoped object that holds the actual quota. This is where you say how much cpu, memory, or how many GPUs are available. Users do not submit to it directly.\n\n✅ LocalQueue 📥\n\nA namespaced object that points to a ClusterQueue. This is what users actually target with their jobs.\n\n✅ Workload 📦\n\nThe internal object Kueue creates for each job to track its admission state. You usually just observe it.\n\n✅ Cohort 👥 (bonus)\n\nA group of ClusterQueues that can borrow each other's unused quota. This is the magic behind fair sharing between teams.\n\nThe simplest method is to apply the released manifests with server side apply.\n\n```\nkubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.18.1/manifests.yaml\n```\n\nThe controller runs in the kueue-system namespace. Give it a few seconds and check it is healthy.\n\n```\nkubectl get deploy -n kueue-system\n```\n\nYou should see the controller manager become ready.\n\n```\nNAME                       READY   UP-TO-DATE   AVAILABLE   AGE\nkueue-controller-manager   1/1     1            1           30s\n```\n\nPrefer Helm? Kueue publishes an OCI chart for each release. Just make sure the chart version matches the release you want.\n\n```\nhelm install kueue oci://registry.k8s.io/kueue/charts/kueue \\\n  --version=0.18.1 \\\n  --namespace kueue-system \\\n  --create-namespace \\\n  --wait --timeout 300s\n```\n\nSince we are not distinguishing node types in this first demo, an empty flavor is all we need.\n\n```\n# default-flavor.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ResourceFlavor\nmetadata:\n  name: \"default-flavor\"\n```\n\nApply it.\n\n```\nkubectl apply -f default-flavor.yaml\n```\n\nNow we define the quota for the whole cluster. 
Here we allow 9 cpu and 36Gi of memory, all served by our single flavor.\n\n```\n# cluster-queue.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"cluster-queue\"\nspec:\n  namespaceSelector: {} # match all namespaces\n  resourceGroups:\n  - coveredResources: [\"cpu\", \"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 9\n      - name: \"memory\"\n        nominalQuota: 36Gi\n```\n\nApply it.\n\n```\nkubectl apply -f cluster-queue.yaml\n```\n\nOne important detail. The flavor name under spec.resourceGroups must match the name of the ResourceFlavor you created earlier. If they do not match, the ClusterQueue will not become ready. 🔗\n\nUsers cannot send work to a ClusterQueue directly. They need a LocalQueue in their namespace that points to it. We will put ours in the default namespace.\n\n```\n# default-user-queue.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: LocalQueue\nmetadata:\n  namespace: \"default\"\n  name: \"user-queue\"\nspec:\n  clusterQueue: \"cluster-queue\"\n```\n\nApply it.\n\n```\nkubectl apply -f default-user-queue.yaml\n```\n\nQuick tip: you can apply all three of the above at once using the example bundle from the project.\n\n```\nkubectl apply -f https://kueue.sigs.k8s.io/examples/admin/single-clusterqueue-setup.yaml\n```\n\nThis is the only change your users need to make to an existing Job. 
Add the kueue.x-k8s.io/queue-name label pointing to the LocalQueue, and make sure each pod declares resource requests.\n\n```\n# sample-job.yaml\napiVersion: batch/v1\nkind: Job\nmetadata:\n  generateName: sample-job-\n  namespace: default\n  labels:\n    kueue.x-k8s.io/queue-name: user-queue\nspec:\n  parallelism: 3\n  completions: 3\n  template:\n    spec:\n      containers:\n      - name: dummy-job\n        image: registry.k8s.io/e2e-test-images/agnhost:2.53\n        command: [ \"/bin/sh\" ]\n        args: [ \"-c\", \"sleep 60\" ]\n        resources:\n          requests:\n            cpu: \"1\"\n            memory: \"200Mi\"\n      restartPolicy: Never\n```\n\nNotice that you do not need to set the job to suspended yourself. Kueue manages suspension for you through a webhook and decides the best moment to start it. 🪄\n\nCreate the job.\n\n```\nkubectl create -f sample-job.yaml\n```\n\nList your local queues. The alias queues also works.\n\n```\nkubectl -n default get localqueues\nNAME         CLUSTERQUEUE    PENDING WORKLOADS\nuser-queue   cluster-queue   0\n```\n\nKueue creates a Workload object for your job. Have a look.\n\n```\nkubectl -n default get workloads.kueue.x-k8s.io\nNAME               QUEUE         RESERVED IN     ADMITTED   AGE\nsample-job-xxxxx   user-queue    cluster-queue   True       3s\n```\n\nWant the full story? Describe the workload. When there is not enough quota, you will see it sit unadmitted with a clear message.\n\n```\nkubectl -n default describe workload sample-job-xxxxx\nStatus:\n  Conditions:\n    Message:  workload didn't fit\n    Reason:   Pending\n    Status:   False\n    Type:     Admitted\n```\n\nThe moment quota frees up, Kueue admits it automatically. 
If you describe the Job itself, the event timeline tells the whole story.\n\n```\nEvents:\n  Type    Reason            From                  Message\n  ----    ------            ----                  -------\n  Normal  Suspended         job-controller        Job suspended\n  Normal  CreatedWorkload   kueue-job-controller  Created Workload: default/sample-job-xxxxx\n  Normal  Started           kueue-job-controller  Admitted by clusterQueue cluster-queue\n  Normal  Resumed           job-controller        Job resumed\n  Normal  Completed         job-controller        Job completed\n```\n\nNo babysitting required. 🎉\n\nInside a queue, not all jobs are equal. With a WorkloadPriorityClass you can control admission and preemption priority independently from pod priority. Production training jumps the line ahead of throwaway experiments. 🏎️\n\nFirst create the priority class.\n\n```\n# sample-priority.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: WorkloadPriorityClass\nmetadata:\n  name: sample-priority\nvalue: 10000\ndescription: \"Sample priority\"\n```\n\nThen point a Job at it with the kueue.x-k8s.io/priority-class label.\n\n```\n# priority-job.yaml\napiVersion: batch/v1\nkind: Job\nmetadata:\n  name: sample-job\n  labels:\n    kueue.x-k8s.io/queue-name: user-queue\n    kueue.x-k8s.io/priority-class: sample-priority\nspec:\n  parallelism: 3\n  completions: 3\n  suspend: true\n  template:\n    spec:\n      containers:\n      - name: dummy-job\n        image: registry.k8s.io/e2e-test-images/agnhost:2.53\n        args: [\"pause\"]\n      restartPolicy: Never\n```\n\nHigher value means higher priority for queueing and preemption. The neat part is this priority does not touch the pod priority, so it does not interfere with your normal Kubernetes scheduling. 👌\n\nSometimes a big job can still make progress with fewer pods. 
With the kueue.x-k8s.io/job-min-parallelism annotation, Kueue can admit the job at a reduced parallelism instead of leaving it Pending.\n\n```\n# partial-job.yaml\napiVersion: batch/v1\nkind: Job\nmetadata:\n  name: sample-job-partial-admission\n  namespace: default\n  labels:\n    kueue.x-k8s.io/queue-name: user-queue\n  annotations:\n    kueue.x-k8s.io/job-min-parallelism: \"5\"\nspec:\n  parallelism: 20\n  completions: 20\n  template:\n    spec:\n      containers:\n      - name: dummy-job\n        image: registry.k8s.io/e2e-test-images/agnhost:2.53\n        args: [\"entrypoint-tester\", \"hello\", \"world\"]\n        resources:\n          requests:\n            cpu: 1\n            memory: \"200Mi\"\n      restartPolicy: Never\n```\n\nIf only 9 cpu is free, this job is admitted with parallelism 9 instead of waiting for all 20. The completions count stays the same. 🙌\n\nElastic jobs let you change a running Job's parallelism without recreating, restarting, or suspending it. This is an alpha feature, so you must enable the ElasticJobsViaWorkloadSlices feature gate and annotate the Job.\n\n```\n# elastic-job.yaml\napiVersion: batch/v1\nkind: Job\nmetadata:\n  name: sample-elastic-job\n  namespace: default\n  annotations:\n    kueue.x-k8s.io/elastic-job: \"true\"\n  labels:\n    kueue.x-k8s.io/queue-name: user-queue\nspec:\n  parallelism: 3\n  completions: 100\n  template:\n    spec:\n      containers:\n      - name: dummy-job\n        image: registry.k8s.io/e2e-test-images/agnhost:2.53\n        command: [ \"/bin/sh\" ]\n        args: [ \"-c\", \"sleep 60\" ]\n        resources:\n          requests:\n            cpu: \"100m\"\n            memory: \"100Mi\"\n      restartPolicy: Never\n```\n\nWhen you bump parallelism up, Kueue creates a new admitted Workload for the new pod count and marks the old one as Finished. When you scale down, the extra pods terminate and no new Workload is created. Smooth. 🧘\n\nReal clusters often mix node types. 
Say you have x86 and arm nodes labelled with cpu-arch. You can create one flavor per architecture.\n\n```\n# flavor-x86.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ResourceFlavor\nmetadata:\n  name: \"x86\"\nspec:\n  nodeLabels:\n    cpu-arch: x86\n---\n# flavor-arm.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ResourceFlavor\nmetadata:\n  name: \"arm\"\nspec:\n  nodeLabels:\n    cpu-arch: arm\n```\n\nThen reference both in a single ClusterQueue. Here cpu is split across the two architectures, while memory uses the simple default flavor because we do not care which architecture provides it.\n\n```\n# cluster-queue-multi.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"cluster-queue\"\nspec:\n  namespaceSelector: {} # match all\n  resourceGroups:\n  - coveredResources: [\"cpu\"]\n    flavors:\n    - name: \"x86\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 9\n    - name: \"arm\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 12\n  - coveredResources: [\"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"memory\"\n        nominalQuota: 84Gi\n```\n\nThe labels in the ResourceFlavor must match the labels on your nodes. If you use the cluster autoscaler, make sure it adds those labels to new nodes too. 🏷️\n\nThis is where Kueue really shines. 
Put two ClusterQueues in the same cohort and they can borrow each other's unused quota.\n\n```\n# team-a-cq.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"team-a-cq\"\nspec:\n  namespaceSelector: {}\n  cohortName: \"team-ab\"\n  resourceGroups:\n  - coveredResources: [\"cpu\", \"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 9\n        borrowingLimit: 6\n      - name: \"memory\"\n        nominalQuota: 36Gi\n        borrowingLimit: 24Gi\n---\n# team-b-cq.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"team-b-cq\"\nspec:\n  namespaceSelector: {}\n  cohortName: \"team-ab\"\n  resourceGroups:\n  - coveredResources: [\"cpu\", \"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 12\n      - name: \"memory\"\n        nominalQuota: 48Gi\n```\n\nBoth queues belong to the cohort team-ab. Team A has its own guaranteed quota, but it can also borrow idle capacity from Team B, up to the borrowingLimit of 6 cpu and 24Gi. When Team B needs its capacity back, Kueue can reclaim it by preempting Team A's borrowing workloads, but only if team-b-cq configures preemption (reclaimWithinCohort); otherwise Team B's jobs wait until the borrowed capacity is released. ⚖️\n\nA ClusterQueue can borrow from the cohort even when it has zero nominal quota for a flavor. 
This lets you give each team dedicated capacity on one flavor, plus a shared pool to fall back on.\n\n```\n# team-a-cq.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"team-a-cq\"\nspec:\n  namespaceSelector: {} # match all\n  cohortName: \"team-ab\"\n  resourceGroups:\n  - coveredResources: [\"cpu\"]\n    flavors:\n    - name: \"arm\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 9\n        borrowingLimit: 0\n    - name: \"x86\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 0\n  - coveredResources: [\"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"memory\"\n        nominalQuota: 36Gi\n---\n# shared-cq.yaml\napiVersion: kueue.x-k8s.io/v1beta2\nkind: ClusterQueue\nmetadata:\n  name: \"shared-cq\"\nspec:\n  namespaceSelector: {} # match all\n  cohortName: \"team-ab\"\n  resourceGroups:\n  - coveredResources: [\"cpu\"]\n    flavors:\n    - name: \"x86\"\n      resources:\n      - name: \"cpu\"\n        nominalQuota: 6\n  - coveredResources: [\"memory\"]\n    flavors:\n    - name: \"default-flavor\"\n      resources:\n      - name: \"memory\"\n        nominalQuota: 24Gi\n```\n\nRead it like this:\n\n✅ team-a-cq has a borrowingLimit of 0 on the arm flavor, so it can use its own 9 arm cpu but can never borrow more arm capacity from the cohort. Note that borrowingLimit only caps what a queue borrows; capping what a queue lends out is the job of lendingLimit.\n\n✅ team-a-cq has a nominalQuota of 0 on the x86 flavor, so it has no x86 of its own and can only borrow x86 from shared-cq.\n\nThis pattern is great for giving each team a guaranteed slice while still pooling the expensive shared hardware. 🤝\n\nYou are not limited to Jobs. Kueue can manage plain Pods too. 
Just add the queue-name label and resource requests.\n\n```\n# kueue-sleep-pod.yaml\napiVersion: v1\nkind: Pod\nmetadata:\n  generateName: kueue-sleep-\n  namespace: default\n  labels:\n    kueue.x-k8s.io/queue-name: user-queue\nspec:\n  containers:\n  - name: sleep\n    image: busybox\n    command:\n    - sleep\n    args:\n    - 3s\n    resources:\n      requests:\n        cpu: 3\n  restartPolicy: OnFailure\n```\n\nKueue injects a kueue.x-k8s.io/managed=true label to mark the pods it manages. The same label driven approach works for Deployments, StatefulSets, RayJobs, JobSets, and Kubeflow jobs as well. 🧰\n\nKueue ships a kubectl plugin so you can manage queues without writing kubectl get workloads.kueue.x-k8s.io every time. Once installed, you get handy commands.\n\n```\n# List workloads in a namespace\nkubectl kueue list workload\n\n# Stop a workload (it stays in the queue but will not be admitted)\nkubectl kueue stop workload sample-job-xxxxx\n\n# Resume it later\nkubectl kueue resume workload sample-job-xxxxx\n```\n\nIt also covers create, delete, describe, edit, get, and patch for clusterqueues, localqueues, resourceflavors, and workloads. A nice quality of life upgrade for operators. 🧑🔧\n\nHere is the part that makes finance happy. 🤑\n\nGPU and accelerator nodes are expensive, and they are often the scarcest resource in the cluster. The worst outcome is a job that partially grabs a few GPUs, then waits forever for the rest while those GPUs sit idle and billed.\n\nWith Kueue you get:\n\n✅ Quota governance so no single team hoards the accelerators\n\n✅ Admission only when the capacity a job needs is available\n\n✅ Priority so production training is admitted before throwaway experiments\n\n✅ Borrowing so idle quota is actually used instead of wasted\n\nThat combination is exactly why Kueue is becoming a key building block for running AI and ML workloads on Kubernetes at scale. 🚀\n\n✅ Always set resource requests on your pods. 
If you only set limits, Kueue treats the limits as requests. If you set neither, quota accounting cannot work.\n\n✅ The queue-name label must point to a LocalQueue that exists in the same namespace as the job.\n\n✅ The flavor names in the ClusterQueue must match your ResourceFlavor names exactly.\n\n✅ Elastic jobs are alpha, so remember to enable the ElasticJobsViaWorkloadSlices feature gate.\n\n✅ Stick to one API version. This guide uses v1beta2, which is the current served version in v0.18.1.\n\nKueue takes Kubernetes from \"I will place pods wherever and whenever\" to \"I will admit jobs in a fair, prioritized, quota aware order\". For batch, data, and AI workloads that is a huge upgrade, and it costs you almost nothing to adopt since your jobs only need one extra label. 🙌\n\nTo recap the flow:\n\n✅ Install Kueue\n\n✅ Create a ResourceFlavor\n\n✅ Create a ClusterQueue with your quota\n\n✅ Create a LocalQueue per namespace\n\n✅ Add the queue-name label to your jobs\n\n✅ Layer on priority, partial admission, elastic scaling, and cohorts as you grow\n\nGive it a spin on a small cluster first, watch the Workload objects, and you will quickly get a feel for how admission works.\n\nWhat is next? I am going to bring an AI agent into my own homelab cluster and show the full setup, so stay tuned for that one. 🤖🏡\n\nHappy queueing and stay safe! 
👋", "url": "https://wpnews.pro/news/meet-kueue-smart-job-queueing-for-kubernetes", "canonical_source": "https://dev.to/hkhelil/meet-kueue-smart-job-queueing-for-kubernetes-3gj", "published_at": "2026-06-30 06:32:30+00:00", "updated_at": "2026-06-30 06:48:54.212199+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning", "artificial-intelligence", "mlops", "ai-infrastructure"], "entities": ["Kueue", "Kubernetes", "kubernetes-sigs", "GPU"], "alternates": {"html": "https://wpnews.pro/news/meet-kueue-smart-job-queueing-for-kubernetes", "markdown": "https://wpnews.pro/news/meet-kueue-smart-job-queueing-for-kubernetes.md", "text": "https://wpnews.pro/news/meet-kueue-smart-job-queueing-for-kubernetes.txt", "jsonld": "https://wpnews.pro/news/meet-kueue-smart-job-queueing-for-kubernetes.jsonld"}}