{"slug": "spot-instances-as-github-actions-runners", "title": "Spot instances as GitHub Actions runners", "summary": "This article explains how the author's team reduced CI costs by replacing GitHub's managed runners with self-hosted EC2 spot instances, achieving a roughly 4x cost savings ($160 vs. $647 for 1,350 hours of compute). The setup uses the open-source `terraform-aws-github-runner` module to create ephemeral, single-job spot instances that automatically terminate after each workflow run. The team organizes runners into three tiers (small, medium, large) with different instance types and labels, allowing workflows to select the appropriate hardware while maintaining VPC network access and instance flexibility.", "body_md": "Part 1 was Jenkins as code with ephemeral workers. Part 2 was macOS. This one moves a chunk of the CI workload off Jenkins entirely - onto GitHub Actions, with EC2 spot as the runner fleet.\nJenkins isn't dead here. It still handles the big builds - macOS, Windows, anything that runs for hours or needs custom orchestration. GitHub Actions runs alongside it for the workloads where it fits better.\nThis post covers the self-hosted spot runner pattern: how to point GitHub Actions at your own ephemeral EC2 fleet, and what bites once you do.\nGitHub's managed runners are fine for small teams. A few things push you toward self-hosted:\n1. Cost at volume. GitHub bills its managed Linux runners at $0.008/minute ($0.48/hour). Fine for a few builds a day. Last month we ran 80,887 runner-minutes across 29,347 jobs (~1,350 hours). On managed runners that would have been ~$647. Our actual EC2 bill for the runner fleet was ~$160: $130 on spot and $28 on EC2-Other (EBS, ENIs, data transfer). Roughly 4x cheaper, and the gap widens the more you run.\n2. Instance shape. Managed runners come in fixed sizes. If your build wants 16 vCPUs and 64 GB of RAM, or a GPU, or arm64, you're either paying for the largest tier or stuck. 
Self-hosted lets you pick the EC2 type the build needs.\n3. Network access. Builds that talk to private resources - internal artifact registries, RDS, anything behind a VPC - are awkward on managed runners. Self-hosted runners live inside your VPC, no proxies or tunnels.\nThe bill got us to try it. The VPC access and instance flexibility were bonuses.\nA self-hosted GitHub Actions runner is a small agent that registers with a GitHub repo or org, polls for jobs matching its labels, runs them, and reports back. It doesn't care where it lives - bare metal, VM, container, anything that runs the binary.\nTwo flavors:\n- Long-lived: the runner stays registered and takes job after job.\n- Ephemeral: the runner takes exactly one job, then exits.\nWe picked ephemeral. A long-lived self-hosted runner means babysitting a host, build pollution from a shared agent, and a security blast radius that never closes.\nSo every job gets its own EC2 spot instance from a Packer-baked AMI. Job runs, instance terminates. Same one-build-per-worker pattern as the Jenkins fleet, different control plane.\nThere's no single \"self-hosted spot runner\" service. You stitch it yourself, or grab one of the open-source modules that stitch it for you. I went with `terraform-aws-github-runner` - most battle-tested of the bunch, drops into a Terraform-managed AWS account without much ceremony. (If you remember it as `philips-labs/terraform-aws-github-runner` - same project, moved to the `github-aws-runners` org.)\nWhen someone opens a PR that triggers a workflow:\n1. GitHub sends a `workflow_job` webhook the moment a job is queued.\n2. API Gateway and a webhook Lambda receive the event and put it on an SQS queue.\n3. A scale-up Lambda reads the queue and launches an EC2 spot instance from the Packer AMI.\n4. The runner on that instance registers with `--ephemeral`, so the agent exits after one job. A scale-down Lambda comes by on a schedule and reaps anything left over.\nThe Lambdas, SQS queues, IAM - all of it lives inside the module. You write the Terraform that declares which runners exist, which AMI they use, which instance types are eligible.\n📌 IMAGE TODO (architecture): linear flow left-to-right - GitHub webhook → API Gateway + Lambda → SQS → Scale-up Lambda → EC2 spot instance (Packer AMI) → Job runs → Instance terminates. 
One dashed loop labeled \"scale-down Lambda watches and cleans up\". Hand-drawn excalidraw style to match Parts 1 and 2.\nThe setup earns its keep once you split runners into tiers by label. Workflows pick a tier with `runs-on:` in their YAML.\nWe run three:\n- `t3.medium` / `m5.large`. Linters, formatters, doc builds. Cheap, spot capacity is never a problem, spawns fast.\n- `m5.xlarge` / `c5.xlarge`. The typical build/test workflow.\n- `c7a.4xlarge` / `c8a.8xlarge`. Compile-heavy builds, big test suites, anything that scales with cores.\nEach tier is its own entry in the module's `multi_runner_config` map, with different labels and instance-type lists. Sanitized:\n\n    module \"github-runners\" {\n      source = \"github-aws-runners/github-runner/aws//modules/multi-runner\"\n      version = \"~> 6.0\"\n      multi_runner_config = {\n        \"linux-x64-small\" = {\n          runner_config = {\n            runner_extra_labels = \"linux,x64,small\"\n            instance_types = local.default_instances\n            ami_filter = { name = [\"*ci-runner-x64*\"] }\n            enable_ephemeral_runners = true\n            enable_spot_instances = true\n          }\n        }\n        \"linux-x64-compute-intensive\" = {\n          runner_config = {\n            runner_extra_labels = \"linux,x64,compute-intensive\"\n            instance_types = local.compute_intensive\n            ami_filter = { name = [\"*ci-runner-x64*\"] }\n            enable_ephemeral_runners = true\n            enable_spot_instances = true\n          }\n        }\n        # ...and so on for the other tiers\n      }\n      # Common stuff: webhook secret, GitHub app credentials, VPC config, etc.\n      github_app = { ... }\n      webhook_secret = random_id.webhook_secret.hex\n      vpc_id = module.vpc.vpc_id\n      subnet_ids = module.vpc.private_subnets\n    }\n\nPicking the tier in a workflow:\n\n    jobs:\n      build:\n        runs-on: [self-hosted, linux, x64, compute-intensive]\n        steps:\n          - uses: actions/checkout@v4\n          - run: make build\n\nGitHub matches the `runs-on` array against runner labels and picks any registered runner that has them all. 
The Lambda only spawns instances on demand, so an unused tier costs nothing.\nPure on-demand scaling looks great on paper: zero idle runners, pay only when a job runs, Lambda spawns exactly when GitHub queues something. Two things spoil it.\nThe morning-rush problem. The first PRs of the day all queue around the time people log in. On pure on-demand, every one eats the full cold-start (~60-120s from queue to running). A dozen devs hitting \"push\" at 9am and you have a backlog people notice.\nThe 3am problem. Even on spot, idle runners cost something - EBS volumes on the warm AMIs, plus the always-on orchestration Lambdas. Outside business hours the queue is mostly empty. No point keeping capacity hot.\nThe module handles both with idle pools and scheduled scaling.\nWhat we do: keep a small warm pool of idle runners per tier from 08:00 to 20:00 on weekdays, and let it drop to zero overnight and on weekends.\nThe Terraform for it (sanitized, per-tier):\n\n    runner_config = {\n      # ... labels, instance_types, ami_filter as before ...\n      enable_ephemeral_runners = true\n      enable_spot_instances = true\n      # Warm pool - kept at this size during the cron windows below.\n      idle_config = [\n        {\n          cron = \"0 8 * * MON-FRI\" # ramp up at 08:00 weekdays\n          timeZone = \"Europe/Berlin\"\n          idleCount = 3\n        },\n        {\n          cron = \"0 20 * * MON-FRI\" # ramp down at 20:00 weekdays\n          timeZone = \"Europe/Berlin\"\n          idleCount = 0\n        },\n      ]\n    }\n\nA few notes:\nDuring working hours the dev experience is about as snappy as managed runners. Outside, the bill is close to zero.\n📌 IMAGE TODO (warm-pool schedule): a 24-hour timeline showing pool size. Flat at 0 from 00:00-08:00, ramps up to N runners at 08:00, stays at N until 20:00, drops to 0 for the night. Weekend bar shows pool=0 all day. Annotate \"cold-start hides here\" inside the 08:00-20:00 band.\nSame idea as Part 1: whatever the runner needs - language runtimes, build tools, Docker, cached dependencies - goes into a Packer AMI ahead of time. 
The AMI is versioned, lives in your AWS account, and is referenced by the runner's `ami_filter`.\nFor GitHub Actions the image is usually lighter than the Jenkins worker AMIs - GHA workflows install most of their tooling at runtime via `setup-node`, `setup-python`, `setup-java`. So the base image just needs:\n- Docker (for workflows that `docker build` / `docker run`).\n- `git`, `curl`, `jq`, `unzip`.\nThe image ends up 5-10 GB. A fresh spot instance pulls it and starts running in well under a minute.\nFirst question everyone asks about spot for CI: what happens when AWS yanks the instance mid-build?\nThe build fails. GitHub re-queues. A fresh instance picks it up. No recovery dance to write - the runner is ephemeral and the workflow should be idempotent anyway. You lose one partial build, the retry runs from scratch.\nSpot gives you 2 minutes of warning before termination. The runner listens for that and de-registers from GitHub cleanly before shutdown - the module wires this up for you. Without it, GitHub briefly shows a \"runner went offline mid-job\" error before the retry. Ugly but not fatal.\nIn practice we see 1-3% interruption on the default and large tiers, a bit higher on compute-intensive (larger instance types have less spot capacity per AZ). For most workloads that's a fine trade. For workflows that can't tolerate a retry - release builds, deploys with side effects - I either flip `enable_spot_instances = false` for that tier (same module) or ship them over to Jenkins, where the lifecycle is more controlled.\n\"Should this run on Jenkins or GitHub Actions?\" comes up a lot. How I think about it:\nThe two systems don't overlap, they cover different shapes of work. Moving everything to GitHub Actions would have been a mistake. Moving the small PR-scoped stuff off Jenkins freed up real capacity for the big jobs.\nSame pattern across all three parts: ephemeral workers, baked images, orchestrated from git, secrets from a vault. Jenkins is one way to wire it up. 
GitHub Actions on self-hosted spot is another. You don't have to pick one.\nThe part you can't get wrong is the worker lifecycle - don't keep workers between builds. After that, the rest (Jenkins or GHA, spot or on-demand, Tart or vSphere) is swappable.\nThat's where the Odyssey wraps for now. Three posts. Same advice underneath: stop clicking, bake your images, don't reuse workers. If it saves you a week, worth it.\nPart 3 of My CI/CD Odyssey. Thanks for reading. If you run self-hosted CI differently, I'd be curious to hear about it in the comments.", "url": "https://wpnews.pro/news/spot-instances-as-github-actions-runners", "canonical_source": "https://dev.to/lanycrost/spot-instances-as-github-actions-runners-h19", "published_at": "2026-05-23 09:04:32+00:00", "updated_at": "2026-05-23 09:32:57.179125+00:00", "lang": "en", "topics": ["cloud-computing", "developer-tools", "enterprise-software"], "entities": ["GitHub Actions", "EC2", "Jenkins", "AWS"], "alternates": {"html": "https://wpnews.pro/news/spot-instances-as-github-actions-runners", "markdown": "https://wpnews.pro/news/spot-instances-as-github-actions-runners.md", "text": "https://wpnews.pro/news/spot-instances-as-github-actions-runners.txt", "jsonld": "https://wpnews.pro/news/spot-instances-as-github-actions-runners.jsonld"}}