{"slug": "zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate", "title": "Zero-Downtime Blue-Green and IP-Based Canary Deployments on ECS Fargate", "summary": "This article describes a Terraform-driven, zero-downtime deployment workflow for ECS Fargate that uses ALB listener rules based on source IP addresses to route internal team traffic to a GREEN environment while public users continue hitting the BLUE environment. This approach allows QA and internal teams to validate new releases on the actual production infrastructure and URL before public rollout, without relying on CodeDeploy, DNS switching, or duplicate infrastructure. The deployment is controlled by simple boolean variables (`enable_canary`, `activate_canary`, `promote_to_all`) that manage the lifecycle of the BLUE and GREEN environments.", "body_md": "Most ECS blue-green deployment tutorials eventually lead to the same stack:\nAnd while CodeDeploy works, I kept running into one practical limitation during real deployments:\nI couldn’t let my internal team validate a new release on the actual production URL before exposing it to customers.\nThat became the entire motivation behind this setup.\nI didn’t want:\nI wanted something much simpler:\nSo I built a Terraform-driven deployment workflow using:\nwithout using CodeDeploy.\nAfter running this setup in practice, I ended up preferring it for many ECS workloads.\nBoth BLUE and GREEN environments run behind the same ALB.\nInternal office/VPN IPs get routed to GREEN first.\nEveryone else continues hitting BLUE.\nThat means QA and internal teams can validate the new release directly on the real production infrastructure before public rollout begins.\nSame:\nNo “staging surprises” later.\nA lot of deployment issues only appear on the real production routing path.\nInternal users open:\nhttps://nginx.jayakrishnayadav.cloud\n…and immediately see the GREEN version.\nMeanwhile, public users continue seeing BLUE.\nNo DNS switching.\nNo 
duplicate infrastructure.\nJust ALB listener routing.\nThe deployment flow looks like this:\n                   ┌────────────────────┐\n                   │   Application LB   │\n                   └─────────┬──────────┘\n                             │\n            ┌────────────────┴────────────────┐\n            │                                 │\n Internal Office/VPN IPs                Public Users\n            │                                 │\n            ▼                                 ▼\n   GREEN Target Group                 BLUE Target Group\n            │                                 │\n     ECS GREEN Tasks                   ECS BLUE Tasks\nThe canary routing rule gets evaluated first.\nIf the request source IP matches internal CIDRs, traffic goes to GREEN.\nEverything else falls back to BLUE.\nI kept the Terraform layout modular so it could be reused across multiple services.\n.\n├── main.tf\n├── variables.tf\n├── outputs.tf\n├── env/\n│   ├── backend.hcl\n│   └── terraform.tfvars\n├── modules/\n│   ├── vpc/\n│   ├── iam/\n│   ├── alb/\n│   ├── ecs-cluster/\n│   └── ecs-blue-green-service/\n└── scripts/\n    └── zero-downtime-test.sh\nEach ECS service gets:\nThe entire deployment behavior depends on ALB listener priorities.\nThe canary listener rule gets evaluated first.\nIf the request source IP matches internal CIDRs, traffic gets forwarded to GREEN.\nresource \"aws_lb_listener_rule\" \"canary\" {\ncount = var.activate_canary ? 
1 : 0\npriority = 99\ncondition {\nsource_ip {\nvalues = var.canary_source_ips\n}\n}\ncondition {\nhost_header {\nvalues = [\"nginx.jayakrishnayadav.cloud\"]\n}\n}\naction {\ntype = \"forward\"\ntarget_group_arn = aws_lb_target_group.green.arn\n}\n}\nThe production rule remains below it:\nresource \"aws_lb_listener_rule\" \"production\" {\npriority = 100\ncondition {\nhost_header {\nvalues = [\"nginx.jayakrishnayadav.cloud\"]\n}\n}\naction {\ntype = \"forward\"\ntarget_group_arn = local.active_target_group\n}\n}\nThat’s it.\nNo weighted routing.\nNo lifecycle hooks.\nJust listener priorities.\nThis wasn’t built as a theoretical architecture exercise.\nI tested the rollout flow directly from Terraform while continuously validating traffic behavior against live ECS Fargate services.\nTerraform initialization:\nterraform init -backend-config=env/backend.hcl\nDeployment apply:\nterraform apply \\\n-var-file=env/terraform.tfvars \\\n-lock=false \\\n-auto-approve\nDuring canary validation, I continuously verified my public IP:\ncurl ifconfig.me\nThat mattered because the ALB source-IP rule decides whether traffic reaches:\nOnce my IP matched the configured canary CIDRs, traffic immediately started routing to GREEN.\nThe nice part about this setup is that everything becomes variable-driven.\nBLUE handles all production traffic.\nGREEN remains scaled down.\nenable_canary = false\nactivate_canary = false\npromote_to_all = false\nApply:\nterraform apply \\\n-var-file=env/terraform.tfvars \\\n-lock=false \\\n-auto-approve\nResult:\nNow we start the GREEN environment.\nenable_canary = true\nactivate_canary = false\npromote_to_all = false\nApply again:\nterraform apply \\\n-var-file=env/terraform.tfvars \\\n-lock=false \\\n-auto-approve\nAt this stage:\nUsers never hit partially starting containers.\nNow we enable canary routing.\nenable_canary = true\nactivate_canary = true\npromote_to_all = false\nApply again:\nterraform apply \\\n-var-file=env/terraform.tfvars 
\\\n-lock=false \\\n-auto-approve\nNow:\nThis became the most valuable phase of the deployment workflow.\nBecause now:\nwhile customers remain completely unaffected.\nThis is the ALB listener rules view while canary routing is enabled.\nThe priority 99 rule matches internal source IPs and forwards them to GREEN, while everyone else continues hitting BLUE.\nOnce validation looks good:\nenable_canary = true\nactivate_canary = false\npromote_to_all = true\nApply again:\nterraform apply \\\n-var-file=env/terraform.tfvars \\\n-lock=false \\\n-auto-approve\nNow:\nNo downtime occurs.\nTraffic simply moves from one target group to another.\nI didn’t want to assume the deployment was safe.\nI wanted to verify it continuously during rollout.\nSo I used a simple curl-based validation script that continuously hit both applications while traffic shifted between BLUE and GREEN.\nfor i in {1..100}\ndo\nfor url in \\\n\"https://nginx.jayakrishnayadav.cloud/\" \\\n\"https://apache.jayakrishnayadav.cloud/\"\ndo\nresponse=$(curl -k -s -w \" HTTPSTATUS:%{http_code}\" \"$url\")\nbody=${response% HTTPSTATUS:*}\nstatus=${response##*HTTPSTATUS:}\nif [[ $body == *\"BLUE - v\"* ]]; then\ncolor=\"BLUE\"\nelif [[ $body == *\"GREEN - v\"* ]]; then\ncolor=\"GREEN\"\nelse\ncolor=\"UNKNOWN\"\nfi\necho \"Run: $i | URL: $url | Status: $status | Version: $color\"\ndone\ndone\nOutput during deployment:\nYou can clearly see:\nThat confirmed the deployment was genuinely zero downtime.\nAfter promotion:\nClean and simple.\nRollback became extremely simple.\nI just reverted the Terraform variables:\nenable_canary = false\nactivate_canary = false\npromote_to_all = false\nApply Terraform again:\nterraform apply \\\n-var-file=env/terraform.tfvars \\\n-lock=false \\\n-auto-approve\nALB immediately routes traffic back to BLUE.\nThe rollback process stays predictable because traffic switching is entirely controlled through ALB listener rules.\nThe ALB uses ACM certificates for 
HTTPS.\nListeners:\nExample:\ntest_listener_allowed_cidrs = [\n\"160.30.39.198/32\"\n]\nThat keeps internal preview traffic private while still using the same production infrastructure.\nOne thing I specifically wanted to avoid was permanently doubling infrastructure cost.\nNormal state:\nDeployment window:\nAfter promotion:\nSo infrastructure cost only increases briefly during deployments.\nThis project started because I wanted a very practical deployment workflow:\nInternal users should validate the new version on the actual production URL before customers ever see it.\nOnce I implemented that using ALB listener priorities and source IP routing, I realized I no longer really needed CodeDeploy for this workflow.\nThe end result became:\nAnd because everything is Terraform-driven, the deployment process stays reproducible and predictable.\nFull Terraform implementation:\nhttps://github.com/jayakrishnayadav24/ecs-blue-green-deployment/tree/canary", "url": "https://wpnews.pro/news/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate", "canonical_source": "https://dev.to/aws-builders/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate-4ea8", "published_at": "2026-05-23 08:36:49+00:00", "updated_at": "2026-05-23 09:03:48.879728+00:00", "lang": "en", "topics": ["cloud-computing", "developer-tools", "enterprise-software"], "entities": ["ECS Fargate", "CodeDeploy", "ALB", "Terraform", "AWS"], "alternates": {"html": "https://wpnews.pro/news/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate", "markdown": "https://wpnews.pro/news/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate.md", "text": "https://wpnews.pro/news/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate.txt", "jsonld": "https://wpnews.pro/news/zero-downtime-blue-green-and-ip-based-canary-deployments-on-ecs-fargate.jsonld"}}