{"slug": "how-to-control-cloudwatch-logs-costs-on-ecs", "title": "How to Control CloudWatch Logs Costs on ECS", "summary": "An engineer at Fortem.dev has published a four-step guide to controlling runaway CloudWatch Logs costs on Amazon ECS, where default settings create log groups with \"Never Expire\" retention and INFO-level verbosity. The fix includes setting retention policies to 30 days, which can cut storage costs by 60-80%, reducing log levels from INFO to WARN, and using on-demand CloudWatch Logs Insights queries instead of streaming everything to Datadog. The guide also provides a skill file for AI agents like Claude Code to automatically scan and optimize CloudWatch log groups.", "body_md": "Your AWS bill shows CloudWatch at $400 this month. You have 15 ECS services logging INFO-level with retention set to Never Expire. You didn't configure this; ECS did it by default.\n\nThe fix takes 4 steps.\n\nECS uses the `awslogs` driver by default. Every container's stdout goes to CloudWatch. ECS creates log groups with no retention policy (**Never Expire**), so logs accumulate forever.\n\nHere's what that looks like for a typical 15-service fleet:\n\n| Cost component | 15 services, INFO level, 3 GB/day |\n|---|---|\n| Ingestion ($0.50/GB) | $45/mo |\n| Storage ($0.03/GB/month) | $54/mo (grows every month) |\n| Insights queries ($0.005/GB) | $36/mo (5 queries/day) |\n| Total | $135/mo |\n\nThree separate charges on the same data. Ingestion is pay-what-you-send. Storage is pay-what-you-keep. Insights is pay-what-you-scan. ECS defaults mean you pay all three, with no upper bound, on every log line your application prints.\n\nThere's a skill file at fortem.dev that an AI agent (Claude Code, OpenCode, Codex) can run for you. 
It scans your CloudWatch log groups, finds the ones bleeding money, and optionally fixes them. It is read-only by default and makes changes only with your confirmation.\n\n[Get the CloudWatch Cost Optimizer skill file → fortem.dev/blog/cloudwatch-costs-ecs](https://fortem.dev/blog/cloudwatch-costs-ecs)\n\nThe agent runs locally against your AWS account. No data leaves your machine.\n\nOne Terraform line, `retention_in_days = 30`, cuts storage cost by 60-80%. This single change has the biggest impact of any step in this guide.\n\n**Find groups without retention:**\n\n```\naws logs describe-log-groups \\\n    --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \\\n    --output table\n```\n\n**Set 30-day retention:**\n\n```\naws logs put-retention-policy \\\n    --log-group-name \"/aws/ecs/your-service\" \\\n    --retention-in-days 30\n```\n\n**Terraform:**\n\n```\nresource \"aws_cloudwatch_log_group\" \"ecs_service\" {\n  name              = \"/ecs/${var.env_prefix}-${var.service_name}\"\n  retention_in_days = 30  # unset or 0 means Never Expire\n}\n```\n\n**Recommended retention by environment:**\n\n| Environment | Retention | Why |\n|---|---|---|\n| Production | 90 days | Compliance + incident investigation |\n| Staging | 30 days | Recent deploy history |\n| Dev / QA | 7 days | Active development only |\n| CI/CD / Build | 1 day | Don't store ephemeral build logs |\n\nSpring Boot, Express, Django: they all default to INFO. In practice, an INFO-level web server generates one to two orders of magnitude more log volume than the same server at WARN. 
Switch production to WARN.\n\n```\n# Find which log streams in one service's log group emit the most events (last 7 days)\n# Note: date -v-7d is BSD/macOS syntax; on GNU/Linux use $(date -d '7 days ago' +%s)\naws logs start-query \\\n    --log-group-name \"/aws/ecs/prod-api\" \\\n    --start-time $(date -v-7d +%s) \\\n    --end-time $(date +%s) \\\n    --query-string \"stats count(*) as events by @logStream | sort events desc | limit 10\"\n# start-query returns a queryId; fetch results with: aws logs get-query-results --query-id \"$QUERY_ID\"\n\n# Set log level by framework:\n# Spring Boot: logging.level.root=WARN in application.properties\n# Express: LOG_LEVEL=warn (if your logger is configured from this variable)\n# Django: LOGGING['root']['level'] = 'WARNING'\n```\n\n\"CloudWatch Logs charges $0.50 per GB ingested, $0.03 per GB stored per month, and $0.005 per GB scanned by Logs Insights queries, beyond the 5 GB/month free tier.\" (aws.amazon.com/cloudwatch/pricing, verified June 2026)\n\nStreaming everything to Datadog adds an indexing cost on top of ingestion. Once you index for search, which is the point of sending logs there, the combined cost per GB is several times CloudWatch's ingestion and storage charges together.\n\nFor debugging, use CloudWatch Logs Insights instead: you query on demand at $0.005/GB scanned rather than paying per GB indexed.\n\n```\n# Find errors in the last hour\n# Note: date -v-1H is BSD/macOS syntax; on GNU/Linux use $(date -d '1 hour ago' +%s)\naws logs start-query \\\n    --log-group-name \"/aws/ecs/prod-api\" \\\n    --start-time $(date -v-1H +%s) \\\n    --end-time $(date +%s) \\\n    --query-string \"fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50\"\n\n# For compliance: archive to S3 through a Firehose delivery stream (cheap, durable)\n# A Firehose destination also needs --role-arn with an IAM role that CloudWatch Logs can assume\naws logs put-subscription-filter \\\n    --log-group-name \"/aws/ecs/prod-api\" \\\n    --filter-name \"AllToS3\" \\\n    --filter-pattern \"\" \\\n    --destination-arn \"arn:aws:firehose:...\"\n```\n\nYou know CloudWatch is $400. You don't know which of your 15 services is responsible for $300 of it. 
This Insights query ranks the log streams in one log group by bytes logged. Run it against each service's log group (or pass several groups with `--log-group-names` and group by `@log`) to find the culprit in about 5 minutes.\n\n```\n# Note: date -v-7d is BSD/macOS syntax; on GNU/Linux use $(date -d '7 days ago' +%s)\naws logs start-query \\\n    --log-group-name \"/aws/ecs/prod-api\" \\\n    --start-time $(date -v-7d +%s) \\\n    --end-time $(date +%s) \\\n    --query-string \"stats sum(strlen(@message)) as totalBytes by @logStream | sort totalBytes desc | limit 10\"\n```\n\nOnce you know the top offender, check three things: (1) log level, (2) whether it logs stack traces on every request, (3) whether it logs health check pings. Those three fix 90% of high-volume log problems.\n\n**Will reducing log retention affect my ability to debug?**\n\nFor production, 90 days covers both incident response and compliance. For staging, 30 days keeps recent deploy history. For dev, 7 days is enough: if you haven't debugged it in a week, the logs won't help. You can always increase retention temporarily during an incident.\n\n**Can I use a different log driver instead of CloudWatch?**\n\nYes. ECS supports `awsfirelens` (20+ destinations), `fluentd`, and `splunk`. But switching the driver doesn't reduce costs; it moves them. CloudWatch with retention set and log-level filtering is often the cheapest option because you're already in the AWS ecosystem.\n\n**How do I estimate my CloudWatch costs before the bill arrives?**\n\nCheck `IncomingBytes` under CloudWatch Metrics → Logs, and the `storedBytes` field that `aws logs describe-log-groups` reports for each group. Multiply monthly `IncomingBytes` (in GB) by $0.50 for ingestion. Multiply `storedBytes` (in GB) by $0.03 for a month of storage. Most importantly, count how many log groups have no `retentionInDays` set (Never Expire): those are silently accumulating.\n\n**Can I set retention globally across all log groups?**\n\nNo single command sets retention for all groups. Loop over the output of `aws logs describe-log-groups` and call `aws logs put-retention-policy` for each group, or add `retention_in_days` to every `aws_cloudwatch_log_group` resource in Terraform. AWS does not offer a global retention default.\n\n**Does CloudWatch Logs Insights query cost depend on retention?**\n\nNot directly. Insights costs $0.005 per GB scanned regardless of data age. 
Shorter retention means less data to scan, so queries cost proportionally less. At a steady ingest rate, a 30-day log group holds about 1/12th the data of a 365-day group.\n\n*Full article with downloadable skill file: fortem.dev/blog/cloudwatch-costs-ecs*", "url": "https://wpnews.pro/news/how-to-control-cloudwatch-logs-costs-on-ecs", "canonical_source": "https://dev.to/dspv/how-to-control-cloudwatch-logs-costs-on-ecs-5592", "published_at": "2026-06-12 08:20:16+00:00", "updated_at": "2026-06-12 08:42:11.947375+00:00", "lang": "en", "topics": ["ai-tools", "ai-agents", "mlops"], "entities": ["AWS", "CloudWatch", "ECS", "Claude Code", "OpenCode", "Codex", "fortem.dev", "Terraform"], "alternates": {"html": "https://wpnews.pro/news/how-to-control-cloudwatch-logs-costs-on-ecs", "markdown": "https://wpnews.pro/news/how-to-control-cloudwatch-logs-costs-on-ecs.md", "text": "https://wpnews.pro/news/how-to-control-cloudwatch-logs-costs-on-ecs.txt", "jsonld": "https://wpnews.pro/news/how-to-control-cloudwatch-logs-costs-on-ecs.jsonld"}}