9 in 10 Docker Compose files skip the basic security flags

According to the article, a security audit of 6,444 public Docker Compose files found that approximately 90% of them are missing three basic security flags: `read_only: true`, `cap_drop: [ALL]`, and `no-new-privileges`. The author attributes this widespread omission not to engineer carelessness, but to Docker Compose shipping with these hardening features switched off by default, with few examples or tutorials demonstrating their use. The findings are rated as "MEDIUM" severity because they represent missing controls rather than active misuse, though they are explicitly recommended by the CIS Docker Benchmark and OWASP Docker Security Cheat Sheet.

I created compose-lint, a security linter for Docker Compose files, and pointed it at 6,444 public docker-compose.yml and compose.yaml files from GitHub. More on why below. Three numbers stood out:

I don't read this as engineers being careless. It's about defaults. Docker Compose ships with the hardening switched off, almost nobody turns it on, and the examples people learn from don't either.

By day I lead a team of security engineers at a large financial institution, where Compose barely comes up. Production runs on Kubernetes and ECS, both with mature security tooling around them. At home in my lab, though, Compose is the right tool: quick, low-ceremony, enough to stand up a stack on a Saturday.

What bugged me was the asymmetry. Kubernetes and Terraform have a deep bench of scanners: Checkov, Trivy, kube-bench, Kubescape. Compose is an afterthought in most of them. The Compose-specific tools I found solved adjacent problems instead. Hadolint lints Dockerfiles, not Compose files. dclint checks Compose structure and style, not security.

What I wanted was simple: a zero-config, OWASP/CIS-grounded linter I could drop into CI and run against my own stacks. So I wrote one. Then I got curious whether the stuff I kept fixing in my own files showed up everywhere else. It does. That's what this writeup is about, and I'm putting the tool out there in case it's useful to anyone who builds the way I do. Personal project; the views here are my own, not my employer's.

Here's what I found.

I split the files into four tiers. A single "X% of Compose files do Y" number is misleading when it averages a polished Bitnami example against someone's half-finished homelab:

- canonical (327 files). Official upstream examples: awesome-compose, Bitnami, Grafana, Vaultwarden. The stuff READMEs tell you to copy-paste.
- popular (3,977). Repos with ≥50 stars and a recent Compose file. Production-adjacent code.
- selfhosted (588). App-store / template registries: CasaOS, runtipi, Dockge.
  Home-LAN threat model.
- longtail (1,552). A stratified sweep of GitHub code search. The median file in the wild.

Every file goes through the same rule set (compose-lint 0.7.0), and every rule is grounded in the OWASP Docker Security Cheat Sheet or the CIS Docker Benchmark. The full methodology, including what the study deliberately doesn't claim, is in the canonical report.

Three findings fire on roughly 90% of all files in the corpus:

- `read_only: true` missing: 91%
- `cap_drop: [ALL]` missing: 91%
- `no-new-privileges` missing: 90%

They're rated MEDIUM, not CRITICAL, because each one is a missing control rather than active misuse. That's also what makes them interesting. The Compose hardening triple is almost never set. The fix takes about 30 seconds per service:

    services:
      app:
        image: nginx:1.27@sha256:...   # pin a digest, not just a tag
        read_only: true                # CL-0007
        cap_drop: [ALL]                # CL-0006
        security_opt:
          - no-new-privileges:true     # CL-0003

Add a `tmpfs:` for whatever paths your app writes to and you've cleared the three most common findings in the corpus.

"Aren't these just optional config choices, not real vulnerabilities?"

Mostly fair, and it's worth being exact about what a "finding" is here. The linter isn't claiming anything was exploited. It's flagging that a file leaves a recommended hardening control unset. Whether that matters is a judgment call that belongs to the org or the engineer, based on their context and risk tolerance. What the linter takes off your plate is the "did I even know this control existed?" part. `read_only`, `cap_drop: [ALL]`, and `no-new-privileges` aren't my preferences about tidy YAML; they're named controls in the CIS Docker Benchmark and the OWASP Docker Security Cheat Sheet. A finding means the config diverges from that published baseline. Closing the gap or accepting it is your call.

You'd expect the official examples to be the hardened ones. Being copied is their entire job.
They're the cleanest tier in the corpus, and they still come in at 83%:

A few things jump out:

That last point is most of the story. The examples teach the unhardened shape, and the shape propagates.

A fair caveat here, especially if you're running a homelab: threat model matters. A single-user box behind a firewall and Tailscale is a different risk calculus than something exposed to the internet, and a finding is usually something to decide about rather than an emergency. Start with what actually bites. A mounted Docker socket is full host takeover whether or not you meant to expose it, so fix those first and treat the MEDIUM pile as gradual cleanup. It's why the CI gate defaults to `fail-on: high`.

This is the one I didn't expect. In the long-tail tier, 9.6% of files don't parse as a valid Compose file at all, against well under 1% everywhere else:

And it's almost never broken YAML. It's shape errors: people writing services as a dictionary of strings.

What a lot of people write (does not parse):

    services:
      nginx: nginx:1.27

What Compose actually wants:

    services:
      nginx:
        image: nginx:1.27

A file that doesn't parse with a real Compose engine was never deployed by one. So these are docs snippets, tutorial follow-alongs, half-finished first drafts. None of them are getting linted before they ship, and the parse-error rate is really a map of where unreviewed config piles up.

If you're wondering: "long tail" here means the low-visibility mass of ordinary repos, not a statistical distribution tail.

One more chart, because the severity mix surprises people:

Nearly four out of five findings are MEDIUM. That's the hardening-triple misses from Finding 1, which hit almost every file and are MEDIUM by design. CRITICAL findings are rarer but real: a mounted Docker socket, `cap_add: [ALL]`, a bind-mounted `/`. The Docker socket one alone, which is full host takeover, shows up on 6.4% of parsed files and 8% in the popular tier. LOW is almost empty, and that's by construction.
Only one of compose-lint's 21 rules is LOW (a healthcheck someone explicitly turned off), because the tool's whole scope is security misconfiguration, where the floor is MEDIUM. So read "0.0% LOW" as a fact about the tool, not as the small stuff being fine.

Why are these controls so rarely set? Because Docker optimizes for "it runs the first time." A writable filesystem, the full capability set, privilege escalation left on: that's the path of least surprise. Your container starts, your app works. Hardening is opt-in, and opting in means knowing the control exists, confirming your app still works without that capability or that write access, and adding a few lines per service.

And "knowing it exists" is the hard part. There's no single secure-Compose baseline to copy from. The controls are scattered across the Compose spec, the Docker run reference, the OWASP Docker Security Cheat Sheet, and the CIS benchmark. You'd have to already know that `no-new-privileges` is a thing, that `cap_drop: [ALL]` goes before a targeted `cap_add`, that `read_only: true` usually needs a `tmpfs` for whatever your app writes.

Most people writing a Compose file aren't container-security specialists. They want their stack up. Expecting everyone to carry that whole surface in their head is how you end up with a 91% finding rate. A linter flips it around: you don't memorize anything, you fix the line it points at and read why.

And it compounds. The examples never opt in, so the next person copies the unhardened shape, ships it, and becomes the next example someone copies. The corpus is that loop at scale. None of it is exotic. It's the accumulated weight of a sensible default that nobody goes back to revisit.

A few things I'm explicitly not claiming, because it's easy to over-read this:

The full "what this study does not claim" section in the report lays out every boundary.

compose-lint is MIT-licensed, zero-config, and depends only on PyYAML.
A few ways to run it:

    # one-off, locally
    pipx install compose-lint && compose-lint docker-compose.yml

    # or the published image (distroless, nonroot)
    docker run --rm -v "$(pwd):/src" composelint/compose-lint

In CI, there's a GitHub Action:

    # .github/workflows/compose-lint.yml
    name: compose-lint
    on: pull_request
    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: tmatens/compose-lint@v0.7.0
            with:
              pattern: "**/*compose*.y*ml"   # docker-compose.yml, compose.yaml, …
              fail-on: high

`fail-on: high` (the default) fails only on HIGH/CRITICAL, so you can adopt it without drowning in the MEDIUM backlog on day one, then tighten later.

There's also a pre-commit hook, JSON and SARIF output (SARIF feeds GitHub Code Scanning), and `compose-lint --explain CL-0007` to print any rule's rationale and fix.

For what it's worth on a tool you'd wire into CI: every rule cites OWASP, CIS, or Docker docs, the image is distroless and nonroot, and releases ship SLSA provenance and Sigstore attestations. Details are in the repo.

The full report has every table, the complete methodology, the per-rule breakdowns, and steps to reproduce it: State of Docker Compose Security

If you maintain a popular Compose example, I'd genuinely love a PR or an issue. Hardening the examples people copy is the highest-leverage fix there is.
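If you're curious what the hardening-triple checks and the shape error amount to in code, here is a minimal Python sketch. It is not compose-lint's implementation: the function name `missing_hardening`, the labels it returns, and the set of `no-new-privileges` spellings it accepts are all assumptions made for illustration. It works on an already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load`), so the sketch itself needs only the standard library.

```python
from typing import Any

# Spellings of the flag treated as "set" here; an assumption of this sketch.
NO_NEW_PRIVS = {
    "no-new-privileges",
    "no-new-privileges:true",
    "no-new-privileges=true",
}


def as_strings(value: Any) -> list[str]:
    """Return list entries as strings, or an empty list for anything else."""
    return [str(item) for item in value] if isinstance(value, list) else []


def missing_hardening(compose: dict[str, Any]) -> dict[str, list[str]]:
    """Map each service name to the hardening controls it leaves unset."""
    services = compose.get("services")
    if not isinstance(services, dict):
        return {}
    report: dict[str, list[str]] = {}
    for name, service in services.items():
        if not isinstance(service, dict):
            # Shape error, e.g. `nginx: nginx:1.27` instead of `image: ...`.
            report[name] = ["service is not a mapping"]
            continue
        missing = []
        if service.get("read_only") is not True:
            missing.append("read_only: true")
        caps = {cap.upper() for cap in as_strings(service.get("cap_drop"))}
        if "ALL" not in caps:
            missing.append("cap_drop: [ALL]")
        opts = {
            opt.replace(" ", "").lower()
            for opt in as_strings(service.get("security_opt"))
        }
        if not opts & NO_NEW_PRIVS:
            missing.append("no-new-privileges")
        if missing:
            report[name] = missing
    return report


# Hypothetical parsed Compose file: one hardened service, one bare service,
# and one service written as a plain string.
example = {
    "services": {
        "web": {
            "image": "nginx:1.27",
            "read_only": True,
            "cap_drop": ["ALL"],
            "security_opt": ["no-new-privileges:true"],
        },
        "db": {"image": "postgres:16"},
        "nginx": "nginx:1.27",
    }
}

for service_name, gaps in missing_hardening(example).items():
    print(f"{service_name}: {', '.join(gaps)}")
```

The hardened `web` service produces no entry, `db` is reported for all three controls, and the string-valued `nginx` entry is reported as a shape error rather than silently skipped. A real linter also has to deal with things this sketch ignores, such as `extends`, profiles, and variable interpolation.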