The Pre-Commit Hook That Catches API Keys Before They Hit Git

This article explains the security risks of accidentally committing API keys and secrets to Git repositories, noting that over 10 million secrets were detected in public commits in 2023. It provides a technical solution using a pre-commit hook script that scans staged files for known secret patterns (like AWS keys and Stripe keys) and blocks the commit if any are found. The hook reads only staged content to avoid false negatives and includes a suppression mechanism for legitimate high-entropy strings that trigger false positives.

The problem: secrets in git are forever

You know the drill. A developer hardcodes a Stripe secret key to test a webhook handler locally. They commit. They push. Maybe they catch it themselves, run git rm, and commit again. Problem solved, right?

Wrong. The key is still in your git history. Anyone who clones the repo can run git log -p and find it. Bots scrape GitHub for exactly this pattern. GitGuardian reported over 10 million secrets detected in public commits in 2023 alone, and the number keeps climbing.

Scrubbing secrets from git history means rewriting it with git filter-branch or BFG Repo-Cleaner, force-pushing to every remote, and hoping nobody already pulled the old history. If the key reached a public repo for even a few minutes, you need to rotate it. For AWS, that means updating every service, Lambda function, and CI pipeline that uses it. For Stripe, that means regenerating keys and redeploying payment infrastructure.

The real cost is not the cleanup. It is the blast radius. A leaked AWS key can rack up tens of thousands of dollars in compute charges before you notice. A leaked Stripe secret key gives an attacker full API access to your Stripe account, including customer and payment records. Prevention is not optional.

The fix: a POSIX pre-commit hook

A git pre-commit hook runs automatically before every commit. If it exits with a non-zero status, the commit is blocked. The strategy: scan every staged file for patterns that look like secrets, and refuse to commit if anything matches.

Here is the skeleton. It goes in .git/hooks/pre-commit. You can also symlink it from a checked-in scripts/ directory so every developer on the team gets it.
```sh
#!/bin/sh
# .git/hooks/pre-commit
# Pre-commit hook: block secrets from reaching git history
set -e

# Split the file list on newlines only, so paths containing spaces survive
IFS='
'

# Added, copied, modified, or renamed files in the index
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACMR)

if [ -z "$STAGED_FILES" ]; then
  exit 0
fi

# check_patterns and filter_suppressed (defined below) must appear
# above this loop in the final script
FOUND=0
for file in $STAGED_FILES; do
  # Skip binary files: git reports them as "-<TAB>-" in --numstat output
  if git diff --cached --numstat -- "$file" | grep -q '^-'; then
    continue
  fi

  # Get only the staged content (not the working tree)
  CONTENT=$(git show ":$file" 2>/dev/null) || continue

  # Check for known secret patterns. printf '%s\n' passes the content
  # through verbatim; echo may interpret backslashes in some shells.
  if printf '%s\n' "$CONTENT" | check_patterns "$file"; then
    FOUND=1
  fi
done

if [ "$FOUND" -eq 1 ]; then
  echo "COMMIT BLOCKED: potential secrets detected."
  echo "Add a pii-ok comment to suppress false positives."
  exit 1
fi
```

Key detail: we use git show ":$file" to read the staged content, not the working tree. This prevents false negatives where a developer stages a file with a secret, then removes it from the working copy but does not re-stage.

Pattern matching: what to look for

The core of the hook is a set of regular expressions that match known secret formats. These are not hypothetical patterns. Each one follows the prefix and length of a real-world key format.

```sh
# Pattern definitions. Reads file content on stdin and returns
# 0 (true) if any unsuppressed secret pattern matched, 1 otherwise.
check_patterns() {
  file="$1"
  matched=0
  # Buffer stdin once: each grep below needs its own copy of the content
  content=$(cat)

  # AWS Access Key ID
  if printf '%s\n' "$content" | grep -nE 'AKIA[0-9A-Z]{16}' | filter_suppressed; then
    echo "  [AWS] $file: AWS Access Key ID"
    matched=1
  fi

  # Stripe secret key
  if printf '%s\n' "$content" | grep -nE 'sk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
    echo "  [STRIPE] $file: Stripe secret key"
    matched=1
  fi

  # Stripe restricted key
  if printf '%s\n' "$content" | grep -nE 'rk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
    echo "  [STRIPE] $file: Stripe restricted key"
    matched=1
  fi

  # GitHub personal access token
  if printf '%s\n' "$content" | grep -nE 'ghp_[0-9a-zA-Z]{36}' | filter_suppressed; then
    echo "  [GITHUB] $file: GitHub PAT"
    matched=1
  fi

  # Generic high-entropy strings (API keys, tokens)
  if printf '%s\n' "$content" | grep -nE "['\"][0-9a-zA-Z]{32,}['\"]" | filter_suppressed; then
    echo "  [ENTROPY] $file: high-entropy string (>=32 chars)"
    matched=1
  fi

  # Exit status 0 means "secrets found", so the caller's if-branch fires
  [ "$matched" -eq 1 ]
}
```

The high-entropy check at the end is the catch-all.
Any quoted string of 32 or more alphanumeric characters is flagged. Despite the name, the check looks only at length and character set; it does not compute entropy. It catches tokens, API keys, and secrets that do not match a known vendor pattern. It will also flag some legitimate values, such as hex-encoded hashes and UUIDs written without hyphens. That is where the suppression pragma comes in.

The pii-ok pragma: handling false positives

Every secret scanner produces false positives: a SHA-256 hash in a test fixture, a base64-encoded public key, a long CSS class name generated by a build tool. If there is no escape hatch, developers will disable the hook entirely, which defeats the purpose.

The solution is a suppression comment: pii-ok. If a line contains this marker, the scanner skips it.

```sh
# Suppression filter: succeeds (exit 0) only if at least one
# matched line does NOT carry the pii-ok marker
filter_suppressed() {
  # Remove lines containing the suppression marker, then test
  # whether anything is left
  grep -v "pii-ok" | grep -c . >/dev/null 2>&1
}
```

In practice it looks like this:

```js
// Example usage in code

// This SHA-256 is a test fixture, not a secret
const EXPECTED_HASH = 'a1b2c3d4e5f6...'; // pii-ok

// This WILL be caught (no pragma)
const STRIPE_KEY = 'sk_live_abc123...';
```

The rule is simple: if you know a value is not a secret, add pii-ok on the same line. If you are not sure, leave it off and let the hook flag it. The inconvenience of a false positive is nothing compared to the cost of a leaked key.

Going further: .htaccess and env files

The pattern-matching approach extends to other dangerous file types. .htaccess files with SetEnv directives often contain database passwords. .env files are secrets by definition. Your hook should flag both.

```sh
# Additional checks: these go inside the for-loop, after CONTENT
# is read and before check_patterns runs

# Block .env files entirely
if echo "$file" | grep -qE '\.env$'; then
  echo "  [ENV] $file: .env files must be .gitignored"
  FOUND=1
  continue
fi

# Flag SetEnv with real values in .htaccess
# (POSIX character classes instead of the non-portable \s and \S)
if echo "$file" | grep -qE '\.htaccess$'; then
  if printf '%s\n' "$CONTENT" | grep -nE 'SetEnv[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+' | filter_suppressed; then
    echo "  [HTACCESS] $file: SetEnv with real values"
    FOUND=1
  fi
fi
```

The convention: commit .env.example with <REPLACE_ME> placeholders.
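A hook can also enforce the placeholder half of that convention. The sketch below is a hypothetical helper, not part of the hook above. The name check_env_example is an assumption, and so is its rule: every KEY=value line must have an empty value or exactly <REPLACE_ME>. It reads the file's content on stdin, the same way the hook pipes staged content into check_patterns. It fails if any assignment carries a real-looking value.

```sh
# Hypothetical helper (name and rule are assumptions): enforce
# placeholder-only values in .env.example. Reads content on stdin;
# returns 0 if every KEY=value line has an empty value or the literal
# <REPLACE_ME> placeholder, 1 otherwise.
check_env_example() {
  # Drop comment and blank lines, then drop compliant assignments.
  # Whatever survives is a line carrying a real-looking value.
  bad=$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' \
    | grep -vE '^[A-Za-z_][A-Za-z0-9_]*=(<REPLACE_ME>)?[[:space:]]*$' || true)

  if [ -n "$bad" ]; then
    echo "  [ENV] .env.example contains non-placeholder values:"
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Possible use inside the hook loop, where $file and $CONTENT are set:
# if [ "$(basename "$file")" = ".env.example" ]; then
#   printf '%s\n' "$CONTENT" | check_env_example || FOUND=1
# fi
```

The rule is deliberately strict. Quoted placeholders (KEY="<REPLACE_ME>") and export prefixes are reported too, so loosen the second regex if your .env.example uses them.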
The real .env stays in .gitignore. The same goes for .htaccess files that contain credentials: commit a sanitized version and keep the real one out of version control.

The AI layer: catching what regex cannot

Pattern matching catches known secret formats. But what about a database connection string with an embedded password? Or a hardcoded JWT that does not match any vendor prefix? Or code that is technically functional but has a SQL injection vulnerability?

This is where an LLM-powered code review gate comes in. The idea: after the regex-based pre-commit hook passes, run a second pass that sends the diff to an LLM and asks it to identify security concerns. The model can flag issues that no regex can express: SQL injection, logic errors that expose data, hardcoded credentials in unusual formats, and more.

The review gate reads your staged diff, sends it to an LLM with a security-focused system prompt, and blocks the commit if the model identifies high-severity issues. It complements the fast, deterministic regex hook with the contextual understanding of a language model.

Making it a team standard

Git does not version the .git/hooks/ directory, so a hook that lives only there never reaches anyone who clones the repo. To make it a team-wide standard:

- Check the hook into the repo under scripts/pre-commit with the executable bit set (chmod +x). Git ignores hooks that are not executable.
- Add a setup script that symlinks it: ln -sf ../../scripts/pre-commit .git/hooks/pre-commit (the link target is resolved relative to .git/hooks/)
- Document the pii-ok pragma so developers know how to suppress false positives without disabling the hook
- Run the same patterns in CI as a backup, because developers can skip hooks with git commit --no-verify

The hook must be fast. If it takes more than a second or two, developers will bypass it. Pure POSIX shell with grep typically keeps it under 200ms, because the cost scales with the number of staged files rather than the size of the repository. The AI review gate is optional. If latency is a concern, configure it to run only on push or in CI.
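The team-standard list suggests running the same patterns in CI, so that a commit made with --no-verify still gets caught. One option is to scan the lines a branch adds rather than individual files. This is a hypothetical sketch. The function name scan_added_lines, the combined regex, and the base branch origin/main in the usage line are all assumptions. The function reads a unified diff on stdin and keeps only added lines. It then applies the hook's patterns and the pii-ok rule.

```sh
# Hypothetical CI backstop: scan lines a branch adds for the same secret
# patterns as the pre-commit hook. Reads a unified diff on stdin; returns
# 1 (and prints the offending lines) if anything unsuppressed matches.
scan_added_lines() {
  # Keep added lines, drop "+++ b/path" file headers, strip the leading "+"
  added=$(grep '^[+]' | grep -v '^[+][+][+] ' | sed 's/^[+]//')

  # The hook's patterns (AWS, Stripe sk/rk, GitHub PAT, generic quoted
  # 32+ alphanumeric string) combined into one extended regex
  patterns="AKIA[0-9A-Z]{16}|[sr]k_(live|test)_[0-9a-zA-Z]{24,}|ghp_[0-9a-zA-Z]{36}|['\"][0-9a-zA-Z]{32,}['\"]"

  # Lines carrying the pii-ok marker are suppressed, as in the hook
  hits=$(printf '%s\n' "$added" | grep -E "$patterns" | grep -v 'pii-ok' || true)

  if [ -n "$hits" ]; then
    echo "Potential secrets in added lines:"
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}

# Example CI step (the base branch name is an assumption):
# git diff origin/main...HEAD | scan_added_lines
```

With three dots, git diff compares HEAD against the merge base with origin/main, so only the branch's own changes are scanned. Removed lines are ignored, which means deleting a leaked key does not fail the build. The key still has to be rotated.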
What to do next

If you do nothing else today, add the basic pattern-matching hook from this post to your repositories. It takes five minutes, it costs nothing, and it can save you from a costly key rotation.

For a production-ready implementation with broader pattern coverage, the AI review gate, team templates, and a full test suite, Claude Code Kit (https://aiclarityau.gumroad.com/l/claude-code-kit) has it all packaged up and documented. It is $29, it is a one-time purchase, and every line of source code is included. No binaries, no obfuscation, no vendor lock-in.

Your git history should contain your work, not your secrets.