You fix a bug. You deploy. Three hours later, production is on fire. Here's how I built a tool to stop that.
I was working on a Chinese chess app. Changed is_checkmate()
β added a simple empty-board guard clause. Two lines. Trivial.
Ran my tests. Green. Deployed.
Two hours later: puzzle solving was broken. Level submission returned wrong results. I changed one function. It silently broke two callers I didn't even know existed.
The test coverage report said 87%. It didn't matter. Coverage tells you which lines ran β not which callers your change silently corrupted.
Test Shield β a Claude Code skill with one job:
/test-shield
β
βΌ
Changed: is_checkmate() β chess_utils.py:320
β οΈ Unexpected impact:
β solve_puzzle() β levels.py:687 β wouldn't have caught this
β submit_level_solution() β levels.py:516 β or this
β Generate 6 regression tests
β 4/6 pass, 2 need review
β Safe to merge. Here's the evidence.
Coverage tells you: "Line 42 was executed during tests."
Test Shield tells you: "Line 42 changed. These 3 callers depend on it. They have zero tests. You didn't know about 2 of them."
One is about what ran. The other is about what will break.
I learned this the hard way during building: Python's dynamic nature means some callers are invisible to static analysis. getattr()
, importlib
, decorator injection, monkey-patching β AST tracing can't see through those.
So Test Shield doesn't pretend. Every run ends with a "honesty report":
Tracing method: AST static analysis
Dynamic call risks: 0 detected
Known blind spots: getattr, importlib, decorators
If it can't trace something, it tells you. No false confidence.
git clone https://github.com/dominicharmon-commits/test-shield.git
cd your-python-project
python ../test-shield/scripts/analyze.py .
/test-shield
Zero dependencies. Pure Python stdlib. Python 3.10+.
PRs welcome. MIT licensed.
Built with Claude Code. Tested on a real production codebase. Stars appreciated. β