Don't fork the code — fork the design: introducing DeepFork

A developer introduced DeepFork, a five-phase Claude Code skill that reverse-engineers open-source repositories to extract design intent rather than code. The tool uses an AST-based knowledge graph to surface core abstractions, opinionated choices, and invariants, enabling clean-room rebuilds. DeepFork was initially built to understand Andrej Karpathy's micrograd autograd engine.

When I want to learn from a library I admire, my instinct is to read the code. But reading code and understanding a design are two different problems. Most of the time I end up in a loop: I can trace what the code does, but I cannot figure out why it was shaped that way, or confidently rebuild it from first principles without dragging along the original author's choices. That is what DeepFork is for.

DeepFork (https://github.com/GerardoRdz96/deepfork) is a five-phase Claude Code skill that reverse-engineers any open-source repo into two things:

The slogan is: Don't fork the code, fork the design.

The usual way to learn from a good library is to clone it and poke around. The problem is that "reading the code" is a passive act: you follow the execution path and end up understanding the implementation, not the design. Worse, you might unconsciously reproduce the original author's choices in your own project because you spent two hours reading their version.

DeepFork makes the design explicit before you write a line. You come out with a document that describes the core abstractions, the opinionated choices, and the invariants, not a copy of the code.

1. License gate
The first thing DeepFork does is check whether the repo's license permits clean-room analysis and rebuild. This is not a legal opinion, but it surfaces the obvious blockers (e.g. strong copyleft + commercial use) before you invest time. If the license is unclear, it stops and tells you why.

2. Graphify comprehension
DeepFork uses graphify (https://github.com/safishamsi/graphify, 65.9k★) to generate an AST-based knowledge graph of the repo. You get the structural map (modules, dependencies, call graph) before reading a single file. This is what makes the interrogation phase tractable for large repos.

3. Interrogation
A guided conversation with the graph: what are the core abstractions? What would break if you removed each one? Where did the author make opinionated choices vs. follow conventions?
The interrogation phase is the one I am least confident about: it works well for Python/JS but needs more testing on other ecosystems.

4. Behavioral blueprint with user deltas
This captures the design patterns (not the code) and surfaces where your own version might diverge based on your requirements. The output is a structured document: core data model, component contracts, non-obvious invariants, and a list of "if you do X differently, here is what changes."

5. Clean-room rebuild
A sequenced implementation plan: data model → core primitives → behavior layer → test surface → integration points. The plan references the blueprint doc, not the original code.

I built the first version of DeepFork to understand micrograd (https://github.com/karpathy/micrograd), Andrej Karpathy's 100-line autograd engine. Most people read micrograd for the aha moment. DeepFork turns that moment into an artifact.

The graphify pass produces the module graph in seconds. The interrogation phase surfaces the two core design insights: the Value node as the unit of computation, and why backpropagation is just reverse topological sort applied to a simple recursive structure. The blueprint makes those explicit and separable from the implementation. Full worked example is in the README.

Clone the skill into your Claude Code setup:

    npx skills add GerardoRdz96/deepfork

Optional but recommended, the free AST graph engine (no billing). Note the double y in the package name:

    uv tool install graphifyy

Then in your Claude Code session:

    /deepfork <path-to-repo-you-want-to-understand>

Requires an active Claude Code session (Sonnet or Opus, no extra API setup).

It is a learning tool, not a production workaround. The clean-room output is a starting point for your own implementation, not a drop-in replacement. The license gate exists because clean-room analysis does not make GPL obligations disappear; read the disclaimer in the README before using it on commercial projects.

This is early.
The core loop works (license gate, graphify comprehension, interrogation, blueprint, rebuild plan), but the interrogation prompts are hand-tuned for Python/JS and I know they miss things in larger, more opinionated codebases. I am releasing v0.1 now because I want feedback specifically on that phase: what questions produce the most insight? What does the current prompt miss for the repos you care about? Open an issue, open a discussion, or just reply here.

If this is useful to you, a ★ on GitHub helps others find it.

Built with graphify + Claude Code. Penguin Alley OSS, MIT license.
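
As a rough illustration of the two micrograd insights mentioned above (a Value node as the unit of computation, and backpropagation as a reverse topological sort), here is a minimal sketch. It is written from the description in this post only; it is not micrograd's source and not DeepFork output. The attribute names `_parents` and `_backward` are assumptions chosen for the example.

```python
# Illustrative sketch: a scalar node that records how it was computed,
# plus backprop as a walk over the graph in reverse topological order.
# Names (_parents, _backward) are hypothetical, not taken from any library.

class Value:
    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = tuple(parents)   # nodes this value was computed from
        self._backward = lambda: None    # local chain-rule step, set by each op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological order: every node appears after all of its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse order: each node's gradient is complete before it is
        # pushed down to its parents.
        for node in reversed(order):
            node._backward()


a = Value(2.0)
b = Value(-3.0)
c = a * b + a          # c = -4.0, dc/da = b + 1 = -2.0, dc/db = a = 2.0
c.backward()
print(c.data, a.grad, b.grad)  # prints: -4.0 -2.0 2.0
```

Note that `a` is used twice in the expression, which is why the local steps accumulate with `+=` rather than assign: the reverse topological order guarantees both contributions arrive before `a` is visited.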
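
To give a feel for what an AST-based structural map of a repo looks like (phase 2 above), the following sketch builds a tiny module dependency graph with Python's standard `ast` module. This is not graphify's API or output format, which this post does not describe; the function names and the three-module toy repo are hypothetical and exist only for the example.

```python
# Illustrative sketch: derive a module-level import graph from source text
# using only the standard library. Not graphify; names here are hypothetical.
import ast


def imported_modules(source: str) -> set:
    """Return the top-level module names imported by a piece of source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped
            found.add(node.module.split(".")[0])
    return found


def build_import_graph(sources: dict) -> dict:
    """Map each module to the sorted list of in-repo modules it imports."""
    graph = {}
    for name, source in sources.items():
        graph[name] = sorted(
            mod for mod in imported_modules(source)
            if mod in sources and mod != name   # keep only in-repo edges
        )
    return graph


# Hypothetical toy repo, held in memory so the example runs anywhere.
toy_repo = {
    "engine": "import math\n",
    "nn": "import random\nfrom engine import Value\n",
    "train": "from nn import MLP\nimport engine\n",
}

graph = build_import_graph(toy_repo)
print(graph)  # prints: {'engine': [], 'nn': ['engine'], 'train': ['engine', 'nn']}
```

Even this small map answers the kind of question the interrogation phase asks: `engine` has no in-repo dependencies and everything else depends on it, so it is the likely home of the core abstraction.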