{"slug": "codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode", "title": "CodeDNA: AI Codebase Archaeologist Built with Gemma 4 Thinking Mode", "summary": "CodeDNA is an AI tool that uses Google's Gemma 4 model in \"Thinking Mode\" to analyze a codebase's git commit history. It reconstructs the narrative of a project's evolution, identifying key events like bug storms, architectural pivots, and refactors, and outputs a verifiable archaeological report with a health score and milestone timeline. The tool processes up to 400 commits, using a map-reduce design to separate causal reasoning from structured data output for higher insight quality.", "body_md": "You inherited this codebase 6 months ago. You can feel something went wrong around 2021. Bug reports spiked. Velocity dropped. The original authors left. The commit history has 3,000 entries — and every answer is in there.\n\nNobody has time to read 3,000 commits.\n\nCodeDNA does.\n\n## What I Built\n\n**CodeDNA** is an AI Codebase Archaeologist. You paste your `git log`, and Gemma 4 — using Thinking Mode — reconstructs the story of your codebase: bug storms, architectural pivots, refactor eras, feature bursts, and an overall health score with a transparent breakdown.\n\nThe output is 100% verifiable. You can check every milestone against your actual commit history. No hallucinated CVEs, no unverifiable financial claims — just pattern-extracted facts from structured text you already own.\n\n## **GitHub:**\n# CodeDNA — AI Codebase Archaeologist\n\nFeed Gemma 4 your git history. Discover exactly when — and why — your codebase evolved.\n\nEvery codebase has a turning point. The moment before is clean commits and clear intent.\nThe moment after is hotfixes, reverts, and growing entropy. **CodeDNA finds it.**\n\n## What It Does\n\n- **Maps your codebase history with Gemma 4** — up to 400 commits, preprocessed and compressed for maximum analytical signal. 
The preprocessor extracts monthly commit histograms and per-file change frequency before sending to the model, so insights are grounded in observable data.\n\n- **Returns a structured archaeological report** — health score with transparent breakdown, milestone timeline (bug storms, refactors, pivots, feature bursts), and key metrics. Every claim cites a specific commit hash, date, or metadata value.\n\n- **Streams Gemma 4's live reasoning** — watch the Thinking Mode trace in real-time as the model identifies causal patterns across years of history. Verifiable: the…\n\n## The Problem It Solves\n\nYou inherit a codebase. Something went wrong around late 2021 — you can feel it. Bug reports spiked, velocity dropped, the original authors left. The commit history has everything, but nobody has time to read 3,000 commits manually.\n\nTraditional tools give you graphs of commit frequency. 
That tells you *how much* happened, not *what* happened or *why* one period was chaotic and another stable.\n\nCodeDNA uses Gemma 4's Thinking Mode to reason across your entire commit history and surface the narrative that was always there.\n\n## Live Demo\n\n*The live demo in action: CodeDNA processing the React repository’s architectural transition history.*\n\n## Core Features\n\n| Feature | Description |\n|---|---|\n| Animated timeline | Color-coded milestones — red = bug storm, yellow = refactor, green = pivot, blue = feature burst |\n| Health score + breakdown | 0–100 score with transparent factor table (not a black-box number) |\n| Live Thinking Mode stream | Watch Gemma 4 reason step-by-step as it analyzes your history |\n| Smart preprocessing | Caps at 180 commits, extracts monthly histograms and file hotspots before inference |\n| Multi-provider fallback | Google AI Studio (26B → 31B) → OpenRouter (gemma-2-27b-it → gemma-3-12b-it → more) |\n| Analysis caching | Same git log = instant results on repeat runs |\n| Markdown export | Download a complete archaeological report |\n| Messy commit handling | Detects vague history and gives honest, low-confidence analysis instead of hallucinating |\n\n### Screenshots\n\n*The timeline builds milestone by milestone. Red = bug storm, yellow = refactor, green = pivot.*\n\n*Health Score is never a black-box number. Every factor cites commit evidence.*\n\n*The reasoning panel shows Gemma 4's step-by-step analysis as it happens. 
This is Thinking Mode — not post-hoc summarization.*\n\n## Architecture\n\n```\ngit log --stat (your paste or .txt upload)\n        ↓\npreprocessor.py\n  → parse commits, build monthly histogram, extract file hotspots\n  → metadata header injected: MONTHLY_COUNTS, TOP_CHANGED_FILES, BUG_FIX_RATIO\n        ↓\nStep 1: Reasoning Stream (REASONING_SYSTEM_PROMPT)\n  → Gemma 4 Thinking Mode streams clean markdown report\n  → Visible live in right panel\n        ↓\nStep 2: JSON Structuring (JSON_SYSTEM_PROMPT)\n  → Separate Gemma call converts reasoning → typed AnalysisResult JSON\n  → Pydantic v2 validates schema\n        ↓\nReact UI\n  → Health Score ring + breakdown table (center, always visible)\n  → Animated vertical timeline (left)\n  → Live reasoning stream (right)\n  → Markdown export\n```\n\n**Map-reduce design:** By splitting reasoning (Step 1) from JSON structuring (Step 2), Thinking Mode output stays clean prose rather than being polluted by schema-enforcement constraints. Insight quality is significantly higher.\n\n**Stack:**\n\n- Backend: FastAPI + httpx async + SSE streaming\n- Frontend: React 18 + Vite + Tailwind CSS\n- LLM: Gemma 4 via Google AI Studio (primary) + OpenRouter (fallback)\n- State: In-memory + disk cache (no database)\n\n## Why Gemma 4 — Not \"Just Any LLM\"\n\nThis is the most important section for me to get right.\n\n**1. Thinking Mode for causal chain reasoning — not summarization**\n\nStandard completion models count keywords. Gemma 4's Thinking Mode traces *why* patterns emerged. When it sees 14 \"fix\" commits targeting `ReactFiberHooks.js` in a 3-week window after a large API change, it connects them causally — it doesn't just report a spike.\n\nThe live reasoning stream in the UI makes this directly observable. Judges (and users) can watch Gemma's chain-of-thought in real time. This is the intentional use criterion — not decorative AI, but AI whose reasoning process is the deliverable.\n\n**2. 
128K context — the archaeology window**\n\n180 commits × ~200 tokens each = ~36K tokens of compressed history in one request. No chunking, no context loss, no multi-call stitching. Gemma 4 holds the full narrative arc in one reasoning window, which is the only way to detect multi-month causal patterns (e.g., a March 2019 API change causing a June 2019 bug cluster).\n\n**3. Structured output drives the UI deterministically**\n\nThe JSON schema is strict (Pydantic v2 validated). If Gemma returns valid JSON, the timeline renders. If not, the error is surfaced honestly. No post-processing guesswork.\n\n**4. Privacy-first by design**\n\nGit history contains proprietary code, unreleased feature names, security patches, and competitive intelligence in commit messages. CodeDNA passes everything under your own API key. Zero data retention. This is not a UX choice — it's the only architecture engineering teams will actually trust with real repositories.\n\n## Demo: React Hooks Era (2018–2019)\n\nI ran CodeDNA on React's public git history during the Hooks transition — one of the most architecturally significant periods in any major open-source project.\n\n**What Gemma 4 found:**\n\n- **2018-07:** Feature burst — Scheduler time-slicing and Fiber pool infrastructure added (5 commits, `Scheduler.js` dominant)\n- **2018-09–10:** Pivot — `React.lazy`, `Suspense`, and `createContext v2` introduced across 6 commits\n- **2019-01–02:** Stability → Bug storm — 4 rapid fixes for `useRef` and `useEffect` infinite loops following the 16.8.0 release\n- **2019-05:** Feature burst — `useTransition`, `useDeferredValue`, `unstable_createRoot` (5 commits, `ReactFiberHooks.js` dominant)\n\n**Health score: 58/100** — justified by 21% bug-fix ratio, two high-severity bug storms in 2019-01 and 2019-02, partially offset by clear feature burst eras and high commit message quality (83% of commits have descriptive messages ≥8 words).\n\n## Quick Start\n\n```\n# Clone\ngit 
clone https://github.com/acchasujal/codeDNA.git\ncd codeDNA\n\n# Backend\ncd backend\npip install -r requirements.txt\ncp .env.example .env\n# Add your Google AI Studio key as GEMINI_API_KEY\nuvicorn main:app --reload\n\n# Frontend (new terminal)\ncd ../frontend\nnpm install\nnpm run dev\n# Opens http://localhost:5173\n```\n\n**Get your git log:**\n\n```\n# Any repo you have locally:\ngit log --stat | head -3000 > my_history.txt\n# Upload the .txt file or paste directly\n\n# React demo (what the screenshots use):\ngit clone https://github.com/facebook/react\ncd react\ngit log --stat --after=\"2018-09-01\" --before=\"2019-06-01\" | head -3000 > react_hooks.txt\n```\n\n**.env.example:**\n\n```\nGEMINI_API_KEY=your_google_ai_studio_key_here\nGEMMA_MODEL=models/gemma-4-26b-a4b-it\nMAX_COMMITS=180\nOPENROUTER_API_KEY=optional_for_fallback\n```\n\n## Technical Highlights\n\n**Multi-provider fallback chain** — At startup, CodeDNA queries the OpenRouter API to dynamically discover available Gemma models and builds a priority chain. Google AI Studio is primary; OpenRouter provides up to 9 additional Gemma models as fallback. The chain is logged at startup so you always know what's running.\n\n**Preprocessor intelligence** — Before any model call, the preprocessor extracts a `MONTHLY_COMMIT_COUNTS` histogram and `TOP_CHANGED_FILES` list from the raw git log. This ground-truth metadata is injected directly into the prompt, so Gemma cites real numbers (\"commit count tripled to 47 in March 2019\") rather than inferring from prose.\n\n**Anti-fluff enforcement** — The system prompt contains an explicit `FORBIDDEN_PHRASES` list (`\"technical debt\"`, `\"the team\"`, `\"seems like\"`, `\"likely indicates\"`, and 12 others). 
Every insight must cite a specific commit hash, date, file name, or count — or say \"insufficient evidence.\"\n\n**Honest confidence** — Every milestone includes a `confidence` field (`high | medium | low`) with a justification sentence. Low-quality commit histories get a `QUALITY_WARNING` header and produce conservative, clearly-labeled micro-analyses rather than dramatic fabrications.\n\n## The Reasoning System Prompt\n\nThe full prompt that drives Step 1 (the reasoning stream):\n\n```\nYou are CodeDNA, a concise git-history analyst.\nProduce a clean public report, not private reasoning.\n\nRules:\n- Output markdown prose only. No JSON. No code fences.\n- No meta-commentary, self-correction, planning notes, or internal monologue.\n- Never write \"wait\", \"I used\", \"the prompt says\", or any phrase from this\n  forbidden list: technical debt, the team, engineers, developers, working hard,\n  prioritized, decided to, management, business logic, seems like, appears to,\n  it looks like, likely indicates, possibly, perhaps, might have.\n- Use only observable evidence from the metadata header and commit log.\n- Cite commit hashes, dates/months, file names, commit counts, and ratios\n  whenever making a claim.\n- If evidence is thin, say \"insufficient evidence\" and name the missing signal.\n  Do not invent intent, people, architecture, risk, or causality.\n- Keep every sentence useful. Avoid repetition.\n\nFormat exactly:\n## Overview\nTwo to three factual sentences covering commit count, date range,\nmost changed files or file types, and BUG_FIX_RATIO.\n\n## Milestones\nFour to eight bullets when evidence allows. 
Each bullet:\n- **YYYY-MM** - type - concise evidence sentence with commit hash(es),\n  changed file(s), and count(s).\n  Allowed types: bug_storm, refactor, pivot, feature_burst, stability.\n\n## Health Signals\nThree bullets: one positive signal, one negative signal, one confidence note.\nEach bullet must cite evidence.\n\n## Churn Summary\nOne concise sentence naming the peak period and the files or commits behind it.\n```\n\n## The Hardest Problem: Making Gemma Say Something Real\n\nThe biggest technical challenge wasn't the UI, the SSE streaming, or the fallback chain. It was getting Gemma 4 to produce *specific, verifiable* insights instead of confident-sounding nonsense.\n\nHere's what the first version produced on a repo with commits like `\"fix navbar bug\"`, `\"update readme\"`, `\"refactor utils\"`:\n\n\"This period reflects a time of organizational growth and technical maturity. The team worked hard to address accumulated complexity while balancing feature delivery with stability concerns.\"\n\nThat output is useless. It contains zero commit references, zero file names, zero numbers. A junior consultant could have written it without looking at the code. A judge would mark it dead on arrival.\n\n**Three iterations to fix it.**\n\n**Iteration 1 — Forbidden phrases list.**\n\nAdded an explicit blocklist to the system prompt:\n\n```\nFORBIDDEN PHRASES — never use these:\n\"technical debt\", \"the team\", \"engineers\", \"developers\",\n\"working hard\", \"prioritized\", \"decided to\", \"management\",\n\"seems like\", \"appears to\", \"it looks like\", \"likely indicates\",\n\"possibly\", \"perhaps\", \"might have\"\n```\n\nThe output became less flowery but still vague: *\"There were many fixes in early 2019.\"* How many? Which files? 
Which period exactly?\n\n**Iteration 2 — Mandatory evidence citation.**\n\nAdded to the prompt: *\"Every milestone description must cite at least one commit hash, date/month, file name, count, or ratio. If you cannot cite evidence, write 'insufficient evidence' and stop.\"*\n\nBetter, but Gemma was still counting commits itself — and sometimes miscounting.\n\n**Iteration 3 — Pre-computed metadata injection (the breakthrough).**\n\nInstead of asking Gemma to figure out what happened, I tell it what happened and ask it to *interpret* it.\n\nThe preprocessor now builds a metadata header before any model call:\n\n```\n# META: 180tot|180ana|Q:HIGH|Fx:21%|Vg:0%\n# DATES: 2019-06-20..2018-07-02\n# MONTHS: 2018-09:3,2018-10:3,2019-01:4,2019-02:2,2019-05:5,2019-06:2\n# HOTSPOTS: ReactFiberHooks.js:8,Scheduler.js:5,package.json:4\n```\n\nNow instead of asking *\"were there a lot of fixes in early 2019?\"*, I'm asking *\"given that commits spiked to 5 in 2019-05 and ReactFiberHooks.js was modified 8 times — what does that pattern indicate?\"*\n\nThe model's job shifted from counting to interpreting. The output became:\n\n\"2019-01 through 2019-02 saw 6 commits (bf32345, ca53456, cb54567, cc55678, cd56789, ce57890) concentrated in ReactFiberHooks.js and ReactFiberBeginWork.js. ca53456 fixed an incorrect useRef identity across re-renders; cb54567 resolved an infinite useEffect loop triggered by object dependency comparison. The 16.8.0 release on 2019-02-06 (cd56789) was followed two days later by ce57890 — a hooks state regression fix, indicating at least one edge case reached production.\"\n\nEvery claim is checkable. Every hash is real. That's the difference.\n\n**The map-reduce split was the second breakthrough.**\n\nAsking Gemma 4 to simultaneously produce flowing Thinking Mode prose *and* valid JSON produces neither well. 
I split it:\n\n- **Step 1 (stream):** REASONING_SYSTEM_PROMPT — output clean markdown only, no JSON, no schema constraints\n- **Step 2 (analyze):** JSON_SYSTEM_PROMPT — read the reasoning trace, output strict AnalysisResult JSON\n\nThe reasoning panel now shows actual analytical prose. The timeline data is reliably structured. Both improved dramatically when separated.\n\n## Limitations (Honest)\n\n- Works best with 100–200 commits. Very large histories (1000+) need more aggressive preprocessing.\n- Commit message quality determines insight quality. A repo full of `\"fix\"`, `\"wip\"`, `\"update\"` commits will produce low-confidence analysis (CodeDNA tells you this clearly rather than inventing drama).\n- The reasoning stream uses the primary model; fallback models handle JSON structuring. If all Google models are slow, the stream may be empty — but the timeline will still render from the fallback result.\n- Currently runs locally only. Cloud deployment would require careful handling of API key security.\n\n## What's Next\n\n- Actual GitHub API integration (analyze any public repo by URL, no manual log export)\n- Branch comparison (main vs. feature branch health)\n- Team velocity metrics (authors per period, bus factor analysis)\n- CI/CD integration — run CodeDNA as a PR check to flag risky commit patterns\n\n*Built solo in 4 days for the Google Gemma 4 Challenge. Every commit in this repo is real — you can run CodeDNA on its own history.*", "url": "https://wpnews.pro/news/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode", "canonical_source": "https://dev.to/sujal_gupta_3dc0d9052e350/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode-1ihg", "published_at": "2026-05-22 22:01:12+00:00", "updated_at": "2026-05-22 22:32:38.528712+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools"], "entities": ["CodeDNA", "Gemma 4"], "alternates": {"html": "https://wpnews.pro/news/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode", "markdown": "https://wpnews.pro/news/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode.md", "text": "https://wpnews.pro/news/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode.txt", "jsonld": "https://wpnews.pro/news/codedna-ai-codebase-archaeologist-built-with-gemma-4-thinking-mode.jsonld"}}