{"slug": "i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience", "title": "I built an AI résumé tool that refuses to lie about your experience", "summary": "A developer built Citevault, a local-first résumé tailoring tool that verifies every claim against the user's own evidence rather than hallucinating experience. The tool processes each résumé bullet through a pipeline that returns verdicts of SUPPORTS, PARTIAL, UNCLEAR, or CONTRADICTS, flagging unverifiable claims as gaps instead of fabricating them. Running entirely on CPU with no API keys or outbound connections, Citevault achieves a 98.2% first-pass grounding rate using a hybrid RAG system combining dense embeddings, BM25 keyword search, and SQLite FTS5.", "body_md": "Most AI résumé tools have the same flaw: they hallucinate. Ask them to tailor your résumé for a job requiring \"Rust experience\" and they'll happily invent a Rust project you never worked on. It reads great, until the technical interview.\n\nI wanted the opposite. So I built **Citevault**: a local-first résumé tailoring tool where every claim is either grounded in your own evidence, or refused and flagged as a gap.\n\nNo fabrication. No API keys. Runs entirely on your laptop. *(Model weights are pulled from Hugging Face once on first boot; after that, no outbound connections.)*\n\nEvery bullet in your résumé starts as a *claim*. Citevault processes each one through a pipeline that returns one of four verdicts: `SUPPORTS`, `PARTIAL`, `UNCLEAR`, or `CONTRADICTS`.\n\n- `SUPPORTS` → the claim is verified and cited\n- `PARTIAL` → rewritten to match only what the evidence actually says\n- `UNCLEAR` → a rewrite is attempted, and if it still can't be grounded, refused and gap-reported\n- `CONTRADICTS` → refused immediately and gap-reported\n\nThe result is a résumé where every bullet has a `[^sp-...]` footnote traceable back to a specific span in your source material.\n\nToggle \"Compare with naive AI\" before starting a tailoring run. 
Citevault runs its grounded pipeline *and* a second single-pass run: same model, same evidence, same task description, no verification loop. The only difference is the grounded pipeline checks every claim against its source before including it.\n\nThe diff is striking: the naive output includes `[Candidate Name]` and invented achievements that never appeared in the evidence.\n\n| Component | Role |\n|---|---|\n| Gemma 4 E4B (`gemma4:e4b`) via Ollama | Claim drafting, verification, cover letter composition |\n| BGE-small-en-v1.5 | Dense embeddings for semantic retrieval |\n| BGE cross-encoder | Re-ranking retrieved candidates |\n| BM25 + SQLite FTS5 | Keyword retrieval (hybrid RAG) |\n| sqlite-vec | Vector store; no external database required |\n\nGemma 4 E4B was chosen specifically for this role: it is instruction-tuned well enough to return consistent structured JSON verdicts, small enough to run on CPU without a GPU, and open-weight so no API key or data exposure is involved. The `e4b` tag is the Q4_K_M quantised build, the best size/quality tradeoff for local inference via Ollama.\n\nThe entire stack runs on CPU. Measured on a 4-core/8-thread laptop with 32 GB RAM and no discrete GPU: 3–8 tokens/second generation speed, 20–30 minutes per tailoring run; add another 10–20 minutes if naive comparison is enabled. Slower than a cloud API, but zero cost, zero data exposure, and no dependency on an upstream service staying alive.\n\n**Structured generation is the hard part.** Getting Gemma 4 to consistently return structured JSON verdicts from the verifier took more prompt iteration than anything else. The final verifier prompt is tightly constrained: it gives the model a specific rubric, a strict output format, and a worked example. It still occasionally returns malformed output; those claims are logged and omitted from the output rather than silently passed through.\n\n**Hybrid RAG matters.** Pure dense search misses exact keyword matches. Pure BM25 misses semantic similarity. 
On the five-case golden eval set, the hybrid combination recovered ~15 percentage points in first-pass grounding rate over either retrieval strategy alone, enough to tip borderline claims from UNCLEAR to SUPPORTS.\n\n**Eval-driven development pays off.** I built a golden evaluation set of five synthetic candidates and ran the pipeline against it after every significant change. The final first-pass grounding rate is 98.2%, but more importantly, I caught two regressions that looked fine in manual testing.\n\n**Local-first is a real constraint, not a marketing line.** Your career data is sensitive. Résumés contain salary history, reasons for leaving, private project details. I didn't want to be a data controller. Building local-first forced specific architectural decisions: no cloud storage, no async job queue, no third-party embedding API.\n\n```\ndocker compose up -d ollama\ndocker compose exec ollama ollama pull gemma4:e4b\ndocker compose up -d\n# Then open http://localhost:5173/admin in your browser\n```\n\nUpload your evidence, paste a job posting, and watch the grounding happen in real time via an SSE stream.\n\n**Heads up: this runs on CPU.** On a 4-core laptop without a GPU, expect 20–30 minutes per tailoring run. With naive comparison enabled, add another 10–20 minutes for the second pass. It is slow by cloud-API standards, but fully offline and costs nothing after the first model pull.\n\nThe best test: pick a role where you have a genuine skill gap; that is where the gap report is most useful.\n\nThe full architecture (hexagonal layout, RAG pipeline, Docker Compose stack) is documented in [docs/architecture.md](https://github.com/jaberoma/citevault/blob/main/docs/architecture.md) in the repo.\n\nThe code is on GitHub: **github.com/jaberoma/citevault** (MIT licensed, no account required, runs on any laptop with Docker).\n\nCitevault's contract is simple: every claim in your résumé either links to a source span in your own evidence, or it does not appear. 
No exceptions.", "url": "https://wpnews.pro/news/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience", "canonical_source": "https://dev.to/jaberoma_77/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience-n7b", "published_at": "2026-05-25 17:53:13+00:00", "updated_at": "2026-05-25 18:03:21.050694+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-startups", "large-language-models", "generative-ai"], "entities": ["Citevault", "Hugging Face"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience", "markdown": "https://wpnews.pro/news/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience.md", "text": "https://wpnews.pro/news/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience.txt", "jsonld": "https://wpnews.pro/news/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience.jsonld"}}