# I built an AI résumé tool that refuses to lie about your experience

> Source: <https://dev.to/jaberoma_77/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience-n7b>
> Published: 2026-05-25 17:53:13+00:00

Most AI résumé tools have the same flaw: they hallucinate. Ask them to tailor your résumé for a job requiring "Rust experience" and they'll happily invent a Rust project you never worked on. It reads great — until the technical interview.

I wanted the opposite. So I built **Citevault**: a local-first résumé tailoring tool where every claim is either grounded in your own evidence, or refused and flagged as a gap.

No fabrication. No API keys. Runs entirely on your laptop. *(Model weights are pulled from Hugging Face once on first boot; after that, no outbound connections.)*

Every bullet in your résumé starts as a *claim*. Citevault processes each one through a pipeline in which the verifier assigns one of four verdicts (`SUPPORTS`, `PARTIAL`, `UNCLEAR`, or `CONTRADICTS`) and the claim is handled accordingly:

- `SUPPORTS` → the claim is verified and cited.
- `PARTIAL` → rewritten to match only what the evidence actually says.
- `UNCLEAR` → a rewrite is attempted; if it still can't be grounded, it is refused and gap-reported.
- `CONTRADICTS` → refused immediately and gap-reported.

The result is a résumé where every bullet has a `[^sp-...]` footnote traceable back to a specific span in your source material.
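The verdict routing can be sketched in a few lines of Python. This is a minimal illustration, not Citevault's code: `verify` and `rewrite` are hypothetical stand-ins for the model calls. The sketch also assumes an `UNCLEAR` claim gets exactly one rewrite followed by a re-check, while a `PARTIAL` rewrite is accepted without re-verification; the post doesn't specify either detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    SUPPORTS = "SUPPORTS"
    PARTIAL = "PARTIAL"
    UNCLEAR = "UNCLEAR"
    CONTRADICTS = "CONTRADICTS"


@dataclass
class Outcome:
    text: Optional[str]  # None means the claim was refused
    gap: bool            # True means the claim goes into the gap report


def route(claim: str,
          verify: Callable[[str], Verdict],
          rewrite: Callable[[str], str]) -> Outcome:
    """Hypothetical routing: `verify` and `rewrite` stand in for model calls."""
    verdict = verify(claim)
    if verdict is Verdict.SUPPORTS:
        return Outcome(claim, gap=False)
    if verdict is Verdict.PARTIAL:
        # Assumption: the narrowed rewrite is accepted as-is.
        return Outcome(rewrite(claim), gap=False)
    if verdict is Verdict.UNCLEAR:
        # Assumption: one rewrite attempt, then re-verify; refuse if still ungrounded.
        candidate = rewrite(claim)
        if verify(candidate) is Verdict.SUPPORTS:
            return Outcome(candidate, gap=False)
        return Outcome(None, gap=True)
    # CONTRADICTS: refused immediately and reported as a gap.
    return Outcome(None, gap=True)
```

Refusal is represented as `text=None` plus `gap=True`, so nothing ungrounded can reach the rendered résumé by accident.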

Toggle "Compare with naive AI" before starting a tailoring run. Citevault runs its grounded pipeline *and* a second single-pass run — same model, same evidence, same task description, no verification loop. The only difference is the grounded pipeline checks every claim against its source before including it.

The diff is striking: the naive output contains placeholder text such as `[Candidate Name]` and invented achievements that never appeared in the evidence.

| Component | Role |
|---|---|
| Gemma 4 E4B (`gemma4:e4b`) via Ollama | Claim drafting, verification, cover letter composition |
| BGE-small-en-v1.5 | Dense embeddings for semantic retrieval |
| BGE cross-encoder | Re-ranking retrieved candidates |
| BM25 + SQLite FTS5 | Keyword retrieval (hybrid RAG) |
| sqlite-vec | Vector store; no external database required |

Gemma 4 E4B was chosen specifically for this role: it is instruction-tuned well enough to return consistent structured JSON verdicts, small enough to run on a CPU, and open-weight, so no API key or data exposure is involved. The `e4b` tag is the Q4_K_M quantised build, the best size/quality tradeoff for local inference via Ollama.

The entire stack runs on CPU. Measured on a 4-core/8-thread laptop with 32 GB RAM and no discrete GPU: 3–8 tokens/second generation speed, 20–30 minutes per tailoring run; add another 10–20 minutes if naive comparison is enabled. Slower than a cloud API, but zero cost, zero data exposure, and no dependency on an upstream service staying alive.

**Structured generation is the hard part.** Getting Gemma 4 to consistently return structured JSON verdicts from the verifier took more prompt iteration than anything else. The final verifier prompt is tightly constrained: it gives the model a specific rubric, a strict output format, and a worked example. It still occasionally returns malformed output — those claims are logged and omitted from the output rather than silently passed through.
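One way to enforce the "log and omit" rule is a strict parser in front of the router. The JSON shape (a top-level object with a `verdict` field) and the name `parse_verdict` are assumptions for illustration, since the post doesn't publish the verifier's output schema. Anything that fails to parse, or carries a verdict outside the four allowed values, returns `None` so the caller can drop the claim.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("verifier")

ALLOWED_VERDICTS = {"SUPPORTS", "PARTIAL", "UNCLEAR", "CONTRADICTS"}
FENCE = "`" * 3  # small models sometimes wrap JSON in a markdown code fence


def parse_verdict(raw: str, claim_id: str) -> Optional[dict]:
    """Parse a verifier reply (assumed schema); return None after logging if malformed."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        logger.warning("claim %s: verifier output is not valid JSON", claim_id)
        return None
    verdict = data.get("verdict") if isinstance(data, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        logger.warning("claim %s: missing or unknown verdict", claim_id)
        return None
    return data


good = parse_verdict('{"verdict": "SUPPORTS"}', "claim-1")  # → {'verdict': 'SUPPORTS'}
bad = parse_verdict("The claim looks supported.", "claim-2")  # → None, warning logged
```

Returning `None` rather than guessing a default verdict keeps the failure mode conservative: a malformed reply can only remove a claim, never admit one.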

**Hybrid RAG matters.** Pure dense search misses exact keyword matches. Pure BM25 misses semantic similarity. On the five-case golden eval set, the hybrid combination recovered ~15 percentage points in first-pass grounding rate over either retrieval strategy alone — enough to tip borderline claims from UNCLEAR to SUPPORTS.
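The post doesn't say how the BM25 and dense result lists are merged. A common choice is reciprocal rank fusion (RRF), sketched here as an assumption rather than Citevault's actual method: each retriever contributes `1 / (k + rank)` for every candidate it returns, with `k = 60` as the conventional default. The chunk ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists; items ranked well by several retrievers rise."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["chunk-a", "chunk-b", "chunk-d"]   # keyword retrieval order
dense_hits = ["chunk-b", "chunk-c", "chunk-a"]  # embedding retrieval order
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# → ['chunk-b', 'chunk-a', 'chunk-c', 'chunk-d']
```

`chunk-b`, near the top of both lists, overtakes `chunk-a`, which BM25 alone ranked first. In a stack like the one in the table, the fused list would then go to the cross-encoder for re-ranking.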

**Eval-driven development pays off.** I built a golden evaluation set of five synthetic candidates and ran the pipeline against it after every significant change. The final first-pass grounding rate is 98.2% — but more importantly, I caught two regressions that looked fine in manual testing.
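A golden-set check like this can be wired up as a simple regression gate. The post doesn't define "first-pass grounding rate"; this sketch assumes it means the share of claims whose *first* verifier verdict is `SUPPORTS`, and the function names and tolerance value are hypothetical.

```python
from typing import Iterable


def first_pass_grounding_rate(first_verdicts: Iterable[str]) -> float:
    """Share of claims grounded on the first verifier pass (assumed definition)."""
    verdicts = list(first_verdicts)
    if not verdicts:
        raise ValueError("no claims were evaluated")
    grounded = sum(1 for v in verdicts if v == "SUPPORTS")
    return grounded / len(verdicts)


def passes_regression_gate(rate: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Fail the run if the rate drops more than `tolerance` below the accepted baseline."""
    return rate >= baseline - tolerance


run = ["SUPPORTS", "SUPPORTS", "PARTIAL", "SUPPORTS"]
rate = first_pass_grounding_rate(run)
print(f"{rate:.1%}")                        # 75.0%
print(passes_regression_gate(rate, 0.982))  # False
```

Running a gate like this after every change is what turns a single headline number into a regression signal.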

**Local-first is a real constraint, not a marketing line.** Your career data is sensitive. Résumés contain salary history, reasons for leaving, private project details. I didn't want to be a data controller. Building local-first forced specific architectural decisions — no cloud storage, no async job queue, no third-party embedding API.

```
docker compose up -d ollama
docker compose exec ollama ollama pull gemma4:e4b
docker compose up -d
# Then open http://localhost:5173/admin in your browser
```

Upload your evidence, paste a job posting, and watch the grounding happen in real time via an SSE stream.

Heads up: this runs on CPU. On a 4-core laptop without a GPU, expect 20–30 minutes per tailoring run, plus another 10–20 minutes for the second pass if naive comparison is enabled. It is slow by cloud-API standards, but fully offline and free after the first model pull.

The best test: pick a role where you have a genuine skill gap — that is where the gap report is most useful.

The full architecture (hexagonal layout, RAG pipeline, Docker Compose stack) is documented in [docs/architecture.md](https://github.com/jaberoma/citevault/blob/main/docs/architecture.md) in the repo.

The code is on GitHub at **github.com/jaberoma/citevault**: MIT licensed, no account required, runs on any laptop with Docker.

Citevault's contract is simple: every claim in your résumé either links to a source span in your own evidence, or it does not appear. No exceptions.
