I built an AI résumé tool that refuses to lie about your experience




A developer built Citevault, a local-first résumé tailoring tool that verifies every claim against the user's own evidence rather than hallucinating experience. The tool processes each résumé bullet through a pipeline that returns verdicts of SUPPORTS, PARTIAL, UNCLEAR, or CONTRADICTS, flagging unverifiable claims as gaps instead of fabricating them. Running entirely on CPU with no API keys or outbound connections, Citevault achieves a 98.2% first-pass grounding rate using a hybrid RAG system combining dense embeddings, BM25 keyword search, and SQLite FTS5.

Published May 25, 2026 · 4 min read

Most AI résumé tools have the same flaw: they hallucinate. Ask them to tailor your résumé for a job requiring "Rust experience" and they'll happily invent a Rust project you never worked on. It reads great — until the technical interview.

I wanted the opposite. So I built Citevault: a local-first résumé tailoring tool where every claim is either grounded in your own evidence, or refused and flagged as a gap.

No fabrication. No API keys. Runs entirely on your laptop. (Model weights are pulled from Hugging Face once on first boot; after that, no outbound connections.)

Every bullet in your résumé starts as a claim. Citevault processes each one through a pipeline that assigns a verdict of SUPPORTS, PARTIAL, UNCLEAR, or CONTRADICTS, then acts on it:

- SUPPORTS → the claim is verified and cited
- PARTIAL → rewritten to match only what the evidence actually says
- UNCLEAR → a rewrite is attempted, and if it still can't be grounded, refused and gap-reported
- CONTRADICTS → refused immediately and gap-reported

The result is a résumé where every bullet has a [^sp-...] footnote traceable back to a specific span in your source material.
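The verdict handling described above can be sketched as a small routing function. This is only an illustration, not Citevault's actual code: the `Outcome` type, the `route` function, the `rewrite` and `reverify` callbacks, and span ids such as `sp-3` are all hypothetical names. It also assumes a PARTIAL rewrite is accepted without a second verification, which the article does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    SUPPORTS = "SUPPORTS"
    PARTIAL = "PARTIAL"
    UNCLEAR = "UNCLEAR"
    CONTRADICTS = "CONTRADICTS"


@dataclass
class Outcome:
    text: Optional[str]  # bullet to include, or None if the claim is refused
    gap: Optional[str]   # gap-report entry when the claim is refused


def route(
    claim: str,
    verdict: Verdict,
    span_id: Optional[str],
    rewrite: Callable[[str], str],
    reverify: Callable[[str], Verdict],
) -> Outcome:
    """Decide what happens to one claim (hypothetical sketch)."""
    if verdict is Verdict.SUPPORTS:
        # Verified: keep the claim and cite its evidence span.
        return Outcome(f"{claim} [^{span_id}]", None)
    if verdict is Verdict.CONTRADICTS:
        # Refused immediately and reported as a gap.
        return Outcome(None, claim)

    rewritten = rewrite(claim)
    if verdict is Verdict.PARTIAL:
        # Assumption: the narrowed rewrite is accepted as-is.
        return Outcome(f"{rewritten} [^{span_id}]", None)

    # UNCLEAR: keep the rewrite only if it can now be grounded.
    if reverify(rewritten) is Verdict.SUPPORTS:
        return Outcome(f"{rewritten} [^{span_id}]", None)
    return Outcome(None, claim)
```

The property this sketch tries to capture is that a refused claim never produces bullet text; it only ever produces a gap entry.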

Toggle "Compare with naive AI" before starting a tailoring run. Citevault runs its grounded pipeline and a second single-pass run — same model, same evidence, same task description, no verification loop. The only difference is the grounded pipeline checks every claim against its source before including it.

The diff is striking:

The naive run's output included [Candidate Name] and invented achievements that never appeared in the evidence.

| Component | Role |
|---|---|
| Gemma 4 E4B (gemma4:e4b) via Ollama | Claim drafting, verification, cover letter composition |
| BGE-small-en-v1.5 | Dense embeddings for semantic retrieval |
| BGE cross-encoder | Re-ranking retrieved candidates |
| BM25 + SQLite FTS5 | Keyword retrieval (hybrid RAG) |
| sqlite-vec | Vector store; no external database required |

Gemma 4 E4B was chosen specifically for this role: it is instruction-tuned well enough to return consistent structured JSON verdicts, small enough to run on CPU without a GPU, and open-weight so no API key or data exposure is involved. The e4b tag is the Q4_K_M quantised build, the best size/quality tradeoff for local inference via Ollama.

The entire stack runs on CPU. Measured on a 4-core/8-thread laptop with 32 GB RAM and no discrete GPU: 3–8 tokens/second generation speed, 20–30 minutes per tailoring run; add another 10–20 minutes if naive comparison is enabled. Slower than a cloud API, but zero cost, zero data exposure, and no dependency on an upstream service staying alive.

Structured generation is the hard part. Getting Gemma 4 to consistently return structured JSON verdicts from the verifier took more prompt iteration than anything else. The final verifier prompt is tightly constrained: it gives the model a specific rubric, a strict output format, and a worked example. It still occasionally returns malformed output — those claims are logged and omitted from the output rather than silently passed through.
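The "log and omit" handling of malformed verifier output might look roughly like the sketch below. The JSON field name `verdict`, the `parse_verdict` function, and the logger name are assumptions made for illustration; the article does not show the real output schema.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("verifier")  # hypothetical logger name

ALLOWED_VERDICTS = {"SUPPORTS", "PARTIAL", "UNCLEAR", "CONTRADICTS"}


def parse_verdict(raw: str) -> Optional[dict]:
    """Return the parsed verdict object, or None if the output is unusable.

    A None result means the caller should drop the claim from the output
    rather than let it through unverified.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        logger.warning("malformed verifier output: %r", raw[:200])
        return None

    # Assumed schema: a JSON object with a string "verdict" field.
    verdict = data.get("verdict") if isinstance(data, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        logger.warning("unexpected verdict in verifier output: %r", raw[:200])
        return None
    return data
```

Failing closed here matters more than the parsing itself: a claim whose verdict cannot be read is treated the same as a claim that was never verified.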

Hybrid RAG matters. Pure dense search misses exact keyword matches. Pure BM25 misses semantic similarity. On the five-case golden eval set, the hybrid combination recovered ~15 percentage points in first-pass grounding rate over either retrieval strategy alone — enough to tip borderline claims from UNCLEAR to SUPPORTS.
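The article does not say how the dense and keyword result lists are merged before re-ranking. One common way to combine two rankings is reciprocal rank fusion (RRF), sketched here purely as an illustration of the idea; the function name, the document ids, and the constant `k=60` are assumptions, not Citevault's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["span-a", "span-b", "span-c"]    # semantic retrieval order
keyword_hits = ["span-c", "span-a", "span-d"]  # BM25 / FTS5 order
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
```

In this toy input, `span-a` and `span-c` appear in both lists and therefore outrank `span-b` and `span-d`, which each appear in only one.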

Eval-driven development pays off. I built a golden evaluation set of five synthetic candidates and ran the pipeline against it after every significant change. The final first-pass grounding rate is 98.2% — but more importantly, I caught two regressions that looked fine in manual testing.
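A regression check over a golden set can be as small as the sketch below. It assumes, since the article gives no formal definition, that "first-pass grounding rate" means the share of claims that receive SUPPORTS on their first verification; the function names and the tolerance value are hypothetical.

```python
from typing import Dict, List


def first_pass_grounding_rate(verdicts_by_candidate: Dict[str, List[str]]) -> float:
    """Share of all claims whose first verdict was SUPPORTS (assumed definition)."""
    total = sum(len(verdicts) for verdicts in verdicts_by_candidate.values())
    if total == 0:
        raise ValueError("golden set produced no claims")
    grounded = sum(
        verdict == "SUPPORTS"
        for verdicts in verdicts_by_candidate.values()
        for verdict in verdicts
    )
    return grounded / total


def is_regression(current: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Flag a run whose rate falls more than `tolerance` below the baseline."""
    return current < baseline - tolerance
```

Running a check like this after every significant change is what lets a drop show up as a number instead of relying on manual spot checks.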

Local-first is a real constraint, not a marketing line. Your career data is sensitive. Résumés contain salary history, reasons for leaving, private project details. I didn't want to be a data controller. Building local-first forced specific architectural decisions — no cloud storage, no async job queue, no third-party embedding API.

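    # Start the Ollama service first
    docker compose up -d ollama
    # Pull the Gemma 4 E4B model inside the Ollama container
    docker compose exec ollama ollama pull gemma4:e4b
    # Start the rest of the stack
    docker compose up -d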

Upload your evidence, paste a job posting, and watch the grounding happen in real time via SSE stream.
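For readers who want to consume the progress stream outside the browser, the Server-Sent Events wire format is simple enough to parse by hand. The sketch below is a minimal parser for the `event:` and `data:` fields only; the event names and payloads in the example are made up, since the article does not document Citevault's actual stream.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream.

    Handles only the `event` and `data` fields; `id` and `retry` are ignored.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream contents, for illustration only.
sample = [
    "event: verdict\n",
    'data: {"claim": 1}\n',
    "\n",
    ": keep-alive\n",
    "data: done\n",
    "\n",
]
events = list(iter_sse_events(sample))
```

Any iterable of text lines works as input, so the same function can be fed from a file, a test fixture, or a streaming HTTP response.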

Heads up: this runs on CPU. On a 4-core laptop without a GPU, expect 20–30 minutes per tailoring run. With naive comparison enabled, add another 10–20 minutes for the second pass. It is slow by cloud-API standards, but fully offline and costs nothing after the first model pull.

The best test: pick a role where you have a genuine skill gap — that is where the gap report is most useful.

The full architecture (hexagonal layout, RAG pipeline, Docker Compose stack) is documented in docs/architecture.md in the repo.

The code is on GitHub: **github.com/jaberoma/citevault**. It is MIT licensed, requires no account, and runs on any laptop with Docker.

Citevault's contract is simple: every claim in your résumé either links to a source span in your own evidence, or it does not appear. No exceptions.


Read original on dev.to → dev.to/jaberoma_77/i-built-an-ai-resume-tool-tha…
