Source: dev.to

Stop Asking the LLM Whether Its Source Is Real

A developer building technical dossiers warns against asking LLMs to verify their own hallucinated citations, as the model lacks access to a real database and suffers from self-evaluation bias. Instead, the developer implements a three-filter pipeline using external APIs like Crossref and arXiv to check existence, credibility, and fidelity of references before they enter a document. The approach extends to any agent citing sources, emphasizing that verification must be a separate step wired to authoritative systems.

Published Jun 28, 2026 · 3 min read

You ask the AI for a bibliography. It hands you a title, authors, a journal, a year, a well-formed DOI. Everything is plausible, everything is clean. And one reference in two doesn't exist. Not "approximate": nonexistent. The DOI resolves to nothing, the paper was never written.

The reflex is to ask the model again: "are you sure this source is real?" It says yes. Always. You just asked the forger about the authenticity of his forgery.

An LLM doesn't store a database of publications. It generates likely sequences of words. A citation, to it, is a shape: a surname, an initial, two more names, a capitalized journal, a recent year, ten DOI digits. It produces that shape perfectly, because that's exactly what it's good at. The content doesn't need to be true to be plausible; it only needs to look like the real thing.

That's why a hallucinated reference is so vicious: it doesn't look like an error. A wrong calculation jumps out. An invented citation looks like a real one, until you click.

The golden rule fits in one sentence: never ask the model that hallucinated a citation whether that citation is real. For two reasons that compound. First, it doesn't have the information: it has no access to a registry, it can only regenerate something plausible. Second, even if it doubted, its self-evaluation bias pushes it to confirm what it already produced. You get a "yes" worth nothing.

Verification has to come from elsewhere. From a source the model neither controls nor can invent: a metadata API.

In my pipeline for writing technical dossiers, no reference enters the document before clearing three filters, in this order.

Existence. The DOI must resolve. It's binary, and it's free. Crossref exposes its whole database:

curl -s "https://api.crossref.org/works/10.1145/3290605.3300233" \
  | jq '.message.title[0], .message.author[0].family, .message["published"]'

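If the API returns a title and authors, the paper exists. If it returns a 404, Crossref has no record of that DOI. Before rejecting the reference, rule out a DOI registered with another agency such as DataCite, which Crossref does not index, by checking whether https://doi.org/<DOI> resolves; if that fails too, the reference is out, full stop. For preprints, same logic with the arXiv API (export.arxiv.org/api/query) or HAL for French research. This step alone removes the bulk of hallucinations, because an invented DOI almost never resolves, and when it does by coincidence, the title and authors that come back won't match the citation.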
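
For use inside a pipeline rather than at the command line, the existence check can be sketched in Python. This is a minimal sketch, not the author's implementation: the function names `fetch_crossref` and `doi_exists` are hypothetical, and only the standard library and the same Crossref `works` endpoint as the curl call above are assumed. The fetcher is injectable so the decision logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref(doi, timeout=10.0):
    """Return the Crossref 'message' dict for a DOI, or None on HTTP 404."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        # Rate limits or outages are not evidence either way: surface them.
        raise


def doi_exists(doi, fetch=fetch_crossref):
    """Existence filter (hypothetical helper): True only if a titled record comes back."""
    record = fetch(doi)
    return bool(record and record.get("title"))
```

Calling `doi_exists("10.1145/3290605.3300233")` performs the same lookup as the curl command. Errors other than 404 are deliberately re-raised, so a timeout is never mistaken for a missing paper.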

Credibility. Existing isn't enough. A predatory journal, one that publishes anything for a fee, gives a valid DOI to a worthless paper. This filter checks that the journal or conference is real and recognized, not a shell. The DOI proves the source exists, not that it's worth anything.
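
The article does not say how this filter is implemented. One possible sketch, under loud assumptions: you maintain your own allowlist of trusted ISSNs (the `trusted_issns` set below is hypothetical and must be curated by you), and you compare it against the `ISSN` field that Crossref includes in a work's metadata when the venue has one. The helper names are illustrative only.

```python
def normalize_issn(issn):
    """Strip the hyphen and normalize case so '1234-567x' equals '1234567X'."""
    return issn.replace("-", "").strip().upper()


def is_recognized_venue(record, trusted_issns):
    """Credibility filter sketch.

    record: a Crossref 'message' dict for the work.
    trusted_issns: ISSNs you have vetted yourself (an assumption of this sketch).
    """
    trusted = {normalize_issn(i) for i in trusted_issns}
    issns = {normalize_issn(i) for i in record.get("ISSN", [])}
    # No ISSN at all (common for some proceedings) yields False:
    # treat that as "needs manual review", not as proof of a shell venue.
    return bool(issns & trusted)
```

An allowlist is conservative by design: a venue you haven't vetted is rejected until a human looks at it, which matches the spirit of the filter better than trying to enumerate every predatory publisher.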

Fidelity. The most demanding filter, and the one the API won't do for you. The source exists, it's serious, but does it actually say what you make it say? You have to read the paper, spot what's measured versus what's merely asserted, and not extrapolate past its abstract. A real citation slapped onto a claim it doesn't support is still false evidence.
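
The ordering of the three filters can be sketched as a short-circuiting gate. This is an illustration under assumptions, not the author's code: `Verdict` and `run_filters` are hypothetical names, and the `faithful` callable stands in for a recorded human judgment, since no API performs the fidelity check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool
    failed_filter: Optional[str] = None  # name of the first filter that rejected


def run_filters(
    reference: str,
    exists: Callable[[str], bool],
    credible: Callable[[str], bool],
    faithful: Callable[[str], bool],
) -> Verdict:
    """Apply existence, credibility, fidelity in that order; stop at the first failure."""
    ordered = (
        ("existence", exists),
        ("credibility", credible),
        ("fidelity", faithful),  # assumed to wrap a human reviewer's decision
    )
    for name, check in ordered:
        if not check(reference):
            return Verdict(False, name)
    return Verdict(True)
```

Running the cheap, binary check first means the expensive step, a person reading the paper, is only spent on references that already exist and come from a recognized venue.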

Nothing about this pipeline is specific to academic dossiers. The moment an agent cites a source, a ticket, a CVE number, a doc page, a commit, the same discipline applies: the reference must resolve against the authoritative system, not against the model's memory. An agent that says "per ticket JIRA-1242" must have resolved JIRA-1242; otherwise it may have invented the number with as much confidence as a DOI.
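
For agent output, the same gate might look like the sketch below. Everything here is an assumption for illustration: the regular expressions are simplified guesses at ticket and CVE formats, and the `resolvers` mapping is a hypothetical hook where you would plug in lookups against your real tracker or vulnerability database.

```python
import re

# Simplified, assumed identifier formats; adjust to your own systems.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    # The lookahead keeps the ticket pattern from matching the start of a CVE id.
    "ticket": re.compile(r"\b(?!CVE-)[A-Z][A-Z0-9]+-\d+\b"),
}


def unresolved_references(text, resolvers):
    """Return cited identifiers that the authoritative systems do not know.

    resolvers maps each kind in PATTERNS to a callable(identifier) -> bool
    that queries the real system (hypothetical hooks in this sketch).
    """
    missing = []
    for kind, pattern in PATTERNS.items():
        resolve = resolvers[kind]
        for ref in sorted(set(pattern.findall(text))):
            if not resolve(ref):
                missing.append(ref)
    return missing
```

If `unresolved_references` returns anything for a draft answer such as "per ticket JIRA-1242", the answer is held back or sent for regeneration instead of reaching the user.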

The most common architecture mistake in RAG is trusting the generation layer to self-verify. It can't. Verification is a separate step, wired to an external truth, run before the output reaches the user.

There's a lot of talk about lowering models' hallucination rate. That's the wrong fight: a plausible-text generator will always hallucinate a little, it's its nature. The real lever isn't making the model more honest, it's ceasing to take it at its word. A citation you can't resolve against an external registry isn't a citation. It's a guess in a lab coat.
