The failure mode that scares me most in RAG isn't a wrong answer. It's a confident wrong answer with three citations that don't actually say what the answer claims.
So in SWIRL 5 I stopped trusting the model to police itself and added a check that runs after generation.
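The flow: split the generated answer into individual claims, check each claim against the passages it cites with an entailment model, and flag any sentence its citations don't support.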
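Here is a minimal Python sketch of that post-generation check. It assumes the claims arrive already segmented and that each one carries the text of the passages it cites. The names (`Claim`, `Verdict`, `verify_answer`, `overlap_scorer`) are hypothetical and are not SWIRL's API. `overlap_scorer` is a crude word-overlap stand-in for a real entailment model, included only so the example runs without one.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Claim:
    sentence_index: int        # position of the sentence in the generated answer
    text: str
    cited_passages: list[str]  # text of every passage this sentence cites

@dataclass
class Verdict:
    sentence_index: int
    text: str
    score: float
    supported: bool

# (premise, hypothesis) -> probability that the premise entails the hypothesis.
EntailmentScorer = Callable[[str, str], float]

def overlap_scorer(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis tokens found in the premise."""
    hyp = re.findall(r"[a-z0-9]+", hypothesis.lower())
    prem = set(re.findall(r"[a-z0-9]+", premise.lower()))
    return sum(t in prem for t in hyp) / len(hyp) if hyp else 0.0

def verify_answer(claims, entails: EntailmentScorer, threshold: float = 0.5):
    """Score each claim against its best-supporting cited passage."""
    verdicts = []
    for claim in claims:
        # No citations means nothing can support the claim, so it scores 0.0.
        best = max((entails(p, claim.text) for p in claim.cited_passages), default=0.0)
        verdicts.append(Verdict(claim.sentence_index, claim.text, best, best >= threshold))
    return verdicts

def flagged_sentences(verdicts):
    """Indices of sentences that their cited passages do not support."""
    return [v.sentence_index for v in verdicts if not v.supported]
```

Taking the max over cited passages means a claim counts as supported if any one of its citations backs it. Requiring every citation to pass would be stricter and would also flag padded citation lists.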
The interesting part wasn't the entailment model; it was everything around it.
Claim segmentation is harder than it sounds. Naive sentence splitting produces claims that are unverifiable on their own because the subject lives two sentences up.
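One way to keep a pronoun-led sentence checkable is to carry its antecedent along as context for the entailment step. This Python sketch assumes a naive regex sentence splitter and a small hand-picked pronoun list (`PRONOUN_STARTS` and both functions are hypothetical). It's a heuristic, not real coreference resolution, and not necessarily how SWIRL handles it.

```python
import re

# Hypothetical list of sentence openers that usually point back to an earlier subject.
PRONOUN_STARTS = {"it", "its", "they", "their", "this", "that", "these", "those", "he", "she"}

def split_sentences(text):
    # Naive splitter: break after ., !, or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def segment_claims(text):
    """Return (claim, context) pairs. A claim that opens with a pronoun gets, as
    context, every sentence from the last self-contained sentence up to itself."""
    sentences = split_sentences(text)
    pairs = []
    anchor_idx = None
    for i, sentence in enumerate(sentences):
        match = re.match(r"[A-Za-z']+", sentence)
        first = match.group(0).lower() if match else ""
        if first in PRONOUN_STARTS and anchor_idx is not None:
            context = " ".join(sentences[anchor_idx:i])
        else:
            # Self-contained sentence (or a pronoun with nothing before it):
            # it needs no context and becomes the new anchor.
            context = ""
            anchor_idx = i
        pairs.append((sentence, context))
    return pairs
```

The entailment check would then test `context + " " + claim` against the citation, while any flag still points at `claim` alone.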
Citations lie by omission. A model will cite a document that's topically relevant but doesn't contain the specific number it just quoted. The whole point of the check is to catch exactly that gap.
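A cheap guard for this specific failure is to confirm that every number in a claim literally appears in at least one cited passage, before spending an entailment call on it. A minimal Python sketch, assuming plain-text passages; the function names are hypothetical, and the regex handles thousands separators and decimals but not units, ranges, or spelled-out numbers.

```python
import re

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def extract_numbers(text):
    # Drop commas so "1,200" and "1200" compare equal.
    return {m.replace(",", "") for m in NUMBER.findall(text)}

def ungrounded_numbers(claim, cited_passages):
    """Numbers stated in the claim that appear in none of its cited passages."""
    found = set()
    for passage in cited_passages:
        found |= extract_numbers(passage)
    return extract_numbers(claim) - found
```

A non-empty result is a strong signal that the citation is only topically relevant, which is exactly the case an entailment model might score generously.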
Latency budget. An honesty layer nobody is willing to wait for is an honesty layer nobody ships. SWIRL 5 batches passage embeddings and can optionally cache them, among other optimizations.
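A minimal sketch of batching plus caching for passage embeddings in Python, assuming a hypothetical `embed_batch` function that takes a list of strings and returns one vector per string. `CachedBatchEmbedder` is likewise an illustrative name, not SWIRL's implementation. Cache keys are SHA-256 hashes of the passage text, so an identical passage retrieved for different answers is embedded only once.

```python
import hashlib
from typing import Callable, Sequence

# Hypothetical embedder: a batch of strings in, one vector per string out.
Embedder = Callable[[Sequence[str]], list[list[float]]]

class CachedBatchEmbedder:
    def __init__(self, embed_batch: Embedder, batch_size: int = 32):
        self.embed_batch = embed_batch
        self.batch_size = batch_size
        self.cache: dict[str, list[float]] = {}

    @staticmethod
    def _key(passage: str) -> str:
        return hashlib.sha256(passage.encode("utf-8")).hexdigest()

    def embed(self, passages: Sequence[str]) -> list[list[float]]:
        # Collect unique passages that are not cached yet, in first-seen order.
        missing, seen = [], set()
        for p in passages:
            k = self._key(p)
            if k not in self.cache and k not in seen:
                seen.add(k)
                missing.append(p)
        # Embed only the misses, in fixed-size batches to amortize per-call overhead.
        for start in range(0, len(missing), self.batch_size):
            chunk = missing[start:start + self.batch_size]
            for p, vec in zip(chunk, self.embed_batch(chunk)):
                self.cache[self._key(p)] = vec
        return [self.cache[self._key(p)] for p in passages]
```

In a long-running service the plain dict would need an eviction policy (an LRU, for example); it's left unbounded here for brevity.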
The result isn't "SWIRL never hallucinates." Nothing can promise that. The result is: when it's on thin ice, it tells you, and it points at the exact sentence.
That's the version of trustworthy I can actually build.