The Andreessen Horowitz-backed startup wants to hit 99.99% accuracy by wrapping language models in deterministic validators
AI hallucinations have graduated from amusing party trick to genuine liability. Probably, a San Francisco-based startup founded by Peter Elias, just raised $9M in seed funding to make sure AI systems stop confidently lying to people.
The round was led by Andreessen Horowitz, and the capital will fund what Probably describes as a reliability layer for AI: a system designed to catch factual errors before they ever reach end users.
The 99.99% problem
Probably is targeting 99.99% accuracy on precision-sensitive tasks. In English: for every 10,000 answers, the system aims to get no more than one wrong.
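As a quick check on that arithmetic, here is a small Python sketch that converts an accuracy target into an error budget. The function name `allowed_errors` is an illustration only and does not come from Probably; exact fractions are used so that floating-point rounding does not blur the result.

```python
from fractions import Fraction

def allowed_errors(accuracy_percent: str, answers: int) -> Fraction:
    """Return how many wrong answers an accuracy target permits.

    accuracy_percent is passed as a string (e.g. "99.99") so the
    conversion to a fraction is exact.
    """
    accuracy = Fraction(accuracy_percent) / 100
    return (1 - accuracy) * answers

# Error budget for the 99.99% target over 10,000 answers.
budget = allowed_errors("99.99", 10_000)
print(budget)
```

At that target, the budget over 10,000 answers works out to exactly one error; at 99% it would be 100.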
The company’s approach wraps language models in what it calls deterministic validators. The LLM generates a first-pass answer, and the validation layer checks whether that answer holds up against verifiable data before it gets delivered.
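Probably has not published how its validators work, so the following Python sketch is only a guess at the general pattern: a model drafts an answer, and a deterministic check against a trusted record decides whether it is released. Every name here (`LEDGER`, `fake_model`, `validated_answer`) is hypothetical, and the "model" is a stub rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical trusted record standing in for "verifiable data".
LEDGER = {"q1_revenue": 1200, "q2_revenue": 1350}

@dataclass
class Answer:
    value: Optional[int]     # None when the draft failed validation
    verified: bool
    citation: Optional[str]  # where the verified value can be traced

def fake_model(question: str) -> int:
    """Stub for an LLM call; deliberately wrong about q2_revenue."""
    return {"q1_revenue": 1200, "q2_revenue": 1400}[question]

def validated_answer(question: str, model: Callable[[str], int]) -> Answer:
    """Release the model's draft only if it matches the trusted record."""
    draft = model(question)
    truth = LEDGER.get(question)
    if truth is not None and draft == truth:
        return Answer(draft, True, f"LEDGER[{question!r}]")
    # Deterministic rejection: withhold the draft rather than guess.
    return Answer(None, False, None)

print(validated_answer("q1_revenue", fake_model))
print(validated_answer("q2_revenue", fake_model))
```

In this toy version the first question passes and carries a citation, while the second is withheld because the draft disagrees with the record. A real system would presumably need far richer checks than an exact lookup.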
Every response comes with citations and audit trails. In industries like accounting and healthcare, being able to trace how an answer was generated isn’t a feature; it’s a regulatory requirement.
Smaller models, bigger ambitions
The company says its system runs on models described as “four classes weaker than frontier models.” By engineering reliability into smaller models rather than brute-forcing accuracy with larger ones, Probably is betting it can deliver comparable results at a fraction of the cost.
The system can also run on local hardware, reducing token expenses and keeping AI inference on-premises for companies handling sensitive financial or medical data.
Probably’s first product is a data science tool that generates precise answers from complex datasets. It lets non-technical users ask questions of their data and get reliable, cited answers without needing to understand the underlying data architecture.
Disclosure: This article was edited by Editorial Team.