Source: thenextweb.com

This a16z-backed startup says the fix for AI errors is a weaker model, not a smarter one

A startup called Probably has raised $9 million in a seed round co-led by Andreessen Horowitz and Accel to catch AI's factual errors using a weaker model instead of a smarter one. The company's 'verifiable data agent' runs answers through a deterministic validator that checks them against the actual data, aiming for 99.99% accuracy while cutting costs and running locally on the open-source database DuckDB.

Published Jun 16, 2026 · 3 min read

Most of the AI industry is trying to fix hallucinations by building bigger, smarter models. A startup called Probably is betting on the opposite.

The company has raised $9m in a seed round co-led by Andreessen Horowitz and Accel, with Tokyo Black and Vermilion Cliffs Ventures, to catch AI’s factual errors before they ever reach a user. It is aiming for the 99.99% accuracy that ordinary software takes for granted but large language models rarely hit.

Its trick is to lean on the model less, not more. Probably’s first product, a local ‘verifiable data agent’ that answers questions from messy datasets, runs each answer through what founder Peter Elias calls a ‘data science mech suit’.

A harness, not a bigger brain

The model takes a first pass, then a separate, deterministic validator checks the answer against the actual data and bounces anything that does not match. The model is trained against that validator, and every result ships with a citation and an audit trail.
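To make the loop concrete, here is a minimal sketch of a propose-then-validate harness in plain Python. It is an illustration of the general idea only, not Probably's code: the dataset, the `flaky_model` stand-in and every function name are hypothetical, and the real validator is presumably far more general than an exact-match check on one sum.

```python
# Hypothetical toy dataset standing in for "the actual data".
ROWS = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 80},
    {"region": "north", "sales": 50},
]

def ground_truth_total(rows, region):
    """Recompute the figure deterministically, straight from the data."""
    return sum(r["sales"] for r in rows if r["region"] == region)

def validate(claimed, rows, region):
    """Accept a claimed total only if it matches the recomputed value."""
    return claimed == ground_truth_total(rows, region)

def answer_with_harness(propose, rows, region, max_attempts=3):
    """Ask the proposer for an answer and bounce anything that fails validation.

    Returns the accepted answer (or None) plus an audit trail of every attempt.
    """
    audit = []
    for attempt in range(1, max_attempts + 1):
        claimed = propose(attempt)
        accepted = validate(claimed, rows, region)
        audit.append({"attempt": attempt, "claimed": claimed, "accepted": accepted})
        if accepted:
            return claimed, audit
    return None, audit

def flaky_model(attempt):
    """Stand-in for a model (assumption): wrong on the first try, right on the second."""
    return 999 if attempt == 1 else 170

result, trail = answer_with_harness(flaky_model, ROWS, "north")
```

In this toy run the first guess is rejected and the second is accepted, and `trail` records both attempts. The point of the design is that the check itself involves no model at all, so a wrong answer cannot talk its way past it.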

‘The better your harness engineering is, the weaker the model can be,’ Elias says. Reduce the ambiguity enough, the argument goes, and the AI barely has to think.

That has a striking consequence for cost. Probably’s tool runs on a model Elias describes as ‘four classes weaker’ than the frontier, small enough to run on a desktop rather than a data centre, which strips out most of the token bill.

It also doubles as a privacy pitch. The whole thing runs locally on the open-source database DuckDB, and the company says the model only ever sees metadata and statistics, never the raw data, which stays on your machine.
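What "metadata and statistics, never the raw data" might look like can be sketched in a few lines. This is an assumption-laden illustration, not the company's method: it uses plain Python rather than DuckDB so that it runs anywhere, and the table, the function names and the choice of statistics are all hypothetical.

```python
import statistics

def summarize_column(name, values):
    """Return metadata and summary statistics for one column, never the values themselves."""
    non_null = [v for v in values if v is not None]
    summary = {
        "column": name,
        "type": type(non_null[0]).__name__ if non_null else "unknown",
        "count": len(values),
        "nulls": len(values) - len(non_null),
    }
    # Numeric columns get range and mean; other columns expose no content at all.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        summary.update(
            min=min(non_null),
            max=max(non_null),
            mean=statistics.fmean(non_null),
        )
    return summary

def build_model_context(table):
    """Build the only view of the table that would be handed to the model."""
    return [summarize_column(name, values) for name, values in table.items()]

# Hypothetical local table, stored column by column.
TABLE = {"region": ["north", "south", None], "sales": [120, 80, 50]}
context = build_model_context(TABLE)
```

Here `context` carries column names, types, null counts and numeric ranges, while the row values themselves stay in `TABLE` on the local machine. Whether aggregates alone are private enough depends on the data; a minimum or maximum is still a real value from some row.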

Aimed at the token-cost backlash

The timing is pointed. Companies are watching AI bills balloon even as per-token prices collapse, and a tool that delivers accuracy on cheap, local hardware speaks directly to that anxiety.

It also lands where errors hurt most. Probably says the same engine could extend to accounting, medical work or any other ‘precision-sensitive’ job: the kind where a confident wrong answer is the whole problem, as researchers warning about hallucinations in science keep pointing out.

A provocative claim, and the catch

Elias goes further, arguing the big labs have not built this because ‘they make money the more times you have to correct the model’. It is a tidy sales line, and a contestable one: the major labs pour resources into cutting hallucinations, and a smaller player has every reason to cast itself as the honest broker.

The bigger caveat is scope. A validator only works when there is a hard ground truth to check against, such as a dataset, which is why Probably started with data rather than open-ended writing. It is a $9m seed, the product is in public preview at version 0.1, and the 99.99% figure is still a goal, not a result. But in a market crowded with attempts to tame hallucinations, betting on smaller models is at least a refreshingly different wager, and one a16z and Accel were willing to fund.
