LLM evaluation tools like Patronus AI excel at hallucination detection, toxicity checks, and semantic relevance. But they don't catch structural failures.
These aren't hallucinations. They're verification failures.
correctover-patronus is an adapter that runs Correctover's 87 deterministic verification rules as native Patronus evaluators. Every verdict comes with a recomputable proof hash, meaning you can verify the verifier.
```bash
pip install correctover-patronus
```
| Dimension | What It Checks | Example |
|---|---|---|
| Structure | Output format validity | JSON parses correctly |
| Schema | Field presence & types | Required fields exist |
| Identity | Semantic relevance to input | Response addresses the question |
| Integrity | Forbidden pattern absence | No Tracebacks or error messages |
| Latency | Response time budget | Under 30s threshold |
| Cost | Token usage budget | Under 10k token limit |
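To make the first rows of the table concrete, here is a minimal sketch of what deterministic checks of this kind can look like in plain Python. It is not Correctover's implementation: the function names, the `required` field mapping, and the forbidden patterns are all hypothetical, chosen only to mirror the Structure, Schema, and Integrity dimensions.

```python
import json
import re

# Hypothetical forbidden patterns; the actual Correctover rule set is not shown here.
FORBIDDEN_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"\b\w*(?:Error|Exception):"),
]


def check_structure(output: str) -> bool:
    """Structure: the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def check_schema(output: str, required: dict[str, type]) -> bool:
    """Schema: every required field exists and has the expected type."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], expected)
        for key, expected in required.items()
    )


def check_integrity(output: str) -> bool:
    """Integrity: no forbidden pattern appears anywhere in the output."""
    return not any(pattern.search(output) for pattern in FORBIDDEN_PATTERNS)
```

Each check is a pure function of its inputs, so running it twice on the same output always gives the same verdict. That property is what separates this kind of rule from a model-based evaluator.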
```python
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Thresholds for the confidence, latency, and cost checks
config = CorrectoverConfig(
    min_confidence=0.7,
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 4000},
)
evaluator = CorrectoverEvaluator(config=config)

result = evaluator.evaluate(
    task_input="Summarize this article...",
    task_output="The article discusses...",
    task_context={"source": "article", "word_count": 1500},
)

# Overall verdict, proof hash, and per-dimension breakdown
print(f"Overall: {result.score:.2f} ({'PASS' if result.pass_ else 'FAIL'})")
print(f"Proof hash: {result.metadata['proof_hash']}")
for dim, info in result.metadata['dimensions'].items():
    print(f"  {dim}: {info['status']} (score={info['score']:.2f})")
```
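When acting on a result in CI, it can help to reduce the per-dimension metadata to a list of failures. The sketch below works on a plain dict shaped like `result.metadata['dimensions']` as it is read in the example above. The helper name, the `"PASS"` status string, and the sample values are assumptions for illustration, not documented output.

```python
def failing_dimensions(dimensions: dict[str, dict]) -> list[str]:
    """Return the names of dimensions whose status is not "PASS", sorted by name."""
    return sorted(
        name
        for name, info in dimensions.items()
        if info.get("status") != "PASS"
    )


# Hand-written sample with the assumed shape: {dimension: {"status": ..., "score": ...}}
sample_dimensions = {
    "structure": {"status": "PASS", "score": 1.0},
    "schema": {"status": "FAIL", "score": 0.4},
    "latency": {"status": "PASS", "score": 0.9},
    "cost": {"status": "FAIL", "score": 0.2},
}

failed = failing_dimensions(sample_dimensions)
if failed:
    print("Failed dimensions: " + ", ".join(failed))
```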
```python
from correctover_patronus import correctover_structure, correctover_integrity

# Run single-dimension checks on their own
is_valid = correctover_structure(task_output='{"key": "value"}')
is_clean = correctover_integrity(task_output="Result: 42")
```
```python
import patronus
from correctover_patronus import correctover_full

# my_dataset is assumed to be defined elsewhere
results = patronus.evaluate(
    evaluators=[correctover_full],
    dataset=my_dataset,
    experiment_name="correctover-benchmark",
)
```
Every evaluation produces a `proof_hash` in the metadata. This hash covers:
You can re-run the same verification and get the same hash. No black boxes.
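One common way to make a hash recomputable is to hash a canonical serialization of the verification record, so that the same inputs always produce the same bytes. The sketch below illustrates that idea only. It is not Correctover's actual scheme: the record fields, the use of SHA-256, and the function name are all assumptions.

```python
import hashlib
import json


def proof_hash(task_input: str, task_output: str, verdicts: dict[str, bool]) -> str:
    """SHA-256 over a canonical JSON encoding of a (hypothetical) verification record."""
    record = {"input": task_input, "output": task_output, "verdicts": verdicts}
    # sort_keys and fixed separators make the encoding independent of dict order
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = proof_hash("Summarize this article...", "The article discusses...",
                   {"structure": True, "integrity": True})
second = proof_hash("Summarize this article...", "The article discusses...",
                    {"integrity": True, "structure": True})
print(first == second)
```

Because the serialization is canonical, anyone holding the same input, output, and verdicts can recompute the digest and compare it with the published one. Any change to the record produces a different hash.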
```
┌─────────────────┐     ┌──────────────────────┐
│   Patronus AI   │────>│ correctover-patronus │
│    Framework    │     │    (this adapter)    │
└─────────────────┘     └──────────┬───────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │       Correctover SDK       │
                    │  (87 rules, 6 dimensions)   │
                    │   P50 verification: 22us    │
                    └──────────────┬──────────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │    Verification Request     │
                    │   -> Verdict + Proof Hash   │
                    │   -> Metadata + Tags        │
                    └─────────────────────────────┘
```
*Failover ≠ Correctover.*™
*Correctover verifies. Patronus evaluates. Together: complete output assurance.*