Show HN: I built an 11-LLM consensus engine to detect AI hallucination

A developer built an 11-LLM consensus engine that detects AI hallucination by routing queries across 14 providers and scoring semantic agreement. The open-source kit includes EU AI Act audit trails, fail-closed gates, and self-evolution loops, aiming to help AI SaaS products avoid rebuilding orchestration layers.

Published Jun 19, 2026 · 5 min read

The only production-ready boilerplate that ships with 14 LLM providers in semantic consensus, EU AI Act audit-grade compliance, and 13 self-evolution loops out of the box. Built on the same code that powers api.quorum-ai.dev.

Every AI SaaS boilerplate on the market ships with one LLM provider, usually OpenAI, sometimes Anthropic. That's the easy part. The hard part is everything around it:

  • Routing queries across multiple providers without exploding costs
  • Detecting hallucination via semantic consensus instead of trusting a single model
  • Generating audit trails that pass EU AI Act Article 12/13 review
  • Building fail-closed gates for high-risk autonomous actions
  • Keeping per-user memory, RLHF feedback, and adaptive routing data isolated

This kit gives you all of that already wired, deployed, and battle-tested in production. You spend your engineering time on what makes your product different, not on rebuilding the orchestration layer everyone else stops at.

  • 14 provider integrations out of the box: OpenAI, Anthropic, Google Gemini, xAI Grok, Mistral, Cohere, NVIDIA, DeepSeek, Replicate, DashScope/Qwen, Zhipu/GLM, Moonshot/Kimi, Hermes (Nous) local, Llama local via Ollama
  • Semantic agreement scoring via cosine similarity on embeddings, not lexical overlap and not majority vote
  • Disagreement trace returned with every answer: your users see exactly which models agreed, which dissented, and why
  • Per-query cost log so you can show clients what each consensus cost in API tokens
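To make the scoring idea concrete, here is a minimal sketch of agreement scoring by pairwise cosine similarity. It is not the kit's implementation: the function names, the 0.8 threshold, the median-based dissent rule, and the toy 3-dimensional vectors (standing in for real answer embeddings) are all assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import median

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def consensus(embeddings, threshold=0.8):
    """Score agreement among model answers.

    embeddings: dict mapping a model name to the embedding of its answer.
    Returns (mean pairwise similarity, list of dissenting models), where a
    model dissents if its median similarity to the others is below threshold.
    """
    names = list(embeddings)
    if len(names) < 2:
        raise ValueError("need at least two answers to score agreement")
    per_model = {name: [] for name in names}
    pair_scores = []
    for a, b in combinations(names, 2):
        score = cosine(embeddings[a], embeddings[b])
        pair_scores.append(score)
        per_model[a].append(score)
        per_model[b].append(score)
    agreement = sum(pair_scores) / len(pair_scores)
    dissenters = [n for n in names if median(per_model[n]) < threshold]
    return agreement, dissenters

# Hypothetical embeddings: three answers point the same way, one does not.
answers = {
    "model_a": [1.0, 0.0, 0.0],
    "model_b": [0.9, 0.1, 0.0],
    "model_c": [0.0, 0.0, 1.0],
    "model_d": [0.95, 0.05, 0.0],
}
agreement, dissenters = consensus(answers)
```

The per-model median is used here so that a single outlier does not drag every other model below the threshold. A disagreement trace like the one described above could be built from the same `per_model` scores.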

| Loop | What it learns |
| --- | --- |
| RLHF tracker | Per-user, per-query-class weight updates from thumbs-up/down |
| ELO competition | Pairwise model wins → global ranking per query class |
| Hebbian co-activation | Which models agree with each other (reduces redundant fan-out) |
| MoE router | Picks the right subset of providers per query, not all every time |
| Hebbian memory | Per-user vector memory with semantic recall |
| Genetic prompt evolution | Mutates and selects system prompts that score highest on your eval set |
| Adversarial loop | Red-team prompts to catch jailbreaks before users do |
| A/B testing | Promotes new policies only when they beat the incumbent statistically |
| Distillation | Promotes Llama checkpoints fine-tuned on your in-house data |
| Synthetic data | Generates training pairs from real production traffic |
| Architecture search | Evolves loop topology over generations |
| Federated learning | Aggregates updates across tenants without sharing raw data |
| Web learner | 4-source web ingestion (DDG + Wikipedia + HackerNews + arXiv) into a vector KB |
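As an illustration of the ELO competition row, the sketch below applies the standard Elo update to one pairwise model win inside a query class. The function names, the starting rating of 1500, the K-factor of 32, and the `summarization` class are assumptions; the source does not say how the kit parameterizes its ranking.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(ratings, winner, loser, k=32.0):
    """Update two ratings in place after `winner` beats `loser`."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta  # zero-sum: the loser gives up what the winner gains
    return ratings

# One rating table per query class (hypothetical class and model names).
ratings_by_class = {
    "summarization": {"model_a": 1500.0, "model_b": 1500.0},
}
update_elo(ratings_by_class["summarization"], "model_a", "model_b")
```

With equal starting ratings the expected score is 0.5, so a single win moves each model by half the K-factor.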

  • HSP fail-closed gate *(opt-in)*: wrap any high-risk function with `requiresHspApproval()` and it refuses unless a human-approved webhook signs off. Don't need compliance? Don't wrap. The kit doesn't force this on you. Toggle off globally with `HSP_ENABLED=false` even if a dependency does wrap it.
  • EU AI Act Article 12 audit trail *(opt-in)*: call `writeAuditLine()` from your business logic to log decisions; the audit cert generator reads this log.
  • EU AI Act Article 13 audit certificate *(opt-in)*: SHA-256 hash-chained PDF generator, one call per audit period: `generateAuditCert({...})`. Useful when selling to regulated verticals (legal, fintech, health); ignorable otherwise.
  • Hosted vs local posture split: hosted deployment is fail-closed by infra-marker detection (Cloud Run, k8s, Lambda); local research mode is opt-in via `HSP_GATE_DEV_MODE=1`.
  • `CUSTOMER_KEYS_ENCRYPTION_KEY`: Fernet-encrypted BYOK provider keys, never logged, never leaked. Only relevant if you're letting your end users supply their own API keys.
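The two ideas above, a gate that fails closed and a hash-chained audit trail, can be sketched in a few lines. This is an illustrative sketch only: `AuditLog`, `requires_approval`, `ApprovalDenied`, and the approver callback are hypothetical names, not the kit's `requiresHspApproval()` or `writeAuditLine()` API, whose signatures the source does not show.

```python
import hashlib
import json
from functools import wraps

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()

    def write(self, event):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev, event)
        self.entries.append({"prev": prev, "event": event, "hash": digest})
        return digest

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True

class ApprovalDenied(Exception):
    pass

def requires_approval(approver, log):
    """Run the wrapped function only if approver(...) returns exactly True."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                approved = approver(fn.__name__, args, kwargs) is True
            except Exception:
                approved = False  # fail closed: an approver error counts as a refusal
            log.write({"action": fn.__name__, "approved": approved})
            if not approved:
                raise ApprovalDenied(fn.__name__)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

log = AuditLog()

def unreachable_approver(action, args, kwargs):
    # Stands in for a human-approval webhook that cannot be reached.
    raise TimeoutError("approval webhook unreachable")

@requires_approval(unreachable_approver, log)
def wire_funds(amount):
    return f"sent {amount}"
```

Calling `wire_funds(100)` here raises `ApprovalDenied` and still records the refused attempt, and `log.verify()` returns `False` if any recorded event is later altered.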

  • Auth: email magic links + OAuth-ready (Apple, Google), session cookies, JWT for API
  • Stripe billing: Pro tier subscription, Free tier metering, webhook handler with signature verification (Stripe Event SDK)
  • Resend email: transactional templates for welcome, billing, and audit alerts
  • Firestore persistence: API keys, customer tiers, usage counters, encrypted BYOK keys
  • Cloud Run deploy: single command, autoscaling 0→N, regional failover ready
  • GitHub webhook receiver: HMAC SHA-256 signature verified, ready to dispatch audit jobs on push
  • Cloud Scheduler target: cron endpoint gated by shared secret, ready for nightly batch jobs
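For the GitHub webhook receiver, signature verification typically looks like the sketch below. GitHub sends the HMAC SHA-256 of the raw request body in the `X-Hub-Signature-256` header, prefixed with `sha256=`. The function name and the example secret are assumptions for illustration, not the kit's code.

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False  # missing or malformed header is rejected
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))

# Hypothetical secret and payload.
secret = b"example-webhook-secret"
body = b'{"ref": "refs/heads/main"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
is_valid = verify_github_signature(secret, body, header)
```

The check has to run against the exact bytes received, before any JSON parsing, because re-serializing the payload can change the digest.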

git clone https://github.com/jaquelinejaque/quorum-saas-starter
cd quorum-saas-starter
./scripts/setup.sh      # interactive: collects keys, creates Stripe products, configures Cloud Run
./scripts/deploy.sh     # gcloud run deploy + Firestore init + Stripe webhook registration

You ship to production in under 30 minutes.

The kit gives you the orchestration layer. You add the vertical:

  • AI Tax Assistant: your prompts + this kit's consensus = product
  • AI Code Reviewer: your evals + this kit's RLHF = product
  • AI Compliance Auditor: your domain knowledge + this kit's HSP gate = product
  • AI Legal Research: your case database + this kit's web learner = product

The hard infrastructure work is done. You focus on customers, prompts, and the niche-specific data only you have.

Single project deployment. Source code under modified Apache-2.0 (HSP commercial restriction). 3 months of updates. No support.

  • 5 project deployments
  • Pre-trained MoE router weights (10,000+ production queries already shaped the policy table)
  • Genetic prompt evolution kit + 50 evolved seed prompts across common SaaS categories
  • 12 months of updates
  • Monthly recorded "office hours" video (no live calls; recordings only, accessible to all Pro+ buyers)
  • Read-only Discord access

  • Unlimited project deployments
  • Commercial / white-label license: resell as your own product
  • EU AI Act Article 12/13 audit kit (forms, templates, cert generator)
  • 24 months of updates
  • Office hours archive (all past + future)

All tiers: source code only, no managed hosting, no live support. Buy what you can build on, not what you have to ask permission to use.

The orchestration layer in this kit took 18 months and 60+ commits to get production-stable. It runs the Quorum API used in audit-grade compliance workflows. Pricing reflects what it would cost a senior engineer to rebuild from scratch: roughly 6 weeks of full-time work at standard contractor rates (~£25,000). At £497–£2,497 you're buying back that time.

  • You want managed hosting: go use Quorum hosted (£49/mo)
  • You want live support: this is a code-only license
  • You want to vibe-code an AI app and ship in 2 hours: use Lovable or Vercel templates instead
  • You don't have a real product idea yet: this kit is for shipping, not learning

30 days, no questions asked, if the kit doesn't deploy successfully on your environment. After successful deployment, no refunds β€” you have the source code.

Built by Jaqueline Martins / Sovereign Chain Ltd. Patent Pending: HSP Protocol (PCT/US26/11908).
