{"slug": "synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned", "title": "Synthadoc: We Built an AI Judge for Our AI Wiki Compiler - Here's What We Learned", "summary": "The article describes Synthadoc, an open-source LLM knowledge compiler that synthesizes raw source files into structured wikis, and details its v0.5.0 update, which introduces an adversarial validation feature. This feature uses a separate, independent AI model to review each page for overstated or unsupported claims, flagging concerns directly in the page's metadata. The authors argue this process provides a documented audit trail that is more reliable than having the same model review its own output, though it is not a guarantee of accuracy.", "body_md": "There's a particular kind of anxiety that comes from reading an LLM-compiled document you wrote six months ago. The prose is clean. The structure is coherent. And then you spot a sentence like \"this approach became the industry standard by the late 1990s\" - and you have no idea whether that came from your source material, or whether the model just... said it.\nThat's the problem we've been chipping away at with Synthadoc, an open-source LLM knowledge compiler. You feed it raw source files - PDFs, text docs, YouTube transcripts, web pages - and it synthesises them into a structured, queryable wiki. We shipped structural lint early on: contradiction detection, orphan page checks, broken link validation. But structural checks don't tell you whether the content is trustworthy. They tell you the wiring is correct, not whether the building is sound.\nIn v0.5.0 we tackled this directly with two features. Here's how they work and why we think they matter.\nThe first instinct when you want to validate LLM output is to ask the same model to review itself. This doesn't work well. 
A model that just synthesised a page with a particular framing will tend to confirm that framing when asked to check it. It's not hallucinating - it's just consistent in the way humans are consistent with their own reasoning.\nThe fix is simple in principle: use a different model to review the output. A judge with different training data, different inductive biases, different tendencies to hedge or assert. In practice, that means configuring a second model - ideally from a different provider entirely - to act as a sceptical editor after each lint run.\n# config.toml\n[agents]\nlint = { provider = \"minimax\", model = \"MiniMax-M2.5\" }\nadversarial = { provider = \"anthropic\", model = \"claude-sonnet-4-6\" }\nThe adversarial pass runs after structural checks complete. Each page is sent to the judge with a single brief: find claims that are overstated, unsupported, or contradicted elsewhere in the source material. Results come back as {claim, concern} pairs, capped at a configurable limit per page (default: 2 - enough signal without drowning the author in noise).\nWarnings are written directly into each page's YAML frontmatter:\nlint_warnings:\n- claim: \"Saved over fourteen million lives.\"\nconcern: \"This figure lacks scholarly consensus - historians dispute both the precision and the causal attribution to Turing's cryptanalysis alone.\"\nThe field is absent when no warnings exist, and cleared automatically if you run --no-adversarial, so stale warnings never persist past the last lint run that produced them.\nWhen we ran this on the history-of-computing demo wiki for the first time, the judge flagged a paragraph about the second AI winter that described symbolic approaches as having been \"largely abandoned.\" The compilation model had written that confidently. The judge noted it was contested - hybrid approaches continued in some research communities throughout the period. 
That's exactly the kind of subtle overstatement that's invisible to a format checker, and invisible to the model that wrote it.\nThe adversarial pass produces documented evidence that a second, independent reviewer assessed each page. For enterprise knowledge bases - compliance documentation, research synthesis, internal policy wikis - that audit trail has real value. \"The LLM wrote it and we ran a second, different model over it\" is a meaningfully stronger claim than \"the LLM wrote it.\" Not a guarantee of accuracy. A documented process.\nThe adversarial pass tells you a claim might be wrong. Provenance tells you exactly where it came from so you can check it yourself.\nThe core mechanism is a citation annotation pass that runs during ingest, immediately after each page section is written. The model receives the numbered source text alongside the compiled paragraph and returns the paragraph with a ^[filename:L–L] marker appended:\nAlan Turing proposed the Turing Test in 1950.^[turing-biography.txt:12-24]\nThe marker encodes the source filename and the exact line range in the raw document that supports the claim. These markers are stored in the page body, recorded in the audit database, validated by lint, and - in Obsidian - rendered as interactive chips in Reading View.\nClick a chip and you get the Source Viewer: the referenced lines, highlighted, with surrounding context. For PDF sources, a pagemap sidecar resolves line numbers to the correct PDF page so the viewer offers an \"Open PDF at page N →\" button that navigates directly to the passage.\nThe annotation pass has a fallback: if the LLM fails, returns unparseable output, or references line numbers that don't exist, the original un-annotated section is used and the failure is recorded as an audit event. Ingest always completes. 
Results are also cached by section SHA-256, so re-ingesting an unchanged file doesn't incur an extra LLM call just to re-annotate the same paragraphs.\nEvery citation is stored in claim_citations in the audit database - page slug, source file, line range, and a 100-character excerpt of the annotated paragraph. You can query it directly:\nsynthadoc audit citations -w history-of-computing --page alan-turing\nsynthadoc audit citations -w history-of-computing --broken\nOr browse the whole wiki via Synthadoc: View Page Provenance in the Obsidian command palette - a sortable, paginated table where every row opens the Source Viewer for that citation's exact line range.\nTraceability is one of the hardest blockers for enterprise AI adoption. \"The model synthesised this from your documents\" is not a citation. A recorded ^[turing-biography.txt:12-24] that links back to the primary source and surfaces that passage on click is closer. It doesn't eliminate the need for human review, but it makes human review fast enough to actually happen.\nThese features are complementary and designed to be used together. Provenance tells you where a claim came from. Adversarial review tells you whether you should trust it. Neither is sufficient alone.\nA page that has full citation coverage and a second-model lint review is not a guaranteed-accurate page. But it's a page where you know where every claim came from, and where an independent model has registered any concerns it had. That feels like the right foundation for knowledge systems that get used in high-stakes contexts, as opposed to knowledge systems that sit in a folder and get ignored because nobody trusts them.\nThe annotation pass is itself an LLM call, which means it can make mistakes: misidentifying the supporting line range, over-annotating trivial sentences, or under-annotating genuinely sourced claims. 
We've found it to be reasonably accurate in practice, and the fallback-to-unannotated behaviour means a bad annotation result doesn't corrupt the page. But it's not ground truth. Treat it as strong evidence, not proof.\nSimilarly, the adversarial judge is only as good as the model you point it at. A judge that's too agreeable produces noise. A judge that's too aggressive produces fatigue. The adversarial_max_per_page cap (configurable, default 2) helps, but choosing the right model for the right domain still takes some experimentation.\nIf any of this is interesting or you're thinking through similar problems in your own knowledge tooling, feedback is genuinely welcome. The project is at https://github.com/axoviq-ai/synthadoc, and stars are appreciated.", "url": "https://wpnews.pro/news/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned", "canonical_source": "https://dev.to/paul_chen_90371fe7426cb44/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-heres-what-we-learned-13k3", "published_at": "2026-05-23 17:19:15+00:00", "updated_at": "2026-05-23 17:34:54.177384+00:00", "lang": "en", "topics": ["large-language-models", "open-source", "developer-tools", "artificial-intelligence", "machine-learning"], "entities": ["Synthadoc"], "alternates": {"html": "https://wpnews.pro/news/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned", "markdown": "https://wpnews.pro/news/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned.md", "text": "https://wpnews.pro/news/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned.txt", "jsonld": "https://wpnews.pro/news/synthadoc-we-built-an-ai-judge-for-our-ai-wiki-compiler-here-s-what-we-learned.jsonld"}}