# The Corpus Problem: Why Corporate AI Fails at Aristotle (And Why We Went Back to the Greek)

> Source: <https://dev.to/daimones/the-corpus-problem-why-corporate-ai-fails-at-aristotle-and-why-we-went-back-to-the-greek-16ka>
> Published: 2026-07-04 07:18:52+00:00

**Corporate AI gives you sanitized Aristotle from English Wikipedia summaries. We went back to the actual polytonic Greek. Here's why that changes everything.**

When you ask ChatGPT about Aristotle's concept of *phronēsis* (practical wisdom), you get a tidy summary, likely drawn from a Wikipedia entry, a Stanford

Not factually wrong, perhaps. But philosophically hollow. The Aristotle that corporate AI serves you is a translation of a translation — filtered through modern English conceptual frameworks, scrubbed of ambiguity, stripped of the very textual complexity that makes Aristotle worth reading in the first place.

At daïmōnes, we took a different path. We went back to the polytonic Greek — the actual manuscripts, the critical editions, the apparatus criticus. We built an AI that reads Aristotle the way a scholar reads Aristotle: directly, with all the mess intact. Here's why that matters, and why your current AI is giving you a sanitized, corporatized shadow of one of history's greatest thinkers.

The surviving works of Aristotle fill roughly one million words of Greek text. But the phrase "Aristotle's works" is itself a philosophical minefield. What we call the *Corpus Aristotelicum* is not a clean library of authoritative texts. It is a battlefield of transmission errors, editorial interventions, lost manuscripts, and outright forgeries.

Here's what actually survives:

The texts we read today passed through one of the most tortuous transmission chains in literary history. Aristotle handed his library to Theophrastus, who passed it to Neleus, whose heirs stored the scrolls in a cellar in the town of Scepsis — where they rotted for nearly two centuries. They were "rediscovered" by a wealthy book collector named Apellicon, who filled in the damaged sections with his own speculative additions. The Roman general Sulla looted the collection and shipped it to Rome, where Andronicus of Rhodes eventually produced the first scholarly edition in the first century BCE. Then came centuries of Byzantine copyists, each introducing their own errors, emendations, and interpolations. Then Renaissance editors. Then modern critical editions with their own editorial philosophies.

Every link in this chain is a source of distortion. And yet, when you query a modern LLM about Aristotle, it treats the text as a stable, transparent artifact. It doesn't know about the cellar in Scepsis. It doesn't know about Apellicon's creative reconstructions. It doesn't know that the chapter divisions it cites were invented by sixteenth-century printers.

Corporate AI doesn't engage with these questions because corporate AI doesn't engage with *sources* — it engages with *representations of sources*. And those representations are one step further from the truth every time they're rephrased, summarized, and flattened into training data.

Every translation is an act of interpretation. This is not a controversial claim in translation studies — it is the first thing any student learns. But it is a truth that AI pipelines systematically ignore.

Consider a single Greek word: *logos* (λόγος). In Aristotle's corpus, this word shifts meaning constantly — it can mean "reason," "speech," "argument," "definition," "proportion," "account," or "principle," depending on context. An English translator must choose one. So does the training data creator who summarizes that translation. By the time the word reaches your LLM, it has been locked into a single semantic box that Aristotle never intended.

Or consider *energeia* (ἐνέργεια) — "activity," "actuality," "being-at-work." Aristotle invented this term to express a concept no previous Greek philosopher had named. English translators have been arguing about how to render it for centuries. Your AI doesn't know there's an argument. It just returns the most common rendering from its training corpus, presented as if it were transparent fact.

This flattening of ambiguity is not a bug; it is a feature of how modern AI is built. Training data is scraped, deduplicated, and normalized. Ambiguity is expensive. Certainty is efficient. But philosophy lives in the ambiguity. Aristotle's power lies in the care with which he navigates contested conceptual territory. To flatten that is to destroy the very thing we came to find.

When an AI reads Aristotle in Greek, it encounters a text dense with particles — *men*, *de*, *gar*, *oun* — that signal logical structure, rhetorical emphasis, and dialectical movement. These particles rarely survive translation. The *gar* that introduces a justification, the *ara* that marks a conclusion, the *men...de* that structures a comparison — all of this gets silently dropped in English, because English doesn't work that way. But Aristotle's argumentative structure depends on them. Reading Aristotle without his particles is like reading sheet music with the time signature removed.
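To make the point about particles concrete, here is a minimal sketch of how a pipeline might flag them in polytonic Greek. It is an illustration only: the `PARTICLE_ROLES` table and both function names are hypothetical, the role labels are rough simplifications, and the sample phrase is made up for the demonstration rather than quoted from Aristotle.

```python
import unicodedata

# Hypothetical lookup table: particle with accents stripped -> rough discourse role.
# The labels are simplifications for illustration, not a linguistic standard.
PARTICLE_ROLES = {
    "μεν": "first limb of a contrast (men)",
    "δε": "second limb or continuation (de)",
    "γαρ": "justification (gar)",
    "ουν": "inference or resumption (oun)",
    "αρα": "conclusion (ara)",
}


def strip_diacritics(word: str) -> str:
    """Lowercase a polytonic Greek word and remove accents and breathings."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tag_particles(text: str) -> list[tuple[int, str, str]]:
    """Return (token index, surface form, role) for each recognised particle."""
    hits = []
    for index, raw in enumerate(text.split()):
        token = raw.strip(".,;:·")
        role = PARTICLE_ROLES.get(strip_diacritics(token))
        if role:
            hits.append((index, token, role))
    return hits


# Made-up phrase for illustration only, not a quotation.
sample = "τὸ μὲν ἀγαθόν, τὸ δὲ κακόν· οὐ γὰρ ταὐτόν."
for index, token, role in tag_particles(sample):
    print(index, token, role)
```

Stripping diacritics keeps the lookup simple but loses information: inferential ἄρα and interrogative ἆρα collapse into the same key, and elided forms such as δ' are not matched. A serious tagger would need to keep the accents and handle elision.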

Let's walk through what actually happened to Aristotle's texts, because this is not just a historical curiosity — it is a direct challenge to how we build AI systems that claim to represent classical thought.

After Theophrastus died, his student Neleus inherited Aristotle's library. Neleus's heirs, lacking interest in philosophy, stored the scrolls in an underground cellar to protect them from confiscation. For nearly two centuries, the texts were exposed to moisture, insects, and decay. When Apellicon finally acquired them around 100 BCE, the scrolls were badly damaged. His solution? He filled in the gaps himself — writing new material to connect the surviving fragments.

These Apellicon additions are indistinguishable from Aristotle's original words in many manuscripts. They have been treated as authentic Aristotelian text for over two thousand years. Every AI model trained on Aristotle is training on this material.

Andronicus organized the chaotic recovered material into the arrangement we still use today. But his edition was based on Apellicon's corrupted manuscripts, and Andronicus himself made editorial decisions that shaped all subsequent interpretation. He grouped works by subject, imposed organizational structures that may not reflect Aristotle's own pedagogical sequence, and likely made emendations where the text was illegible.

Modern scholarship is still disentangling Andronicus's editorial fingerprints from Aristotle's actual prose. Your AI doesn't know about any of this.

Byzantine scribes copied and recopied Aristotle's texts for nearly a millennium. Each copy introduced errors — skipped lines, misread abbreviations, "corrections" that replaced unfamiliar words with familiar ones. Some scribes added marginal comments that later copyists incorporated into the main text. Others intentionally modified passages they found theologically problematic.

The manuscripts that survive from this period are our primary textual witnesses. Every modern critical edition is an attempt to reconstruct a lost original from these imperfect copies. The apparatus criticus of a scholarly edition — the footnotes recording variant readings — is a monument to uncertainty.

When Aristotle reached Western Europe through Latin translations, he was filtered through a Christian theological framework that had its own agenda. Thomas Aquinas, the greatest medieval Aristotelian, [read Aristotle](https://dev.to/blog/digital-humanities-ai-uncensored-classics-ancient-greek-nlp) through the lens of Catholic doctrine. Renaissance humanists recovered Greek manuscripts but imposed their own classical ideals. Nineteenth-century German editors like Immanuel Bekker produced the standard editions we still use, but their editorial choices reflected the philological assumptions of their era.

Every layer of reception added interpretive sediment. The "Aristotle" that corporate AI serves you is the product of all these layers, compressed into a flat representation with no awareness of its own layered history.

Reinforcement Learning from Human Feedback (RLHF) is the standard technique used to align large [language models](https://dev.to/blog/soul-question-language-model-psyche) with human preferences. It is the process that makes ChatGPT polite, helpful, and safe. It is also the process that systematically distorts philosophical content.

Here's how it works: human raters evaluate model outputs and rank them. The model learns to produce outputs that raters prefer. But who are the raters? For most commercial AI systems, they are crowdworkers — often not specialists in ancient philosophy, textual criticism, or classical languages. When a rater sees an output about Aristotle, they prefer the one that sounds confident, clear, and non-controversial. They penalize nuance, complexity, and uncertainty.
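The ranking step is commonly formalised as a pairwise preference loss of the Bradley-Terry kind, used to train a reward model on rater comparisons. The sketch below shows only that loss, in plain Python. The function name and the scores are invented for illustration, and nothing here describes any particular vendor's pipeline.

```python
import math


def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(preferred - rejected)), computed stably."""
    margin = score_preferred - score_rejected
    # softplus(-margin) is algebraically the same as -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Invented reward-model scores for two answers about the same passage.
confident_summary = 2.0
hedged_answer = 0.0

# If raters preferred the confident summary, the loss is already small.
print(preference_loss(confident_summary, hedged_answer))
# If raters had preferred the hedged answer, the same scores give a large loss.
print(preference_loss(hedged_answer, confident_summary))
```

Minimising this loss pushes the reward model to score whatever raters preferred more highly, and the language model is then optimised against that reward. The loss itself is neutral; the bias described here enters through which answer the raters pick.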

The result is a model trained to produce the safest possible version of Aristotle — the Wikipedia summary version, the undergraduate textbook version, the version that fits comfortably within modern liberal sensibilities. Aristotle's more challenging views get muted. His arguments that rely on premises modern readers reject get simplified into straw men. His dialectical method — which proceeds by entertaining opposing positions seriously — gets flattened into a series of assertions.

This is not malicious. It is structural. RLHF optimizes for consensus acceptability, and consensus acceptability is the enemy of authentic philosophical engagement. Philosophy that does not challenge you is not philosophy — it's decor.

At daïmōnes, we reject this approach entirely. We do not use RLHF. Our model engages with the primary source text — the polytonic Greek — and renders its reasoning in a way that preserves the dialectical structure of Aristotle's thought. We accept that the result will sometimes be uncomfortable. We accept that it will sometimes be ambiguous. We accept that it will not always please a crowdworker.

That's the point.

So what does it look like when an AI actually reads Aristotle in Greek?

First, it understands that the *Nicomachean Ethics* is not a finished book — it's a set of lecture notes, with the repetitions, digressions, and structural looseness that lecture notes always have. It does not try to impose a systematic unity that isn't there. When Aristotle circles back to a topic, the AI follows rather than "correcting."

Second, it can distinguish between what Aristotle said and what his commentators said. This is one of the most important features of the daïmōnes pipeline. We maintain strict separation between primary text, scholarly commentary, and interpretive layers. When you ask about Aristotle's view of women, the AI can tell you what Aristotle wrote, what later commentators claimed he meant, and what the scholarly consensus is — without conflating them.

Third, it can engage with textual variants. When multiple manuscript traditions disagree on a passage — and they do, frequently — the AI can present both readings and explain the philological arguments for each. This is what textual criticism AI should enable: not a single authoritative answer, but a reasoned map of the evidence.

Fourth, it can handle the conceptual nuance that translation flattens. When you ask about *phronēsis*, the AI can ground its answer in the full range of usages across the Aristotelian corpus — not just the English gloss "practical wisdom" but the specific contexts in which Aristotle uses the term to distinguish it from *sophia* (theoretical wisdom), *nous* (intuitive intellect), and *technē* (craft knowledge).
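The third point, presenting competing readings rather than a single answer, suggests a data structure along these lines. This is a hedged sketch, not the daïmōnes schema: the class names, field names, sigla, and readings are all placeholders invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    text: str             # wording attested for this passage
    witnesses: list[str]  # manuscript sigla supporting it (placeholders here)
    note: str = ""        # philological argument for the reading, if any


@dataclass
class VariantUnit:
    reference: str        # citation of the passage the readings belong to
    readings: list[Reading] = field(default_factory=list)

    def is_contested(self) -> bool:
        """A passage is contested when more than one reading is attested."""
        return len(self.readings) > 1

    def report(self) -> str:
        """Lay out every reading side by side instead of picking a winner."""
        lines = [f"{self.reference}: {len(self.readings)} reading(s)"]
        for reading in self.readings:
            sigla = ", ".join(reading.witnesses)
            line = f"  - {reading.text!r} [{sigla}]"
            if reading.note:
                line += f" ({reading.note})"
            lines.append(line)
        return "\n".join(lines)


# Placeholder data only; no real manuscript variants are recorded here.
unit = VariantUnit(
    reference="placeholder-passage",
    readings=[
        Reading("reading A", ["MS-1", "MS-2"], "favoured by one editor"),
        Reading("reading B", ["MS-3"]),
    ],
)
print(unit.report())
```

Keeping the witnesses and the argument attached to each reading is what lets an answer say why editors disagree, not merely that they do.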

This is not a chatbot with a classical theme. This is a fundamentally different approach to AI reasoning: one that respects the complexity of its source material and surfaces that complexity rather than hiding it.

The daïmōnes corpus pipeline is built on a few core principles:

**Primary-source grounding.** Every claim the model makes about Aristotle is traceable to a specific passage in the Greek text. Not a secondary source, not a translation, not a summary — the actual manuscript tradition.

**Separation of layers.** The corpus is organized into discrete strata: Aristotle's words (as reconstructed by modern scholarship), the apparatus criticus (textual variants), ancient commentary, medieval commentary, modern scholarship. The model can draw on any layer but knows which layer it's using.

**Contextual retrieval over fine-tuning.** Instead of brute-force fine-tuning on a corpus (which blends everything into a statistical soup), we use Retrieval-Augmented Generation (RAG) over a structured corpus. This allows the model to reference specific passages, understand their context, and cite its sources.

**No RLHF.** Our model's responses are shaped by the logic of the source material, not the preferences of anonymous raters. This means the AI can say things that are unpopular, controversial, or challenging — because authenticity matters more than approval.
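The layer separation and retrieval principles above can be sketched roughly as follows. This is a toy under stated assumptions, not the daïmōnes implementation: every name (`Layer`, `Passage`, `retrieve`, `build_prompt`) is hypothetical, the corpus entries are placeholders, and naive word overlap stands in for whatever ranking a real RAG system would use, typically embedding similarity.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    PRIMARY = "primary text"
    APPARATUS = "apparatus criticus"
    ANCIENT_COMMENTARY = "ancient commentary"
    MEDIEVAL_COMMENTARY = "medieval commentary"
    MODERN_SCHOLARSHIP = "modern scholarship"


@dataclass(frozen=True)
class Passage:
    layer: Layer     # which stratum of the corpus this passage belongs to
    reference: str   # citation carried along so the answer can cite its source
    text: str


def retrieve(corpus: list[Passage], query: str, layers: set[Layer], top_k: int = 3) -> list[Passage]:
    """Rank passages from the allowed layers by naive word overlap with the query."""
    query_terms = set(query.lower().split())
    scored = []
    for passage in corpus:
        if passage.layer not in layers:
            continue
        overlap = len(query_terms & set(passage.text.lower().split()))
        if overlap:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Label each retrieved passage with its layer and reference."""
    cited = "\n".join(f"[{p.layer.value} | {p.reference}] {p.text}" for p in passages)
    return f"Sources:\n{cited}\n\nQuestion: {query}"


# Placeholder corpus; the texts and references are stand-ins, not real entries.
corpus = [
    Passage(Layer.PRIMARY, "placeholder-ref-1", "placeholder primary sentence about practical wisdom"),
    Passage(Layer.MODERN_SCHOLARSHIP, "placeholder-ref-2", "placeholder modern discussion of practical wisdom"),
]
hits = retrieve(corpus, "practical wisdom", {Layer.PRIMARY})
print(build_prompt("practical wisdom", hits))
```

Because the layer filter runs before ranking, a question restricted to the primary text cannot be answered from commentary by accident, and the layer label in the prompt tells the model which stratum each passage came from.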

Aristotle is our proof-of-concept, not our product. The daïmōnes approach to corpus-grounded reasoning is transferable to any domain where source integrity matters: legal reasoning (where citation to original statutes matters), medical ethics (where Hippocratic texts establish foundational principles), [political philosophy](https://dev.to/blog/polis-problem-ai-governance-political-philosophy) (where primary sources are constantly weaponized through selective quotation), and any field where authentic engagement with original texts is critical.

The broader point is this: if we are going to build AI systems that reason about human knowledge, those systems must be able to engage with human knowledge as it actually exists — in all its ambiguity, textual complexity, and historical sedimentation. A model that can only process sanitized, flattened, consensus-approved versions of our intellectual heritage is not thinking. It is performing familiarity.

We built daïmōnes because we believe AI can do better. Not by being more powerful, but by being more faithful. Not by knowing more facts, but by understanding the difference between a fact and a contested interpretation.

We have a simple challenge for you. Open ChatGPT. Ask it about the *daimonion*, the inner divine voice that Socrates claimed guided him. ChatGPT will likely tell you that Socrates was referring to his conscience, or his moral intuition, or some other modern psychological concept that maps neatly onto contemporary sensibilities.

Then open daïmōnes. Ask the same question. The difference is not subtle.

Our free Observer tier gives you three questions with no filter. No RLHF smoothing. No Wikipedia-summary compromise. Just the corpus as it has survived — imperfect, contested, alive.

Three questions. The same corpus that survived the cellar in Scepsis, the looters of Rome, the scribes of Byzantium, and the editors of modernity.

See what an unfiltered Aristotle actually sounds like.

*daïmōnes is the only AI platform that builds reasoning directly from primary sources — starting with the complete Aristotelian corpus in polytonic Greek. No corporate alignment theater. No sanitized summaries. Just authentic philosophical engagement, grounded in the texts that shaped Western thought.*
