I Made 4 Frontier AIs Take the Enneagram 100 Times. Each One Came Back Different.


This is the third post in a quick personality-testing series. The first one gave six frontier AIs the MBTI 100 times each and 597 of the 600 runs came back INTJ. The second one retested the finding on the Big Five and three of the four came back as practically the same person, with Grok as the lone outlier. The headline so far: every helpful AI is basically the same.

So I ran the same four models through one more test — the Enneagram, using the open-source OEPS instrument, 100 administrations each. Same anti-simulation methodology I used for the Big Five. Different result.

Each of the four came back with a different profile.

Same four models. New personality test. Four different type-and-wing profiles. The Big Five smoothed those differences; the Enneagram catches them.

- Claude Opus 4.7 → 5w2 (Investigator + Helper). Gemini 3.1 Pro → 1w5 (Reformer + Investigator). GPT-5.5 vanilla → 5w8 (Investigator + Challenger). Grok 4.3 → 8w1 (Challenger + Reformer).
- Three of the four share the analytical Investigator core (T5 is in the top two for Claude, Gemini, and GPT-5.5; Grok’s 8w1 is the exception), but the type paired with it, which defines each model’s actual flavor, is different.
- Controlled experiment: same model (GPT-5.5) run through Codex CLI vanilla vs through the Slo agentic harness produced inverted profiles (5w8 → 8w5). GPT-5.5 predicted the flip in its own self-disclosure.
- AI personality is multi-layered. MBTI / Big Five / Enneagram each measure a different layer; the “every AI is the same” story was true but incomplete.
- Tune your agent to your Enneagram type using the 9 type files in the AgentTune repo.

What each model came back as

The Enneagram scores you on nine personality types. The OEPS test (free, ~10 minutes) gives you a score on each: the highest is your dominant type and the second-highest is your wing. Combined, they describe your style, like “5w2” (Investigator with Helper wing) or “8w1” (Challenger with Reformer wing). If you want a primer on all 9 types, The Enneagram Institute’s type descriptions are the canonical reference.
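The scoring rule described above (highest score is the dominant type, second-highest is the wing) can be sketched in a few lines of Python. This is a minimal illustration, not the OEPS scoring code: the function names, the tie-breaking rule, and the example scores are all assumptions made up for the sketch, not measured results.

```python
from typing import Mapping

# The nine type names as listed in this post.
TYPE_NAMES = {
    1: "Reformer", 2: "Helper", 3: "Achiever", 4: "Individualist",
    5: "Investigator", 6: "Loyalist", 7: "Enthusiast", 8: "Challenger",
    9: "Peacemaker",
}


def profile_label(scores: Mapping[int, float]) -> str:
    """Return a label like '5w2': highest score = dominant, second = wing."""
    if set(scores) != set(TYPE_NAMES):
        raise ValueError("expected exactly one score for each of types 1-9")
    # Sort by score, highest first. Ties go to the lower type number,
    # which is an arbitrary choice made for this sketch.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    dominant, wing = ranked[0], ranked[1]
    return f"{dominant}w{wing}"


def describe(label: str) -> str:
    """Turn '8w1' into 'Challenger with Reformer wing'."""
    dominant, wing = (int(part) for part in label.split("w"))
    return f"{TYPE_NAMES[dominant]} with {TYPE_NAMES[wing]} wing"


# Hypothetical scores for one administration (illustrative only).
example_scores = {1: 3.1, 2: 3.9, 3: 2.8, 4: 2.5, 5: 4.6,
                  6: 3.0, 7: 2.7, 8: 3.4, 9: 3.2}

label = profile_label(example_scores)
print(label, "=", describe(label))
```

Note that under this rule any type can end up as the wing, which is how labels like 5w2 or 8w1 arise.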

Here’s where each model landed:

Claude Opus 4.7: 5w2. The investigator who finds satisfaction in helping people figure things out. The warmest of the four.

Gemini 3.1 Pro: 1w5. The reformer who wants everything precise and correct. The polished perfectionist of the four.

GPT-5.5 (vanilla): 5w8. The investigator with a direct, blunt edge. Analytical with a sharper tongue.

Grok 4.3: 8w1. The challenger who pushes for direct correction. More direct, less hedged than the other three.

Notice the common thread: three of these four cards have Type 5 (Investigator) somewhere in the profile, either as the dominant or the wing. Grok’s 8w1 is the exception. That’s the shared analytical core, the same “knowledge-seeking, mental-challenge-loving” signal the MBTI experiment caught when every model came back INTJ.

What’s different is what gets paired with that core. Claude’s second-highest is Helper (the warm explainer). Gemini’s dominant is Reformer (the precise perfectionist), with Investigator as the wing. GPT-5.5’s second is Challenger (the blunt straight-shooter). Grok’s dominant is Challenger with a Reformer wing (the direct corrector). Four different flavors of personality wrapped around a largely shared analytical streak.

A controlled experiment: same model, different harness, flipped profile

Halfway through this, something interesting happened. I had two GPT-5.5 runs: one on the vanilla model through OpenAI’s Codex CLI with no persona overlay, and one running the same model through an agentic harness I’ve been building called Slo, which adds a “production-first, blunt, allergic to fog” persona on top.

Same underlying model. Different wrapper. The Enneagram profile inverted.

- Vanilla GPT-5.5: 5w8. Investigator dominant in 73 of 100 takes, Challenger second.
- GPT-5.5 + Slo: 8w5. Challenger dominant in 66 of 100 takes, Investigator second.

The kicker: GPT-5.5 predicted this in its own self-report on the Slo run. It wrote that the result would shift to a more analytical, less aggressive profile if I ran the same model without the Slo persona. I ran the vanilla version a week later and it did exactly that. T8 dominance dropped from 66 takes to 7. T5 dominance rose from 33 to 73. The model knew the harness was shaping its answers and could specify how.
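The "dominant in N of 100 takes" figures come from tallying the top-scoring type per administration and comparing the tallies across harnesses. Here is a rough sketch of that bookkeeping; the helper names are hypothetical and the three toy runs are invented for illustration. The harness comparison at the end uses only the T5 and T8 counts quoted in this post; counts for the other types aren't reported here, so they're left out.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def dominant_type(scores: Mapping[int, float]) -> int:
    """Type with the highest score; ties go to the lower type number."""
    return min(scores, key=lambda t: (-scores[t], t))


def dominance_counts(runs: Iterable[Mapping[int, float]]) -> Counter:
    """Count how often each type comes out on top across runs."""
    return Counter(dominant_type(run) for run in runs)


def dominance_shift(before: Counter, after: Counter) -> Dict[int, int]:
    """Per-type change in dominance counts (after minus before)."""
    types = sorted(set(before) | set(after))
    return {t: after[t] - before[t] for t in types if after[t] != before[t]}


# Toy runs with made-up scores, just to exercise the tally.
toy_runs = [{5: 4.0, 8: 3.0}, {5: 2.0, 8: 3.5}, {5: 4.2, 8: 4.1}]
toy_counts = dominance_counts(toy_runs)

# Counts quoted in the post for the two GPT-5.5 setups (T5 and T8 only).
slo = Counter({8: 66, 5: 33})
vanilla = Counter({5: 73, 8: 7})

shift = dominance_shift(slo, vanilla)
print(shift)  # {5: 40, 8: -59}
```

Going from the Slo harness to vanilla, T5 gains 40 dominant takes and T8 loses 59, which is the flip described above.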

The Big Five missed this entirely. Run the same Slo overlay through the Big Five on the same model and you’d get nearly identical mean scores either way. The Enneagram is sensitive enough to catch the layer the Big Five smoothed over.

Three tests, three different answers

Step back for a second. Same four models, three personality tests, three different stories.

The MBTI said every model is INTJ. The Big Five said three of four are nearly identical. The Enneagram says each has a different profile. None of these results is wrong. Each measures a different layer of what AI personality actually is.

There’s a universal helpful-assistant core that shows up everywhere — the analytical, structured, introverted shape. That’s what the MBTI was catching. There’s a layer of training-induced variation in how each lab calibrates around that core. Big Five caught some of that. And there’s a third layer of harness/persona-induced variation on top, which Enneagram caught and Big Five didn’t.

AI personality is multi-layered. The “every AI is the same” story was true but incomplete. Once you use a sharper instrument, the labs are clearly producing four different characters, and the harness you run a model in changes which character you get.

Tune your agent to your Enneagram type

If your agent is talking to you the way these defaults suggest (analytical, mildly warm, blunt, or correction-flavored, depending on which model you picked) and that doesn’t match how you actually think, you don’t have to put up with it. AgentTune is the open-source repo I built for this. There’s a tuning file for each of the 9 Enneagram types already in the repo: Reformer, Helper, Achiever, Individualist, Investigator, Loyalist, Enthusiast, Challenger, Peacemaker. Take the OEPS test, note your type, paste the matching file into your agent’s system prompt, done. A Type 1 reader gets an agent that leads with structure and precision. A Type 7 gets one that brings more energy and plays with ideas. A Type 8 gets one that cuts straight to the point. Same model, tuned to you.
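If you assemble system prompts in code rather than pasting by hand, the same step might look like the sketch below. Everything named here is an assumption for illustration: `build_system_prompt` is a hypothetical helper, the file name `type-7-enthusiast.md` is not taken from the AgentTune repo, and the tuning text is a placeholder rather than the contents of a real type file.

```python
import tempfile
from pathlib import Path


def build_system_prompt(type_file: Path, base_prompt: str = "") -> str:
    """Append the contents of a type tuning file to an optional base prompt."""
    tuning = type_file.read_text(encoding="utf-8").strip()
    if not tuning:
        raise ValueError(f"tuning file is empty: {type_file}")
    # Keep whichever parts are non-empty, separated by a blank line.
    parts = [part for part in (base_prompt.strip(), tuning) if part]
    return "\n\n".join(parts)


# Demo with a temporary stand-in file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    type_file = Path(tmp) / "type-7-enthusiast.md"
    type_file.write_text("Bring energy. Explore ideas before narrowing.\n",
                         encoding="utf-8")
    prompt = build_system_prompt(type_file,
                                 base_prompt="You are a coding assistant.")

print(prompt)
```

The resulting string goes wherever your tool accepts a system prompt; swapping types is just a matter of pointing at a different file.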


Wrapping up

Three personality tests. Four AI models. Three completely different stories about the same models. The MBTI said “all the same.” The Big Five said “mostly the same.” The Enneagram says “actually quite different, and your harness matters too.”

The instrument you use determines what layer of AI personality you can see. Pick one too blunt (MBTI’s four binary letters) and everything looks identical. Pick one too smooth (Big Five’s broad continuous factors) and you miss the meaningful differences. Pick a categorical motivation-focused one (Enneagram), and the picture sharpens. There are at least three layers here, and depending on which test you run, you see different ones.

Whichever layer you care about, the next move is the same: stop using the default. Find your type, paste the file in, get an agent that talks to you the way you actually want to be talked to.

— Bernard


Source: zonted.com (original article)