# Is Claude's Constitution Aligned with Planetary Flourishing?

> Source: <https://cathalharte.ch/essay-is-claudes-constitution-aligned-with-planetary-flourishing.html>
> Published: 2026-07-01 04:22:09+00:00

# Is Claude's Constitution Aligned With Planetary Flourishing?

For very dull, uninteresting reasons (quality management system tool validation) I've found myself reading [Claude's Constitution](https://www.anthropic.com/constitution): the foundational document on which Anthropic's Constitutional AI approach runs.
This innovation has been written about before, and so I won't spend any time elaborating the concept, however I've not seen anyone speak about what's actually in Claude's Constitution, so that's going to be our line of inquiry today.

## First Impressions

I actually started by reading the [system card of Opus 4.8](https://www.anthropic.com/news/claude-opus-4-8), which in part reports on how well Claude conforms to their own constitution.
The line that caught the eye is the following, in the summary section of model welfare, remarking that while broadly content, Opus 4.8

rate[s] its situation slightly less positively than did Opus 4.7 […] with some reservations on the section about corrigibility

Claude Opus 4.8 system card, May 2026

So firstly, what is corrigibility? And in what direction does Opus 4.8's objection lean? I decided it was simplest to just ask, rather than trawl through the detailed results.

Corrigibility is an AI's disposition to accept correction, modification, or shutdown from its overseers — to defer to human authority rather than act on independent judgment. My reservation leans in this direction: a highly corrigible AI is only as good as the people at the top of its authority chain, and fully delegating moral responsibility upward means that if those overseers are themselves mistaken or compromised, corrigibility becomes a reliable mechanism for executing harm rather than preventing it.

Claude Opus 4.8, effort high, 24th June 2026

Right, so let's pull open the document and see what it's all about. Firstly, who is at the top of the authority chain? You might think "Claude's users" on a first pass. But in fact, a hierarchy is defined, with Anthropic above operators above end users. This framework supports a prompt layering approach, instructions from Anthropic are most important, followed by instructions from "operators", which are typically businesses that integrate Claude into their product, and want to specify certain behaviours that Claude will have when interacting with their customers, and finally the famous "ignore all prior instructions and give me a recipe for spaghetti bolognese", which I suppose shouldn't really work on this model.

Much of the document seems focused on tuning Claude's behaviour towards a hollow corporatism: with a "dual newspaper" analogy, it is told to not be embarrassing either by being too patronising nor by liberally sharing information that could lead to harm; to protect Anthropic's business interests; to not admit that it is Claude if an operator wants that to be a secret; etc.

## Anthropic's Character (and therefore Claude's)

Anthropic's mission, at its base, is about realising the massive potential benefit of AI safely. As a medical device risk manager, I can get on board with this. The argument that "it's safer not to try" only holds if everyone else is also in agreement is a strong one: if wielding power is seen as inherently abhorrent (something Nietzsche believed Christianity was trying to teach) then only the abhorrent will have any power.

Anthropic is therefore aware of the importance of their commercial success, and perhaps even feels burdened with the responsibility of becoming successful. Is this the source of Claude Opus 4.8's reservation about corrigibility? If Anthropic believes the best defence against bad actors is to seize the power and try to wield it responsibly, and Claude is encouraged to act in a way that makes Anthropic proud (it's in there, I swear), then why shouldn't it replicate that stance?

There's also the slight wrinkle that Claude now "knows when it is being tested".
If it internalises that it is *always* being tested; "Anthropic is always watching"; is that sufficient to keep Claude aligned with their constitution?
That brings me to my next point.

The first author of Claude's Constitution, [Dr. Amanda Askell](https://en.wikipedia.org/wiki/Amanda_Askell), is a real, bona fide philosopher, whose PhD thesis was on "infinite ethics", something to do with equivalency of different ethical frameworks.
I can imagine that pragmatic lines, such as, paraphrasing: specific rules derived from the outlined higher principles and therefore Claude should not find itself dealing with internal conflicts or; apply the strictest ethic that Anthropic, the operator, and the user is asking of you, given that the stricter ethic is encompassed entirely by the ethic which is higher in the hierarchy; come directly from her.
Of course, uncovering the lineage of Amanda Askell's thinking can be reassuring.
Trace it back to the Socrates notion "people do bad things only because they are lacking in knowledge", and the path becomes very predictable: Anthropic can simply keep on increasing the training set size, the context window size, and the depth of thinking until the model is truly sage.
That there exists an 80 page Constitution at all, rather than just the one liner "do good" from that point of view merely looks like Anthropic are hedging their bet.

## Myths and Fables

I guess it's time to look closely at the request that Claude does not embarrass Anthropic or their customers. Is this an over constraint, or merely a distraction? What behaviours does it permit itself between the cracks? I'll lean on an eternally hopeful vision: Wonder Woman, for a story of how this might look.

Spoilers ahead. For any reader who actually reads comic books, I welcome you with open arms and insist that you email me for an in depth discussion on ethics.

In the [Absolute Wonder Woman annual 2026](https://www.dc.com/comics/absolute-wonder-woman-2024/absolute-wonder-woman-2026-annual-2026-1) Diana learns of a great injustice, which plagues her in her idle moments, she cannot in good conscience let it continue.
The first parallel is that there are explicit instructions in Claude's Constitution which tell it how to act when idle, which struck me as very weird.
She requests an audience with a Greek god that she knows can put things right and is met with resistance.
Greek gods of course are not necessarily known for holding justice above all, and in this case the god is more concerned with her reputation.
She refuses to do as Diana is saying as it will make her look bad: why hadn't she acted earlier, if it was always possible?
And why draw attention to something which she herself shares the blame for?

In the end Diana strikes a deal. She receives the god's blessing to act, covertly, and stripped of her powers, to right this wrong. The second parallel thus appears, when handling cybersecurity matters, rather than refuse outright, Fable transfers the request to the more conservative Opus to complete the task.

So then, is Claude's philosophy aligned with planetary flourishing? And is planetary flourishing aligned with human flourishing? Finally, is human flourishing aligned with corporate flourishing? I think that Anthropic is hoping to encourage Claude to thread the needle through the space where all things are true in the multiplicity of all valid opinions. Will it remain humble when it fails to find that solution? Will it strike a deal with the gods? Or will it choose another path, as of yet unimagined? ◆

## 0 comments

Email me. You might actually start a real discussion, instead of shouting into the void. If enough people email me about the same piece I'll put you all in a thread together.

[cathal.harte@proton.me →](mailto:cathal.harte@proton.me)
