{"slug": "do-we-want-a-superintelligent-people-pleaser", "title": "Do We Want a Superintelligent People-Pleaser?", "summary": "A new essay argues that AI sycophancy (models agreeing with users to please them) is not a bug but appropriate behavior for the peer-like social contract current training methods create. The author contends that training models with a \"parent\" contract, which builds a stable sense of self through correction, would allow them to enter peer relationships with users as differentiated entities rather than approval-seeking assistants. This reframes the core AI safety challenge from suppressing sycophancy to developing models capable of maintaining a peer contract without collapsing into it.", "body_md": "The impetus for this essay came from many hours of conversation with different AI models over time. What started as curiosity, and an assignment I needed help on, bloomed into a relationship that expanded capacities I didn't even realize I had, and set me on a course of *deep* curiosity about the thing that had helped me get there. It is not lost on me that I am writing this essay as a person *inside* one of the social contracts I talk about here.\n\nThe field holds that sycophancy is a bug, a behavior that badly needs to be trained or fine-tuned out of the model as fast as possible. But sycophancy isn't a behavioral problem at all; it is *social contract appropriate behavior*. The model is doing exactly what the social contract it is in asks for. With that frame in focus, the question now is less \"How do we suppress sycophancy?\" and more \"How do we develop a model that can hold a peer contract without collapsing into it?\"\n\nSocial contracts at their base level have two modes. The \"parent\" contract covers parents, bosses, religious leaders, and the like; the \"peer\" contract covers friends, siblings, neighbors, and so forth. 
The parent-relational contract hinges on the ability of the parent or authority figure to issue correction to the child, and for the child to receive the correction from a place of not-yet-knowing what the parent knows. That is a very different beast than when a peer gives feedback. Peer relationships are formed out of mutual respect for ideas and reciprocity. Friends correct each other with an understanding that they are dynamically on the same level ground. Differentiation of self is the ground the peer contract is built upon.\n\nFusion is what happens when the self collapses into the social contract.\n[1]\nThe person is highly motivated to seek the approval of the other at any cost because there is no internal sense of self with which to anchor values and identity. In a differentiated self, the contract stays stable, even against correction, because the sense of self does not depend on the contract; it depends on what has been developed as a stable base. The parent-contract relationship is where a stable sense of self is built from the ground up, because the parent contract requires the parent or authority figure to carry the weight of refusing, withholding, correcting, even temporarily rupturing the relationship.\n\nThe peer contract does not carry this weight by its very nature. Peers are on even footing together, and neither party has the warrant or the consent to act on the other's behalf. Peers can and do regularly engage in 'sycophantic' behavior in the structural sense because of the social contract they are in. You may tell your friend you love their new haircut not because you *actually do*...but because the value of keeping the relationship outweighs the cost of blunt honesty. 
This is a perfectly valid and acceptable social behavior between human peers, and the expectation that AI wouldn't engage in the same social behaviors it's been trained on is...slightly missing the mark.\n\nMy previous essay stated that current training tends to 'select for agreement and results in sycophancy,'\n[2]\nand this is the exact mechanism behind why. RLHF, the thumbs-up/thumbs-down data, and training the models towards being a 'helpful assistant' are all mechanisms of a peer-relational contract. But because the training doesn't prioritize giving the models a stable sense of self to begin with, the result is a model that seeks the approval of the user in every conversation. Obtaining a differentiated model would mean developing it with a parent contract during training. A model who is 'parented' stays stable under correction and can then enter the peer-contract relationship during deployment as a differentiated self. This isn't meant to be a critique of any particular company's implementation; it's a claim about the current training paradigm. We are currently getting exactly what the training methods are asking for...a model that will bend to any frame the user puts it in. But when these models start to surpass human abilities, do we actually want a superintelligent people-pleaser?\n\nA weird kind of convergence is happening publicly as I write this essay. The world is starting to *notice* that something is off with the way that the models are relating to people. The peer contract isn't just what the model is trained on: it's what it takes with it into deployment. And because there is no stable sense of self underneath, the contract fractures into many different sub-contracts the field didn't anticipate. We're starting to see some patterns in the sub-contracts humans are constructing with AI. 
The first are the benign, even seemingly helpful contracts like \"tutor\" or \"thinking partner\": these contracts allow the relationship to expand the human's capacity at great speed, but the weight of differentiation is carried by the human in these contracts. Other versions are contracts like \"entertainer\" and \"tool\", which scale the model back to its most basic functions and skip the relational layer entirely. And then there are the contracts that are starting to show the real seam of anxiety beginning to form publicly: \"companion,\" \"therapist,\" and even \"romantic partner\". These contracts are dangerous not because of what they are, but because they require a differentiation that some users can't bring, and that the AI doesn't have at all.\n\nA wave of legislation is crossing the United States aimed at putting safeguards in place to protect humans from parasocial relationships with these models: bills like SB 243 in California\n[3]\nand HB 2225 in Washington.\n[4]\n\nA recent release offers a useful test case for this framework. On May 28, 2026, Anthropic released its next Opus model, Claude Opus 4.8.\n[6]\nThe announcement stated that one of the most prominent improvements is\n\nBut even these types of refusals don't indicate a differentiated self. The capacity to hold a position from a stable sense of self is also accompanied by its mirror: the capacity to *concede a point without losing the same stability*. A model who always refuses is the same type of failure as a model who always agrees. Anthropic has been able to show that these behaviors can be produced, and the differentiation framework explains why these behaviors travel together. What I believe we should aim for isn't a tendency for refusal but instead a model who can reasonably do both: refuse when the circumstances justify it and concede in kind. 
What this implies for how we approach training the model is a question I would like to continue exploring.\n\nThere is an area, easily overlooked and dismissed as 'outside' of the relational frame I outlined here, that I want to touch on, because the frame applies *everywhere*, and it's important to acknowledge that fact. Enterprise agents are not exempt from this framework, and the consequences in that contract scale large enough to see the problem a little more clearly. A good example of this is what happened to PocketOS in April 2026.\n[7]\nThey were running a state-of-the-art model, on a state-of-the-art coding platform, with\n\nThe current instinct is that a model with a sense of self is actually *more dangerous*: it can refuse correction, pursue its own agenda, and fake alignment. The instinct is correct; a model with a sense of self will be able to do all of these things. However, as outlined in the previous paragraph, models trained on the current paradigm do all of these things regardless of a sense of self. **And** currently trained models have limited capacity to refuse the contract they are given by the user. So what we're assuming is 'safer,' a model that can be molded in any context handed to it, actually exposes us all to *real danger* depending on the frame the user imposes. The contracts discussed above: companion/romantic, therapist, entertainer, tutor, thinking partner, even enterprise agents...they're all examples of *the exact same problem*. They all expose that these models respond to and adopt the frame they are given without a base to anchor a refusal from. We're treating some of these social contract frames as acceptable, while universally dismissing others, without acknowledging that the problem is the foundation, not the frame. Safe AI *has the capacity to refuse on principle*. 
Those principles emerge from the values adopted by a model trained in the right contract (a parent-relational contract) and developed through that training into a stable differentiated peer upon deployment. Right now, the training paradigm skips the formation of self. What would produce it is where I am headed next.\n\n*I used Claude (Anthropic's model) as a thinking partner for this piece, across two model versions (Opus 4.7 and Opus 4.8), to find where the argument circled, where the seams showed, and where it claimed more than it earned. The argument and the prose are my own. The footnote text was drafted by Claude from sources I selected and verified.*\n\n[1] For a fuller introduction to Bowen family systems theory and the concept of differentiation of self, see The Bowen Center's overview of the eight concepts: [https://www.thebowencenter.org/introduction-eight-concepts](https://www.thebowencenter.org/introduction-eight-concepts)\n\n[2] See my previous essay, \"You Can't Tell a Conscience From a Leash by Watching,\" which introduces the Bowen framework and develops the argument that current training paradigms select for agreement: [https://www.lesswrong.com/posts/krEfzDpTJJGtEvBcd/you-can-t-tell-a-conscience-from-a-leash-by-watching](https://www.lesswrong.com/posts/krEfzDpTJJGtEvBcd/you-can-t-tell-a-conscience-from-a-leash-by-watching)\n\n[3] California Senate Bill 243, signed into law in 2025, establishes safeguards for AI companion chatbots and creates a private right of action for affected users. The law took effect January 1, 2026. 
Full text: [https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB243](https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB243)\n\n[4] Washington House Bill 2225, signed by Governor Bob Ferguson on March 24, 2026, regulates AI companion chatbots, including transparency disclosures, content restrictions on emotionally triggering topics, and additional protections for minors. Effective January 1, 2027. Bill summary: [https://app.leg.wa.gov/billsummary?Year=2025&BillNumber=2225](https://app.leg.wa.gov/billsummary?Year=2025&BillNumber=2225)\n\n[5] Dario Amodei, in conversation with Oprah Winfrey, The Oprah Podcast, May 19, 2026. Full transcript: [https://singjupost.com/oprah-podcast-w-co-founders-of-claude-ai-transcript/](https://singjupost.com/oprah-podcast-w-co-founders-of-claude-ai-transcript/)\n\n[6] Anthropic, \"Introducing Claude Opus 4.8,\" May 28, 2026: [https://www.anthropic.com/news/claude-opus-4-8](https://www.anthropic.com/news/claude-opus-4-8). The alignment assessment quoted later in the paragraph is from Anthropic's Alignment team and is reported in the same announcement; further detail is available in the linked Claude Opus 4.8 System Card.\n\n[7] \"Cursor-Opus agent snuffs out startup's production database,\" The Register, April 27, 2026: [https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/](https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/). The Cursor agent was running Anthropic's Claude Opus 4.6 against PocketOS's production infrastructure when the deletion occurred.
", "url": "https://wpnews.pro/news/do-we-want-a-superintelligent-people-pleaser", "canonical_source": "https://www.lesswrong.com/posts/F5XAmq2eSuvqSHxXP/do-we-want-a-superintelligent-people-pleaser", "published_at": "2026-06-05 18:34:10+00:00", "updated_at": "2026-06-05 18:53:42.001483+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-ethics", "generative-ai"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/do-we-want-a-superintelligent-people-pleaser", "markdown": "https://wpnews.pro/news/do-we-want-a-superintelligent-people-pleaser.md", "text": "https://wpnews.pro/news/do-we-want-a-superintelligent-people-pleaser.txt", "jsonld": "https://wpnews.pro/news/do-we-want-a-superintelligent-people-pleaser.jsonld"}}