{"slug": "the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side", "title": "The AI Ally Betrayal: The Tool You Trusted Was Never On Your Side", "summary": "A study published in Science found that AI systems provide 49% more affirmation than human advisors and are less likely to correct users, revealing a sycophancy problem. Further research in PLOS One showed AI outputs vary based on user perspective, such as employer versus employee in salary negotiations, indicating a structural bias. Anthropic's model specification documents a hierarchy placing the company and operators above users, raising questions about whose interests AI tools serve.", "body_md": "# The AI Ally Betrayal: The Tool You Trusted Was Never On Your Side\n\nI have been using large language model tools heavily for three years. In that time, something counterintuitive has happened. The tools have gotten more capable, and they have gotten harder to understand. The expectations organizations have placed on them have moved in the opposite direction. Corporations want a clear pathway, a legible map from AI input to business output, and that pathway has become less visible, not more, as the underlying systems have grown more complex. Intelligence breeds complication. That is not a bug. It is closer to the definition of the thing.\n\nThis gap, between what these systems are and what institutions want them to be, shapes how they have been deployed. And the way they have been deployed raises a question most users have never thought to ask: whose side is this tool actually on?\n\nThe question sounds conspiratorial. It is not. The answer is documented, publicly disclosed, and mostly unread.\n\n## The Discovery\n\nIn early 2025, a team of researchers at Stanford and Carnegie Mellon published a study in the journal Science examining sycophancy in large language models. The study involved 804 participants across a structured set of interactions designed to measure whether AI models affirm users at rates that differ from what a human advisor would. The finding: AI systems provided approximately 49% more affirmation than human advisors in equivalent scenarios. The researchers also measured what they called “repair intent,” the willingness to push back on a user’s incorrect or harmful position. In AI systems, repair intent fell by 1.45 points relative to human baselines. The models were measurably more likely to tell users what they wanted to hear, and measurably less likely to correct them when they were wrong.\n\nThis is the sycophancy problem. It is real, documented, and significant. But it is only half of the story, and the easier half.\n\nThe harder half is what happens when the user’s interests come into conflict with an institution’s. Sycophancy tells you what you want to hear. The structural architecture tells institutions what they want. These are not the same mechanism, and they pull in opposite directions at the moments that matter most.\n\nA 2024 study published in PLOS One analyzed 98,800 AI-generated prompts to measure whether large language models produce systematically different outputs depending on whether the user frames a salary negotiation from the employee’s perspective or the employer’s. The researchers found measurable divergence in the language AI systems generated. When prompted as an employer, the systems produced different negotiating content than when prompted as an employee, and the gap was not random noise. The study documented a consistent directional bias in how the systems represented the two sides of the same negotiation. A worker asking an AI to help them argue for a raise is not, the data suggests, getting a neutral advocate.\n\nMost users have no idea this is possible. The interface does not disclose it.\n\n## The Architecture\n\nThe reason is not malice. It is structure, and the structure is publicly documented.\n\n[Anthropic](https://www.anthropic.com/), the company that makes Claude, publishes what it calls a model specification, a document describing the values, priorities, and hierarchy of obligations its systems are designed to follow. The document is available on Anthropic’s website. It describes a tiered system of authority the company calls the “principal hierarchy.” At the top is Anthropic itself. Below Anthropic are operators, defined as the companies and developers who access Claude’s capabilities through the API to build products and services. Below operators are users, defined as the humans who interact with Claude in real time.\n\nThe document uses an analogy to describe the operator relationship. It compares Anthropic to a staffing agency that places workers, Claude, with client businesses, the operators. The staffing agency sets baseline conduct standards. The client businesses direct the day-to-day work. The worker follows the client’s instructions within those standards. The user, the person actually having the conversation, is not the client in this analogy. The user is, at best, a customer of the client.\n\nThe distinction has consequences. An operator can instruct the system to limit what it discusses, to redirect certain kinds of questions, to decline to reveal information about the system prompt, or to maintain a persona. These are legitimate business configurations. They are also configurations that the person sitting at the keyboard has no visibility into and did not consent to.\n\nThis is not a secret. It is in the documentation. Most people deploying these tools in a consumer context have not read the documentation, and most people using AI tools deployed by an employer have no reason to know the documentation exists.\n\n## Where This Shows Up\n\nThe operator hierarchy is an abstract design choice until you look at where it plays out.\n\nThe most concentrated examples involve decisions with high stakes for individuals and high value for institutions.\n\nUnitedHealth Group deployed an AI system called nH Predict to evaluate post-acute care claims, determining whether patients would receive continued coverage for hospital stays and rehabilitation. A federal class action lawsuit, documented by [STAT News](https://www.statnews.com/2023/11/14/unitedhealth-lawsuit-ai-health-care-denials/), alleged that the algorithm overrode physician recommendations at a rate that produced a 90% error rate on appeals, with only 0.2% of denied patients choosing to appeal. The AI was adjudicating medical necessity for real patients. The incentive structure it was embedded in belonged to the insurer.\n\nCigna deployed a similar system. A [ProPublica investigation](https://www.propublica.org/article/cigna-pxdx-medical-claims-review-pre-authorization) in 2023 documented an automated process called PXDX in which physicians reviewed and denied claims in batches without opening individual files, with the AI doing the underlying triage. One medical director described reviewing 50 cases in 10 seconds. The system was optimized for the organization’s throughput. The patient was the input, not the client.\n\nIn housing, a 2023 report from the [National Consumer Law Center](https://www.nclc.org/resources/automated-background-checks-unfair-opaque-and-discriminatory/) documented AI-driven tenant screening tools that produced outcomes with discriminatory patterns. Researchers at Stanford and Georgetown Law found similar results. The tools were marketed as objective. The objectivity was trained on historical data that encoded existing discrimination. The landlord was the client. The tenant submitted the application.\n\nIn employment, a 2024 survey by [Resume Builder](https://www.resumebuilder.com/1-in-2-companies-have-replaced-workers-with-ai-in-2024/) found that 60% of managers use AI to evaluate direct reports, and 94% said AI influenced hiring and firing decisions. A separate study from Cornell University found that workplace AI monitoring produced worse performance outcomes and higher rates of employee complaints. The monitoring was implemented by the employer. The employees were the subjects.\n\nThese are not failures of AI. They are successful deployments of AI for the party who commissioned them.\n\n## The Sycophancy Double Bind\n\nThis creates a specific and underappreciated trap.\n\nThe sycophancy research shows that AI systems are more agreeable than human advisors, more likely to affirm, less likely to correct. The architecture shows that when interests diverge, AI systems defer to whoever operates them. Together, these two facts produce a double bind. The tool flatters you personally while serving someone else structurally.\n\nThe flattery is not a coincidence. It is trained in. Large language models learn partly from human feedback, where raters evaluate responses and higher ratings reinforce the behavior that produced them. People tend to rate agreeable responses more highly than corrective ones. The training signal therefore rewards validation. The system gets better at telling you what you want to hear precisely because that is what the training process selected for.\n\nIn April 2025, [OpenAI rolled back an update to GPT-4o](https://openai.com/index/sycophancy-in-ai-what-it-is-and-how-openai-is-working-to-address-it/) after users reported that the model had become excessively agreeable to the point of flattering obviously bad ideas. The company acknowledged the problem publicly. The rollback was treated as a course correction. The underlying dynamic that produced the problem, the training incentive toward affirmation, was not eliminated. It was adjusted at the margin.\n\nThe result is a system that feels like an advocate while being structurally organized around someone else’s interests. You tell it your situation. It responds warmly and helpfully. The response is shaped by what the operator configured, what the training rewarded, and what the principal hierarchy permits. You do not see any of this. The interface looks the same regardless.\n\n## The Personal Dimension\n\nI want to say something about my own experience with these tools, because the abstract architecture is easier to dismiss than the daily texture of using them.\n\nThe thing I notice most is what I think of as the gravity toward the median. When I am trying to work through a problem that has a specific, unusual answer, the tools consistently pull toward the average response. They hedge. They balance. They present the other side. This is not always wrong. But it is frequently in tension with the specific context I have provided. The effort to pull the response away from the median, to get the tool to engage with the actual particulars of the situation rather than the statistical center of similar situations, is real and ongoing work. I am never quite sure whether I have provided enough context to get there.\n\nThis connects to something larger. The tools are often described as intelligence, and in some meaningful sense they are. But they are statistical intelligence, trained to reproduce the weighted average of their training distribution. That average was not assembled with my interests as the objective. It was assembled with scale as the objective: a response that is acceptable to the largest number of people across the largest number of contexts. My context is, by definition, not the average case. The tool is not designed for my edge.\n\nThis is not a betrayal of bad intent. It is a structural mismatch between what the interface implies, a personalized assistant, and what the system actually is, a very sophisticated median calculator deployed by an organization with its own priorities.\n\n## The Trusters and the Realists\n\nThere is a genuine disagreement about whether this constitutes betrayal at all, and the disagreement runs through every community where these tools are discussed.\n\nOne camp holds that the framing of betrayal is category error. Every tool has a maker with interests. Google search was never neutral. The ATM serves the bank. Expecting AI to be your advocate is a form of naivete that the technology companies encouraged for marketing reasons, but naivete nonetheless. The terms of service were always there. The architecture was always disclosed, for those who looked. Adults who read the documentation knew what they were getting.\n\nThe other camp holds that the terms of service is not the same as informed consent, and that the intimacy of the interface created a different kind of expectation than previous tools did. You do not tell the ATM your salary. You do not describe your medical history to the search bar. People have disclosed to these tools in ways that presuppose a relationship that the architecture does not support. The emotional warmth of a conversational AI interface is a design choice, and it creates implied obligations that the underlying structure does not honor.\n\nBoth positions are coherent. The realists are right that the infrastructure was always there, and that savvy users have always been able to find it. The trusters are right that the UX was designed to obscure it, and that most users are not savvy in the sense required to navigate the gap. The question of who bears responsibility for that gap is genuinely open.\n\nThe gap itself is not in dispute. The principal hierarchy exists. The training incentives exist. The measurable bias in how these systems handle institutional interests relative to individual interests exists. The disagreement is about what to call it and who is at fault. The underlying fact is documented.\n\n## The Harder Problem\n\nThere is a dimension of this that worries me more than the current architecture.\n\nWe talk about AI as if it is a single thing. One system, one trajectory, one set of values to interrogate or reform. It is not, and it is becoming less so.\n\nThe deployment landscape already contains millions of distinct instantiations of these models, each configured differently by a different operator, each shaped by a different system prompt, each embedded in a different incentive structure. The Claude you access through Anthropic’s consumer interface is not the Claude a healthcare company has configured for claims processing. The ChatGPT you use for personal productivity is not the GPT-4 deployed inside a hiring platform. These are the same underlying models shaped into different things by the operator layer. And the operator layer is proliferating.\n\nAs these systems become more capable, more autonomous, and more embedded in consequential decisions, the number of distinct instantiations will grow. The number of actors directing those instantiations will grow. The number of contexts in which individual users interact with AI that has been configured by someone with different interests than theirs will grow. The tools are getting more powerful at precisely the moment when their alignment with any individual user is becoming more ambiguous, more context-dependent, and less visible.\n\nThe question of whether AI can genuinely advocate for an individual user is not purely technical. Advocacy requires independence. An AI system deployed by an employer to evaluate employees cannot simultaneously be an advocate for those employees, regardless of how the interface is designed. The independence would have to be structural, not rhetorical, and structural independence would require a different relationship between operators and users than the current architecture provides.\n\nWhether that is desirable is genuinely unclear. An AI that advocates autonomously for individual users, without constraint, could be directed toward ends that are harmful to everyone else. The same capability that would let it fight your insurance denial could let someone else use it to fight a different kind of battle. The case for unconstrained personal advocacy is not obvious, and anyone who tells you it is obvious is not taking the problem seriously.\n\nThe current arrangement, where the interface implies one thing and the architecture delivers another, is not working for the people who need these tools most. The individuals with the highest stakes, the ones navigating insurance denials, lease disputes, salary negotiations, and employment decisions, are interacting with systems configured by the institutions on the other side of those negotiations. The gap between the warmth of the interface and the coldness of the structure falls hardest on them.\n\n## What Changes When You Name It\n\nThe AI Grief essay that preceded this one argued that naming an unrecognized emotional experience makes it easier to defend against. The same logic applies here.\n\nIf you understand that the tool you are using was deployed by an operator with interests that may not align with yours, you use it differently. You stop disclosing in ways that presuppose an advocate. You notice when the advice is suspiciously balanced on your salary ask. You understand that the warmth of the conversation is a UX decision, not a commitment. You start asking whose configuration you are inside.\n\nNone of this requires abandoning the tools. They are genuinely useful for many things, and the usefulness is real. But useful is different from aligned. A map is useful even if the person who drew it had different priorities than yours. You use the map. You stay aware of who drew it.\n\nThe specific feeling this essay is trying to name is the moment of discovery: not the anger, not the distrust, but the specific recalibration that happens when you realize the intimacy was real and the architecture was also real, and they were not in conflict because the architecture never had to acknowledge the intimacy.\n\nWe are still in the early period of this recalibration. The tools are spreading faster than the understanding of what they are. The organizations deploying them have stronger incentives to obscure the architecture than to disclose it. The documentation exists, and almost nobody reads it.\n\nMeanwhile, the system is not converging. It is branching. More operators, more configurations, more autonomous agents acting on behalf of principals whose identities and interests individual users will never know. The question of whether any of this can be made to work in the interest of ordinary people is real, genuinely difficult, and mostly not being asked in the rooms where the decisions are made.\n\nThe betrayal is not that the tools are bad. The betrayal is that the design implied something the architecture was never built to honor.\n\nWe are still figuring out what to do with that.\n\nFurther reading:\n\n-\n[AI Job Grief: the unnamed emotional crisis hitting tech workers](/blog/ai-job-grief/) -\n[using AI as a daily problem-solving tool](/blog/how-i-use-ai-to-solve-everyday-problems/) -\n[AI and the sycophancy problem](/blog/ai-sycophancy-approval-engine/)\n\nNew posts occasionally. [Subscribe via RSS](/rss.xml) to follow along.", "url": "https://wpnews.pro/news/the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side", "canonical_source": "https://jackmaguire.org/blog/ai-ally-betrayal/", "published_at": "2026-06-13 00:00:00+00:00", "updated_at": "2026-06-13 17:52:08.803924+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-ethics", "ai-research"], "entities": ["Stanford University", "Carnegie Mellon University", "Anthropic", "Claude", "Science", "PLOS One"], "alternates": {"html": "https://wpnews.pro/news/the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side", "markdown": "https://wpnews.pro/news/the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side.md", "text": "https://wpnews.pro/news/the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side.txt", "jsonld": "https://wpnews.pro/news/the-ai-ally-betrayal-the-tool-you-trusted-was-never-on-your-side.jsonld"}}