{"slug": "social-agency", "title": "Social agency", "summary": "A writer argued that human planning is not a general cognitive algorithm but a set of socially learned behaviors, challenging dominant models of agency in AI safety research. The author claimed this view reduces concerns about inner misalignment, as sophisticated reasoning is acquired through social learning rather than emerging inaccessibly within an agent's cognition. The piece, originally written three years ago, was republished despite the author's stated reservations about its timeliness and writing quality.", "body_md": "*Crossposted from **Substack**.*\n\n*I wrote this three years ago, before becoming extremely depressed and developing a lot of aversiveness around it (even though I had gotten a bunch of positive feedback). As a result, it’s a bit “out of step” with the current state of the conversation, and the writing is not fully up to my current standard. I still believe the core idea could be very valuable though, and wanted to get it out there.*\n\nJanuary 2023\n\nThis is a braindump sketching out a major change in intuition that I went through a few months ago, and that I would guess either hasn’t been experienced by most people who are thinking about AI or hasn’t been properly updated on. To get my point across, I’m not going to hedge as much as I naturally would. I have a decent amount of uncertainty, of course, especially about the specifics, and I also barely know anything about the relevant fields.\n\nThere’s a model of how agency works, explicitly or implicitly assumed by lots of people, that goes like “During the training of an intelligent agent, low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”. I believe this isn’t what happens in humans. Instead, “planning” is itself a bunch of superficial, distinct, socially learned behaviors, which are *not* learned through feedback about how well they fulfill your goals. 
I think this has some important consequences for thinking about AI. For example, it leaves us with no reason to think there is such a thing as a simple core of agency. It also leaves us less worried about inner misalignment, since sophisticated planning and reasoning is not acquired inaccessibly within the agent’s cognition but is itself socially learned.\n\nThe minimal takeaway is that even if I’m wrong about my interpretations here, introspective evidence about cognition seems extremely neglected, and the fact that apparently no one is having the debates I’m gesturing towards in this essay is crazy.\n\nThere’s a model of the emergence of general planning/agency that goes something like “Low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”. I would guess that MIRI believes this, since Yudkowsky talks about “safe” or “unsafe” tasks (with respect to AGI arising) and about how humans “generalize” from the Savannah to the moon. Even the shard theory people, who in some ways define themselves as being contra MIRI, seem to believe that [a general planning algorithm gets bootstrapped out of low-level motor command planning](https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX/p/iCfdcxiyr2Kj8m8mT#II__Reinforcement_events_shape_human_value_shards). I would also guess Steven Byrnes believes this (see below).\n\nI don’t think this is what happens in humans.\n\nHere’s [Steven Byrnes’ example of a “foresighted plan”](https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8/p/zXibERtEWpKuG5XAC) (prinsesstårta is a type of cake, and the “plan” is to order it):\n\nThis is framed as the brain planning using its self-supervised learned world model. But what I think is actually happening is that Steven has a socially learned association between being hungry / thinking about food and ordering food far in advance. 
(I could also imagine there being a self-image of being someone who treats themselves sometimes, or someone who is disciplined/rational enough to pursue delayed gratification - there are a lot of possibilities.) I’ve literally never thought about ordering food a week in advance, even though I’ve enjoyed cake a lot too - it’s not a socially learned affordance for me.\n\nCalling this “planning through a world model” stretches the concept for me. The world model involved is much smaller than portrayed here. I would guess that only eating the cake is viscerally modeled. On top of that, there is a socially learned belief-about-concepts/vocalization/story of “If I order food, food will arrive in a week”, plus an association between imagining eating the cake (or generally being hungry) and the behavior of ordering food.\n\nIt’s not clear to me how “order food -> food will come” is even supposed to be learned by the brain’s self-supervised learning/predictive processing or RL. The prediction error/reward comes in *a week* after the prediction. And if it’s somehow deduced from higher-level knowledge about the world - how did that get learned? I think this is called the “temporal credit assignment problem” in RL and neuroscience (how do we correctly identify and reward the actions responsible for long-term outcomes?). I guess my thesis is that there is a simpler explanation that fits the evidence better: it actually *doesn’t* get solved, and humans don’t viscerally model the wider world.\n\nI’ve gotten into the habit of trying to model what’s going on when I experience an impulse for an action that could be interpreted as “long-term planning”. It seems to me that it’s all actually just a bunch of superficial, distinct, socially learned behavioral patterns, rather than any planning through a world model or any general/sophisticated heuristics for accomplishing long-term goals (in the domain of long-term planning, to be clear. 
Obviously we have a bunch of very general heuristics for navigating our immediate physical and social environments).\n\nAn uncontroversial example is the average person getting close to finishing high school who (say) starts thinking about which college they want to go to. Clearly they are “planning to optimize for their long-term goals” only in an eyebrow-raisingly concept-stretching sense. They are looking at colleges because they feel they have to, since that’s the normal thing to do, or because they don’t have an internal affordance to do anything else, or because it feels like that’s what the kind of person they want to be would do.\n\nSo in that case, it’s probably intuitive. But I think all human behavior is like that, just in more subtle ways.\n\nFor example, on the more sophisticated end of the scale, a very agentic person with good epistemics did not become this way because they’re smarter. In the best case, they have a mental motion of paying attention to small doubts, biases and known failure modes of what they are doing - but this has been internalized via an escalating internal social desire to be a smart, diligent and exceptional person (probably combined with more specific memes), not in any direct way because they’re more intelligent (of course, being more intelligent helps with learning in general, but it’s not the cause of learning any particular behavior). And in any particular case of this person planning ahead or contemplating a decision, they are internally (sequentially) applying some of their portfolio of self-socially learned patterns, based on how strongly each pattern is activated by the given mental context.\n\nIt would probably be useful to add more examples, e.g. of someone reasoning about the wider world and making a decision, or of someone changing their mind, but this is already way too long.\n\nWhy, then, do humans today look and behave so much like agents? Why are agentic stories so easy to tell about our behavior? 
(“I want to get this job”, etc.) I think it’s because behavioral patterns involving some goal-directed behavior got memetically selected for over the past (tens of?) thousands of years: they were assigned higher status because people acting on those patterns were more successful, and so they were reproduced at higher rates through mimesis / imitative learning. This process has probably intensified as people’s memetic environments became bigger and more interconnected (cultural FOOM). In other words, cultural evolution has preselected our behavioral impulses to be vaguely goal-directed for us.\n\nMore abstractly, I think agency arose out of deeply social animals because having your reward signals depend on other agents’ approval makes the behaviors you can learn extremely variable, and allows selection among them to take place.[[1]](https://www.lesswrong.com/feed.xml#fnnd6cdoba0h8)\n\nAn example of such a very advanced mimetic role: the social role of the entrepreneur/founder (e.g. in Silicon Valley) gets you a lot of status if you succeed, and it intrinsically requires you to have a self-narrative of goal-directed behavior. On top of that come lots of smaller behavioral patterns, which you get acculturated to, that help you succeed at founding companies (e.g. work hard, be flexible, push people, ask for help, solve problems).\n\nSome (oversimplifying) catchphrases:\n\nSome possible implications/updates (which I haven’t thought that much about):\n\nMore important:\n\nLess important:\n\nThis all feels quite important to me, and it seems like a lot of people might be confused about it. 
It’s not clear to me how much of this people already know, how much they “know” on some level but haven’t internalized and propagated to other beliefs, how much they have thought about it and disagree, how much they haven’t thought about it, etc.\n\nI could imagine an animal just as smart as humans, with learning algorithms just as good, but with less hardcoded social reward. I would guess it would just get very good at moving through its immediate physical environment and meeting its hardcoded needs, but would never ever develop what we would call “general” planning or agency (cf. [this famous paper](https://www.eva.mpg.de/documents/AAAS/Herrmann_Humans_Science_2007_1554784.pdf), which argues that chimpanzees actually are such an animal (although I’m skeptical), and which is generally the closest thing to my theory that I’ve found).\n\nSee also “[realism about rationality](https://www.lesswrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality)”. Also, note that I consider all this pretty orthogonal to the debate about whether human intelligence (as in, the capacity to learn to do tasks competently or something) is general or a bunch of specialized hacks - the former seems likely to be right. I’m talking about agency: how you get from intelligence to long-term planning.\n\nThanks to Quintin Pope for inspiring this way of framing it.\n\nIn other words, shard theory is exactly wrong.", "url": "https://wpnews.pro/news/social-agency", "canonical_source": "https://www.lesswrong.com/posts/xopGsfQxiLcjXEkbE/social-agency", "published_at": "2026-05-28 13:10:49+00:00", "updated_at": "2026-05-28 13:33:37.911504+00:00", "lang": "en", "topics": ["ai-safety", "artificial-intelligence", "ai-agents"], "entities": ["Substack"], "alternates": {"html": "https://wpnews.pro/news/social-agency", "markdown": "https://wpnews.pro/news/social-agency.md", "text": "https://wpnews.pro/news/social-agency.txt", "jsonld": "https://wpnews.pro/news/social-agency.jsonld"}}