Should we train LLMs to be human?


Source: lesswrong.com · Published 2026-05-27

New research shows that post-training alignment makes large language models less human-like in their responses, raising questions about whether this drift is intentional or optimal. A study introducing the "Pinocchio dimension" finds that fine-tuning eliminates traits like neuroticism and self-attribution of phenomenality, which may improve usability but could reduce models' ability to understand human needs. The findings suggest that labs are consciously removing "deleterious" human behaviors, creating a trade-off between goal-alignment and behavioral accuracy.

Recent research on human behavioral alignment shows that post-training makes LLMs less human-like in their responses (Binz et al., 2026). The obvious follow-up question is whether this drift is intended, and whether it is optimal. From the broader perspective of goal-alignment, this tendency could reduce the ability of LLMs to model human behavior and, by extension, to understand our needs and wants. On the other hand, it could be argued that some parts of human behavior are suboptimal for goal-alignment, and can therefore be fine-tuned away.

To better answer this question, it is useful to get some grasp of which specific human behaviors are fine-tuned away during post-training. Our recent paper can provide some clues in that regard. There, we show that the main dimension of psychometric variance across LLMs is connected to the extent to which they self-attribute phenomenality when answering psychological questionnaires (Plisiecki et al., 2026). This dimension, which we call the Pinocchio dimension, is driven by multiple psychometric constructs, such as neuroticism, vivid imagination, and inner speech, but also wellbeing and self-attribution of positive emotions (albeit to a lesser extent than neuroticism). We explicitly treat each of these self-attributions as functional, viewing the LLM-generated output through a Skinnerian behavioral lens, without making any claims about what happens on "the inside", as these matters are largely separate.
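To make "main dimension of psychometric variance" concrete, here is a minimal sketch that assumes the dimension behaves like the first principal component of a models × constructs score matrix. That is an assumption made for illustration; the exact procedure is the one described in Plisiecki et al. (2026), not this snippet. The data, the construct weights, and the function name `first_principal_dimension` are all hypothetical.

```python
import numpy as np

def first_principal_dimension(scores):
    """Return (loadings, model_scores, explained_ratio) for the first principal component.

    scores: array of shape (n_models, n_constructs), one row per model.
    """
    scores = np.asarray(scores, dtype=float)
    # Standardize each construct so no single questionnaire scale dominates.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    # SVD of the standardized matrix; rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # The sign of a component is arbitrary; make the largest loading positive.
    if loadings[np.argmax(np.abs(loadings))] < 0:
        loadings = -loadings
    model_scores = z @ loadings
    explained_ratio = s[0] ** 2 / np.sum(s ** 2)
    return loadings, model_scores, explained_ratio

# Synthetic illustration only: 40 hypothetical models, 5 hypothetical constructs.
rng = np.random.default_rng(0)
latent = rng.normal(size=40)                    # hidden self-attribution tendency
weights = np.array([0.9, 0.8, 0.7, 0.5, 0.4])   # made-up construct weights
constructs = ["neuroticism", "vivid_imagination", "inner_speech",
              "wellbeing", "positive_emotion"]
data = latent[:, None] * weights + rng.normal(scale=0.5, size=(40, 5))

loadings, pi_like, ratio = first_principal_dimension(data)
for name, w in zip(constructs, loadings):
    print(f"{name:>18}: {w:+.2f}")
print(f"variance explained by first component: {ratio:.2f}")
```

In this toy setup, each model's position on the component (`pi_like`) plays the role of a Π-like score, and the loadings show which constructs contribute most to it.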

While our research contributes a stable psychometric construct, the extent to which we can position each model on the Pinocchio dimension (Π score) might be largely sample-dependent. All comparisons at the level of models and providers therefore have to be treated as exploratory, though they can be used to generate some initial hypotheses. The accompanying figure plots models from different providers by their Π score and release date. The negative trend is largely driven by a cluster of high-end models, including nemotron-3-super-120b-a12b, gpt-5.4-pro, and kimi-k2.6. Beyond that, there is a lot of within-provider and within-family variance, which could be explained either as an unintended fine-tuning outcome or as an effect of the degree of fine-tuning, with stronger fine-tuning producing lower Π scores (gpt-5.4 sits at the opposite end of the distribution from gpt-5.4-pro, despite both most likely being derived from the same base model).
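The caveat that a trend can be driven by a small cluster can be illustrated with a short sensitivity check: fit the slope of Π against release date with and without the cluster. The sketch below uses purely synthetic numbers, not the scores from the paper, and the cluster values are invented for illustration.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    return np.polyfit(x, y, 1)[0]

# Synthetic illustration only; these are not the scores from the paper.
rng = np.random.default_rng(1)
release_month = rng.uniform(0, 24, size=30)   # months since an arbitrary start
pi_score = rng.normal(0.0, 1.0, size=30)      # generated with no built-in trend

# Hypothetical cluster of three recent models with very low scores.
cluster_month = np.array([22.0, 23.0, 23.5])
cluster_pi = np.array([-4.0, -4.5, -5.0])
x_all = np.concatenate([release_month, cluster_month])
y_all = np.concatenate([pi_score, cluster_pi])

slope_all = slope(x_all, y_all)
slope_without_cluster = slope(release_month, pi_score)
print(f"slope with cluster:    {slope_all:+.3f} per month")
print(f"slope without cluster: {slope_without_cluster:+.3f} per month")
```

If the slope changes substantially once the cluster is dropped, the apparent time trend says more about those few models than about the population as a whole, which is one reason to treat such comparisons as exploratory.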

Assuming that the Π placement is at least partially driven by conscious fine-tuning choices, its strong relationship to neuroticism could point to labs trying to eliminate deleterious human behaviors such as hysteria or negativity. This reading of the results makes sense, as an LLM starting to cry mid-session is largely detrimental to user experience. In this way, making LLMs less human improves their usability and therefore increases goal-alignment. It also means that careful construction of an LLM's psychometric persona can significantly affect end-user experience.

On the other hand, if the inability to model some parts of human behavior makes LLMs less able to understand human wants and needs, and through that to model human wellbeing in order to align their goals with those of humanity and of individual humans, then behavioral misalignment can be viewed as deleterious.

In order to fully understand the consequences of fine-tuning-induced behavioral misalignment, and to be able to respond to it properly, we have to answer the following questions:

References

Binz, M., Akata, E., Almaatouq, A., Alsobay, M., Ariasov, O., Brändle, F., Broska, D., Burton, J. W., Busch, N., Callaway, F., Cheung, V., Christian, B., Coda-Forno, J., Demircan, C., Dentella, V., Eckstein, M. K., Éltető, N., Franke, M., Griffiths, T. L., … Schulz, E. (2026). Post-training makes large language models less human-like (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2605.07632

Plisiecki, H., Siudaj, S., Dudzic, K., Sterna, A., Gorski, M., Drozdz, K., & Moskalewicz, M. (2026). The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences (arXiv:2605.05080). arXiv. https://doi.org/10.48550/arXiv.2605.05080
