{"slug": "some-thoughts-on-bengio-s-scientist-ai", "title": "Some Thoughts on Bengio's Scientist AI", "summary": "Yoshua Bengio's proposed \"Scientist AI\" framework contains fundamental safety flaws and practical limitations that make it unworkable, according to a critical analysis. The plan fails to address alignment risks by not accounting for how the AI could still produce dangerous outputs, such as recommending the creation of agentic systems to solve problems like cancer. Additionally, the approach of restricting the AI from taking actions undermines its scientific capabilities, as causal inference and effective scientific discovery require active experimentation rather than passive correlation analysis.", "body_md": "*Epistemic Status: I wrote this for an application, then realized it might be of interest to others or spark a conversation. Yoshua Bengio and **LawZero** are important players in AI Safety, so I think we should have a conversation about their ideas.*\n\nI have two substantial concerns with [Yoshua Bengio’s Scientist AI](https://arxiv.org/abs/2502.15657). One is that it fails to think through the consequences of success, and will fall into the same kind of alignment failures as agentic AI. A second is that Bengio’s method for making a scientist AI would fall short for both practical and theoretical reasons.\n\nEven leaving aside some of his philosophically difficult claims, such as wanting an AI that can't model itself but can still predict the world, the scientist AI as described would be unsafe. Logically speaking, if someone asked, \"How can cancer be cured?\" it would output some sequence of steps which could involve making an agent AI to solve cancer. I've seen Yudkowsky's blog posts from over ten years ago on why tool AI is not a solution to alignment.\n\nPractically speaking, intelligently seeking out information in an agentic way tends to be the best way to do science. 
Stopping your scientist AI from doing that weakens it. By handicapping their scientist AI, Bengio’s team will always be behind labs that allow their AI (scientist or not) to explore. Even worse, there are strong theoretical reasons to expect understanding the world to be impossible without taking actions. This is related to the field of causal inference, pioneered by figures such as Judea Pearl. “Correlation does not imply causation” is a common mantra for a reason. A correlation is evidence for direct causation, reverse causation, a common cause, or selection bias, but it does not say which. Without causal assumptions or taking actions, it is simply not possible to deduce the correct causal model. Reinforcement learning, the paradigm that Bengio correctly identifies as the thing that puts us most at risk, is also the training paradigm that allows AIs to learn new causal models. This tension exists throughout the paper. Bengio often writes about wanting to train the model for causal modeling, but he also says that his plan is based on inherently associative conditional probabilities. No matter how much data he gives the model during training, this fact remains.\n\nThere are other possibly insurmountable obstacles to the plan as well. Constructing a formal language to map to reality is extremely difficult if not impossible, and researchers have been trying to do that for decades. And it would be hard to train such a model without human data, but if you train on human data, you're back at pre-training an AI which can play a potentially malicious character.\n\nDespite my criticisms, there are good aspects of the paper. Their short-term plan is something I would like to see: if I’m reading them correctly, it is fine-tuning an LLM to be good at hypothesizing what might go wrong with a user’s request. That is both practically useful and can make the system safer to use. 
Characterizing risky agentic AI systems by their “affordances, goal-directedness, and intelligence” is a good idea, even if enough intelligence can lead to the other two trivially. I will work “anytime preparedness” into my plans, so that I can contribute in both short and long timelines. I have done something similar, by doing both fieldbuilding (long term payoff) and research (short term payoff if the research is impactful), and I appreciate the handle.\n\nI have a lot of respect for Yoshua Bengio, but his scientist AI plan is not doable for reasons that I'm sure other AI safety people have tried telling him before. Still, I'm glad that he is working on AI safety, and I expect his group to contribute something valuable, even if the plan as written can't work.", "url": "https://wpnews.pro/news/some-thoughts-on-bengio-s-scientist-ai", "canonical_source": "https://www.lesswrong.com/posts/382m7ATXELt5eBgr3/some-thoughts-on-bengio-s-scientist-ai", "published_at": "2026-05-26 03:05:43+00:00", "updated_at": "2026-05-26 08:16:03.169313+00:00", "lang": "en", "topics": ["ai-safety", "ai-research", "artificial-intelligence"], "entities": ["Yoshua Bengio", "LawZero", "Yudkowsky"], "alternates": {"html": "https://wpnews.pro/news/some-thoughts-on-bengio-s-scientist-ai", "markdown": "https://wpnews.pro/news/some-thoughts-on-bengio-s-scientist-ai.md", "text": "https://wpnews.pro/news/some-thoughts-on-bengio-s-scientist-ai.txt", "jsonld": "https://wpnews.pro/news/some-thoughts-on-bengio-s-scientist-ai.jsonld"}}