Why Even Experts Don’t Know What to Do About AI Risk

Source: forum.effectivealtruism.org · Published 2026-06-02T17:59Z

AI Safety veteran Holden Karnofsky thinks there is roughly a 49% chance that his own actions are making the AI risk problem worse (the authors' paraphrase of his podcast remarks), and Jesse Clifton stepped down as executive director of the Center on Long-Term Risk in 2025 for similar reasons. Even top AI Safety strategists acknowledge they cannot tell which interventions improve humanity's odds. The authors call the resulting problem "hidden failure": a project fails to have a positive impact, but the people running it don't realise it, because impact in AI Safety is hard to measure. Projects with adoption, citations, and funding can still have no impact or cause harm, and the authors argue that the field's lack of an established paradigm makes deliberate thinking about impact more important, not less.


Cross-posted to LessWrong

AI Safety veteran Holden Karnofsky thinks there’s a 49% chance his actions are making things worse.[1]

In 2025, Jesse Clifton even stepped down as executive director of the Center on Long-Term Risk for similar reasons.

Even top AI Safety strategists don’t know what will make things better, and what will make things worse.

Why is it so hard to improve humanity’s odds?

And what can you do to choose your actions?

In AI Safety, impact is hard to measure, and thus lack of impact is often invisible. We call this "hidden failure". With hidden failure, projects fail to have a positive impact but the people doing the project don’t realise it.

To understand where hidden failure comes from, it’s useful to understand reasons why projects fail in general. These reasons fall on a spectrum:

These factors can cause problems with both of the things you need to be impactful – adoption and effectiveness:

With hidden failure, you might have users, citations, and funding (i.e. you have “adoption”), and still fail to have impact or even make things worse.
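This can be made concrete with the Impact Equation used in this post (Impact = Adoption x Effectiveness). What follows is a minimal sketch, not the authors' method: the `Project` class, the 0-to-1 adoption scale, the -1-to-+1 effectiveness scale, the threshold, and all numbers are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    adoption: float       # hypothetical scale: 0 = no uptake, 1 = full uptake
    effectiveness: float  # hypothetical scale: -1 = harmful, 0 = no effect, +1 = helpful

    def impact(self) -> float:
        # Impact Equation: Impact = Adoption x Effectiveness
        return self.adoption * self.effectiveness


def looks_successful(project: Project, threshold: float = 0.5) -> bool:
    # Outside observers mostly see adoption (users, citations, funding).
    return project.adoption >= threshold


def is_hidden_failure(project: Project, threshold: float = 0.5) -> bool:
    # Visible success combined with zero or negative impact.
    return looks_successful(project, threshold) and project.impact() <= 0


# Made-up example projects, for illustration only.
popular_harmful = Project("popular but harmful", adoption=0.9, effectiveness=-0.2)
popular_inert = Project("popular, no effect", adoption=0.9, effectiveness=0.0)
niche_useful = Project("niche but useful", adoption=0.1, effectiveness=0.8)

for p in (popular_harmful, popular_inert, niche_useful):
    print(p.name, "| impact:", round(p.impact(), 2),
          "| hidden failure:", is_hidden_failure(p))
```

In this toy model, adoption only scales whatever effectiveness you have: if effectiveness is zero or negative, more adoption cannot rescue the project and may amplify the harm. In practice the effectiveness term is exactly the part that is hard to observe.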

Let us put that more bluntly: It’s literally possible for all your friends to think you’re successful and still be making things worse. Even within AI Safety. Even outside of frontier labs. Creating a profitable startup is hard. Achieving impact in AI Safety is even harder for several reasons:

AI Safety doesn't have an established paradigm yet.[6] We can't predict with certainty what will be impactful. So why bother optimizing so deliberately?

First, imperfect predictions are still valuable. For example, AI Safety experts can often point out specific reasons why a given project or idea is unlikely to be impactful.[7]

Second, we argue the lack of a paradigm actually makes deliberate thinking about impact more important, not less. Without clear guides on what will lead to impact, you have to figure it out yourself.

The tools described in the next posts help you optimize for impact under uncertainty. The goal isn't to get it perfectly right or to cripple yourself with analysis paralysis.[8] But we do think most people would benefit from spending more time thinking about their impact.

So let's think strategically about impact. We’ll give a high-level overview of how to do that in an upcoming post, and we’ll help you measure your impact in another one.

Want to get notified of those upcoming posts? Subscribe at the Luc & Lens Academy Substack: https://lensacademy.substack.com/

[1] We’re paraphrasing that from his appearance on the 80,000 Hours podcast, around the 4:11:30 mark, where he said: “I think overall I would probably agree with you that the smaller you’re making the scope of where you’re hoping to have impact, the more reasonable it is to be like 60/40. But most people who go into AI are not going into it for that. Otherwise, if you want a small-scope, robustly positive impact, you should maybe work in a cause like farm animal welfare or global poverty. For the size of impact that tends to motivate people, I think it does get partially offset by this huge uncertainty about the sign.

“I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one’s head, the more it’s going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it’s high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative.”

[2] Though you shouldn’t underestimate your brain’s ability to make itself comfortable, satisfice, and employ motivated reasoning to have you accept mediocrity.

[3] We’re using “impact-effectiveness” as a synonym for “effectiveness” as meant by the Impact Equation: Impact = Adoption × Effectiveness.

[4] Here and in other places, I use “for-profits” to refer to regular companies not aimed at AI Safety. Of course, an AI Safety project can be set up as a for-profit too.

[5] Although arguably, adoption is sometimes easier in a nonprofit setting. For example, the various fellowships have no trouble finding enough participants. In contrast, though, many products, tools, and blog posts do struggle to get adoption.

[6] See e.g. https://ai-safety-atlas.com/chapters/03/07 or https://www.thecompendium.ai/ai-safety. Although instead of saying AI Safety is pre-paradigmatic, it’s more accurate to say that none of the existing paradigms is widely agreed to be sufficient for making the world safe, especially by higher-level researchers in that paradigm. In other words, we have a bunch of paradigms, but they’re all pretty limited, and all in all we don’t even know yet what approaches will be required to make the world safe enough.

[7] Though there are also areas where experts disagree. In such cases, it becomes even more important to assess the specific arguments they use.

[8] See e.g. Holden Karnofsky on the 80,000 Hours podcast, where he says "When people ask me for career advice or whatever, the usual thing I’d say is: take a bunch of options that all seem competitive, and all seem like they could be the best thing, and that it’s not obvious which ones are better than others from an impact perspective. And from there I would say go with personal fit, go with the energy you feel to work on them."

Read original on forum.effectivealtruism.org → forum.effectivealtruism.org/posts/c83X7WXyT76Ljt…
