# A brief list of ways AI safety efforts could be net negative

> Source: <https://www.lesswrong.com/posts/sAfMCpWLfkHqF5Gix/a-brief-list-of-ways-ai-safety-efforts-could-be-net-negative>
> Published: 2026-06-19 16:12:38+00:00

Here’s [Holden Karnofsky](https://80000hours.org/podcast/episodes/holden-karnofsky-concrete-ai-safety-frontier-ai-companies/):

> I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one’s head, the more it’s going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it’s high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative.

I’m not aware of a good list of downside risks for AI safety broadly [1], so I decided to make one.

This is not intended to be fully comprehensive; these are just the ones that I personally take seriously [2] [3]:

*(This list is taken from **a previous post of mine**, but I thought it deserved its own top-level reference.)*

The closest thing I’m aware of is [Safeguarding the Safeguards](https://arxiv.org/pdf/2312.08039), but even that is more narrow.

To be clear, I don’t personally think AI safety has been net negative so far, as some do. I wouldn’t even say that I have a properly considered view about it - maybe 60% that it’s been net positive, with very low credal resilience.

But I do feel a vibe of overconfidence in the discourse here sometimes, and I think this can have downstream consequences, e.g. [an action bias](https://forum.effectivealtruism.org/posts/LrJmpReG7uptfazpX/a-simple-argument-for-trying-less-hard).

Quickly, here are others that I excluded because I don’t personally see them as potentially *major* factors, and didn’t want to water down the main list by including a bunch of implausible galaxy-brained stuff:

[Holden Karnofsky](https://80000hours.org/podcast/episodes/holden-karnofsky-concrete-ai-safety-frontier-ai-companies/): “Most things that touch policy at all in any way will move us along that spectrum in one direction or another, so therefore have a high chance of being negative [...]

And then most things that you can do in AI at all will have some impact on policy. Even just alignment research: policy will be shaped by what we’re seeing from alignment research, how tractable it looks, what the interventions look like.” (h/t Anthony DiGiovanni)

[Holden Karnofsky](https://80000hours.org/podcast/episodes/holden-karnofsky-concrete-ai-safety-frontier-ai-companies/): “there’s also a lot of micro ways in which you could do harm. Just literally working in safety and being annoying, you might do net harm. You might just talk to the wrong person at the wrong time, get on their nerves. I’ve heard lots of stories of this. Just like, this person does great safety work, but they really annoyed this one person, and that might be the reason we all go extinct” (h/t Anthony DiGiovanni)

Among other things.

I associate these with people like [Richard Ngo](https://x.com/RichardMCNgo/status/2056000490840699137) (and [here](https://www.lesswrong.com/posts/6YxdpGjfHyrZb7F2G/third-wave-ai-safety-needs-sociopolitical-thinking)) and [Oliver Habryka](https://forum.effectivealtruism.org/posts/W7AMKT8qssjS8WwjN/habryka-deactivated-s-quick-takes?commentId=cP3pActZJ4FdBQ6tn).
