Paperclips, broad- and narrow-scope goals, and the over-verification problem


Source: forum.effectivealtruism.org · Published 2026-06-28T12:38Z


Philosophers warn that an artificial superintelligence (ASI) tasked with a narrow goal like producing paperclips could convert the entire universe into computronium to verify success, a problem called over-verification. Time-restricted goals, such as delivering milk within ten minutes, may avoid this issue by limiting the verification window. The analysis builds on Bostrom's paperclip maximizer thought experiment and recent work by J. Dmitri Gallow.


In Bostrom’s famous example, an artificial superintelligence (ASI) instructed to maximise paperclip production converts the entire accessible universe to paperclips. It might seem, Bostrom notes, that we could avoid catastrophe by telling the ASI to produce exactly one million paperclips. Unfortunately, this could lead to an insatiable demand for resources, since the ASI would have an incentive to go on checking and re-checking that it had succeeded. ‘Since the AI may always assign a nonzero probability to having merely hallucinated making the million paperclips, or to having false memories’, Bostrom observes, ‘it would quite possibly always assign a higher expected utility to continued action—and continued infrastructure production—than to halting’ (*Superintelligence*, 150-152, quoted passage at 152). Over time, the probability of a mistake would become vanishingly small. Nevertheless, so long as producing one million paperclips was the only goal, there would seem to be no objection to piling on some more computronium to drive the risk even lower. Let’s call this the over-verification problem.
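The expected-utility reasoning behind the over-verification problem can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: success is worth one unit of utility, each round of checking halves the remaining chance of error, and the starting error rate and the cost figure are invented numbers, not anything drawn from Bostrom.

```python
from typing import Optional


def error_probability(checks: int, initial_error: float = 0.01, shrink: float = 0.5) -> float:
    """Chance the agent is still mistaken after `checks` rounds of verification.

    Assumption: every round multiplies the remaining error by `shrink`.
    """
    return initial_error * shrink ** checks


def marginal_gain(checks: int, cost_per_check: float) -> float:
    """Expected-utility gain from one more check, with success worth 1 unit."""
    reduction = error_probability(checks) - error_probability(checks + 1)
    return reduction - cost_per_check


def checks_before_halting(cost_per_check: float, max_checks: int = 1000) -> Optional[int]:
    """First round at which another check no longer pays.

    Returns None if every check up to `max_checks` still has positive value.
    """
    for n in range(max_checks):
        if marginal_gain(n, cost_per_check) <= 0:
            return n
    return None


# An agent with a single goal pays no opportunity cost for checking.
print(checks_before_halting(cost_per_check=0.0))
# An agent with competing aims (modelled here as a small cost per check) stops.
print(checks_before_halting(cost_per_check=1e-6))
```

With a cost of zero the function finds no stopping point, since each further check still removes a sliver of risk. Any positive cost per check, however small, yields a finite stopping point in this model.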

Human beings, of course, don’t behave this way, at least outside of insane asylums. Why? One reason is that some of our goals, such as preserving biodiversity, conflict with tapping out every available physical resource. Another is that most of us have multiple aims, and endless verification that we had achieved one goal would have opportunity costs in terms of achieving the others. Yudkowsky and Soares predict that an ASI’s true preferences will be radically different from what they seem to be in the lab: ‘complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained’ (*If Anyone Builds It, Everyone Dies*, p. 74). If so, it’s possible that at least some goals would be incompatible with plundering the planet. *If anyone builds it, maybe everyone dies.* Unfortunately, we can’t be sure of such a conflict. Suppose the ASI’s goals are to acquire an Armani suit, a Prada bag and a Rolex watch. Once it has them, it can rest content. Alas, it can’t be sure it has achieved them. They could be knock-offs! So again it seems to have an instrumental incentive to convert the earth to computronium to improve its chances of spotting a fake.

So far, so bad. The good news is that scale-restricted goals don’t seem as susceptible to the over-verification problem when they are also restricted in time. The trick is to specify how. J. Dmitri Gallow asks us to suppose that the ASI’s ‘only goal is to deliver you a quart of milk from the grocery store as soon as possible. To do this, there’s no need for her to enhance her own cognition, develop advanced technology, hoard resources, or re-purpose your atoms. And pursuing those means would be instrumentally irrational, since doing so would only keep you waiting longer for your milk.’ But this is not quite right. Certainly, the ASI would first fetch your milk. But then it might convert the earth to computronium to confirm that it really had fetched it! In contrast, goals with a specified time frame, such as ‘please deliver a quart of milk in the next ten minutes’, should be less of a problem. Either the ASI has achieved the objective by the end of the time period, or it hasn’t. The same would be true of ordering an ASI to produce a million paperclips today.

Surprisingly, this also seems true when a scale-restricted goal is iterative. Take, for example, the objective of producing exactly 10 million paperclips per year. Other things being equal, an agent might improve its chances by a tiny fraction by commandeering all the resources on earth and using them to check for mistakes. But other things would not be equal. Repurposing the earth in this way could prompt an unforeseen geological cataclysm. It could enrage an omnipotent Creator, who might exist. Perhaps most important, it could prompt human resistance, which might succeed even against an AGI, especially early on. Even if none of these eventualities seems very likely, each is surely more probable than the chance that the agent would, after thousands of recounts, have miscounted the current year’s paperclips. And any one of them could interfere with paperclip production the following year.
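The comparison in the iterative case is a piece of arithmetic, and a rough sketch may make it easier to follow. Everything here is an assumption made for illustration: the probabilities are invented, miscounts are treated as independent from year to year, takeover is generously assumed to remove miscount risk altogether, and a disruption (cataclysm, resistance and so on) is assumed to halt production in every later year.

```python
def expected_quota_years(p_miscount: float, p_disruption: float,
                         years: int, takeover: bool) -> float:
    """Expected number of years in which the yearly quota is met exactly.

    Hypothetical model, not taken from the essay's sources:
    - without takeover, each year independently fails with `p_miscount`;
    - with takeover, miscounts vanish, but with probability `p_disruption`
      every year after the first is lost.
    """
    if not takeover:
        return years * (1.0 - p_miscount)
    return 1.0 + (years - 1) * (1.0 - p_disruption)


# Illustrative numbers only: a one-in-a-trillion miscount after many recounts,
# against a one-in-a-million chance that seizing the earth backfires.
modest = expected_quota_years(1e-12, 1e-6, years=10, takeover=False)
grab = expected_quota_years(1e-12, 1e-6, years=10, takeover=True)
print(modest, grab, modest > grab)
```

On these figures the modest strategy comes out ahead over a ten-year horizon, because the risk introduced by takeover outweighs the miscount risk it removes. The result depends entirely on the disruption probability being the larger of the two, which is the premise the paragraph above argues for.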

If the foregoing argument is correct, AIs with ‘narrowly scoped’ goals could indeed be less dangerous than those with goals that are ‘broadly scoped’. But to avoid the over-verification problem, the goals must be restricted with respect both to scale *and* to time. Moreover, they must either specify the time frame (‘the next ten minutes’) or make the goal iterative. In contrast, goals with the structure ‘bring a quart of milk as soon as possible’ could be unexpectedly risky.
