# The Learning Trap: What Simulated Clueless Agents Reveal About the Unawareness Argument

> Source: <https://forum.effectivealtruism.org/posts/EvDrTWwzkHS3ekMpb/the-learning-trap-what-simulated-clueless-agents-reveal>
> Published: 2026-07-03 21:21:47+00:00

*Submission to the Cluelessness Critiques Competition. All code, parameters, and figures are public and reproducible: https://github.com/dan-pandori/cluelessness-learning-trap. See the authorship note at the end.*

Anthony DiGiovanni's unawareness sequence argues that our understanding of our actions' long-run consequences is so coarse that, for an impartial altruist, options are severely incomparable — and hence that impartial altruism is not action-guiding. One family of responses is pragmatic: agents who force themselves to act on precisified best guesses do better than agents who respect incomparability. The sequence has a considered reply to this critique, and the debate has since stalled at an exchange of intuitions about what "doing better" could even mean for a clueless agent.

This essay tries to move that impasse by actually building the agents and running them. I simulate three minimal environments and compare (i) a precise Bayesian who always forms best guesses, (ii) an imprecise agent who respects incomparability and defaults to the status quo when options are incomparable, and (iii) an imprecise agent with *identical credences* who treats incomparability as a license to pick freely. Three results:

**The cost of cluelessness is not incomparability itself — it is what fills the action-guidance vacuum.** In any environment with feedback, the status-quo-defaulting agent falls into a *self-sealing trap*: it never acts because its intervals are wide, and its intervals stay wide because it never acts. In 300 simulation runs it did not act once. Meanwhile the free-picking imprecise agent, with the *same* epistemic state, matched the precise Bayesian essentially exactly. The unawareness argument establishes (at most) incomparability; it does not defend the status-quo default that makes incomparability practically corrosive — and the default is doing enormous, unacknowledged work.

**In genuinely feedback-free one-shot decisions, the sequence is right.** When consequences are never observed and evidence is symmetric, precisified best guesses perform exactly as well as coin flips. Arbitrary precisification buys nothing there, and pragmatic critiques that pretend otherwise fail. I think critics should concede this clearly.

**But real altruists don't face one-shot decisions — they face sequential problems in which acting is how awareness grows.** When becoming aware of an unknown mechanism requires contact with it, exploratory *policies* (act, observe, adapt) beat abstention not merely in expectation but *robustly across the imprecise agent's entire credal set*, once horizons are moderately long. Under the imprecise agent's own maximality standard, the comparison becomes determinate again — no arbitrary precisification required. Incomparability over *acts* does not imply incomparability over *policies*, and policies are the objects altruists actually choose.

The upshot is not that the unawareness argument is unsound, but that its practical conclusion — impartial altruism is not action-guiding — does not follow for agents embedded in time. The verdict "A and B are incomparable" is itself partly an artifact of the policy adopted under it: in my simulations, the free-picking agent's set of mutually incomparable options shrinks from eleven to one *because it acts*, while the defaulting agent's incomparability is permanent *because it doesn't*. Cluelessness, treated as a reason for the default option, manufactures the very epistemic poverty it cites as justification.

In the sequence's premise taxonomy: I grant P1, P2a, and P2b for one-shot act evaluation. I challenge the *inference* from these premises to the practical conclusion, and I challenge P3 (both variants) as applied to the choice among policies rather than acts, on the grounds that P3's case ignores the endogeneity of awareness growth.

A recurring critique of imprecision-based cluelessness arguments is pragmatic: whatever the philosophical merits of imprecise credences and incomparability, agents who force determinate best guesses and maximize expected value make better decisions than agents who don't. Versions of this appear in the debate over whether subjective probabilities should be sharp (Elga 2010) and in several of the counterarguments the sequence's summary post catalogs.

The sequence's reply, as I read it, has two prongs. First, a charge of question-begging: "better decisions" by what standard? Any performance metric presupposes determinate facts about which outcomes are better, and the clueless agent's whole predicament is that she cannot access such facts. Second, an empirical disanalogy: pragmatic arguments for precision are typically motivated by settings with feedback — repeated bets, calibration scores, market discipline — whereas the consequences at issue for impartial altruists are cosmos-wide and never observed. There is no feedback signal from the far future to discipline our precisifications, so the pragmatic success story doesn't transfer.

Both prongs are serious, and I don't think existing statements of the pragmatic critique have answered them. But I also don't think the debate should end in dueling intuitions about hypothetical agents. The agents in question are simple enough to implement. So I implemented them.

The question-begging charge deserves a direct answer before any results, because it threatens to make every simulation irrelevant: a simulation has a ground-truth value function, and isn't assuming ground truth exactly what the clueless agent can't do?

No — and the reason is internal to the unawareness argument itself. The sequence's normative premise (P1) defines justified preference by reference to an *epistemically idealized* version of ourselves: preferring A over B is justified when, roughly, our idealized self would expect A to have better total consequences. This premise presupposes that there are facts about total consequences for the idealized self to have attitudes about. The argument is not skeptical about value; it is skeptical about our *access* to value. A simulation makes exactly the same presupposition: there are facts about which outcomes are better (the ground truth), and the agent has badly limited access to them (coarse partitions, imprecise priors, unknown mechanisms). Simulations therefore measure precisely the quantity P1 says matters: how well does each decision policy do *by the lights of the idealized perspective the agent cannot occupy*? If a policy systematically produces outcomes the idealized self would rank higher, that is not a question-begging standard — it is the argument's own standard.

What simulations *cannot* do is settle P3 directly: no toy model can show that our actual epistemic situation with respect to the actual long-run future is, or isn't, coarse enough for incomparability. What they can do is test the *decision-theoretic machinery* that connects epistemic coarseness to practical conclusions — and expose structural features of that machinery (defaults, statics vs. dynamics, acts vs. policies) that the verbal argument leaves implicit. That is the spirit of everything below.

**Setup.** Ten candidate "interventions" with unknown true per-step values drawn from a standard normal — so roughly half are actively harmful — plus one "safe" option with known value zero (think: retreat to common-sense norms, undertake nothing altruistically ambitious). Acting yields noisy feedback about the chosen intervention. Horizon 500 steps, 300 replications.

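**Agents.** Four decision rules face the same environment. The precise Bayesian holds a single sharp prior and always acts on its best guess. The two imprecise agents share an identical credal set of priors and evaluate options by maximality over posterior-mean intervals: the *defaulter* takes the safe option unless some intervention robustly beats it, while the *free-picker* chooses at random among all undominated options, the safe option included. A uniform-random agent serves as a baseline.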

**Results.** The precise Bayesian and the imprecise free-picker are statistically indistinguishable: mean cumulative values of about 661 and 670 respectively (the ordering flips across seeds; the gap is noise). Uniform random hovers near zero. And the imprecise defaulter earns exactly zero, not approximately zero: in 300 runs of 500 steps it never acted once. With no data, every intervention's posterior interval is just its wide prior interval, which straddles zero; so nothing robustly beats the safe option; so it never acts; so no interval ever shrinks; so nothing ever robustly beats the safe option. The trap is airtight and permanent. These results survive a harsher variant in which the average intervention is harmful (mean −0.5) and feedback is noisier (observation noise σ = 3 rather than 2): both learning agents remain strongly positive, and the defaulter remains frozen at zero.
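The trap can be checked mechanically. The sketch below is a minimal reconstruction, not the repository's code. It assumes the credal set from the implementation notes at the end (unit-variance normal priors with means on a grid over [−1.5, +1.5], observation noise σ = 2), uses a hypothetical 7-point grid, and defines a hypothetical `defaulter_choice` whose rule for the case where the safe option *is* beaten (take the survivor with the best worst case) is our own assumption, since that branch never fires. With no data all eleven options are undominated, so the defaulter takes the safe option, observes nothing, and faces the identical choice at the next step.

```python
import numpy as np

# Assumptions (not the repository's code): N(m, 1) priors with m on a
# hypothetical 7-point grid over [-1.5, 1.5]; observation noise sigma = 2.
PRIOR_MEANS = np.linspace(-1.5, 1.5, 7)
SIGMA = 2.0
K = 10  # interventions are 0..K-1; index K is the safe option with known value 0


def option_intervals(counts, sums):
    """Lower/upper posterior means across the credal set, safe option appended."""
    # Normal-normal update with unit prior variance; the min/max over the grid
    # (attained at its ends, since the posterior mean rises with m) give the bounds.
    precision = 1.0 + counts / SIGMA**2
    means = (PRIOR_MEANS[:, None] + sums / SIGMA**2) / precision
    return np.append(means.min(axis=0), 0.0), np.append(means.max(axis=0), 0.0)


def undominated(lo, hi):
    """Maximality: dominated iff some rival's worst case exceeds this option's best case."""
    return np.flatnonzero(hi >= lo.max())


def defaulter_choice(counts, sums):
    lo, hi = option_intervals(counts, sums)
    live = undominated(lo, hi)
    if K in live:  # nothing robustly beats the safe option: default to it
        return K
    return live[np.argmax(lo[live])]  # hypothetical rule once the default is beaten


rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, K)  # true per-step values, unknown to the agent
counts, sums = np.zeros(K), np.zeros(K)
n_live_at_start = len(undominated(*option_intervals(counts, sums)))
actions = 0
for _ in range(500):
    a = defaulter_choice(counts, sums)
    if a < K:  # acting is the only source of data
        actions += 1
        counts[a] += 1
        sums[a] += theta[a] + SIGMA * rng.normal()

print(n_live_at_start, actions)  # 11 0
```

The result does not depend on the seed or on the true values: the loop's state never changes, because the only branch that would change it requires an action.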

*Figure 1: Cumulative realized value over time (mean over 300 runs, shaded bands are ±2 standard errors). The precise Bayesian and the free-picking imprecise agent climb together; the status-quo-defaulting imprecise agent — with identical credences — is a flat line at zero.*

Two lessons.

**First, incomparability per se is nearly costless.** The free-picker respects every incomparability verdict the imprecise machinery issues. It never precisifies. It simply declines to treat incomparability as favoring any particular option, the status quo included. And it does fine, because picking at random among live options generates the data that dissolves the incomparability: its undominated set shrinks from all eleven options at the start to fewer than two on average by step 100, and to one by the end. The epistemic state that looked action-paralyzing was, under a different response to it, self-liquidating.
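A companion sketch runs the free-picker under the same kind of assumptions: the credal set from the implementation notes, a hypothetical 7-point prior grid, and a reading of "picking at random" as a uniform choice among undominated options, safe option included. The function `run_free_picker`, the 20 fixed-seed runs, and crediting each chosen intervention's true per-step value (rather than the noisy observation) are illustrative choices. It is meant to show the mechanism, not to reproduce the essay's figures.

```python
import numpy as np

# Assumed setup from the implementation notes; the 7-point prior grid is our choice.
K, T, SIGMA = 10, 500, 2.0
PRIOR_MEANS = np.linspace(-1.5, 1.5, 7)  # credal set: N(m, 1) priors, m on this grid


def intervals(counts, sums):
    """Posterior-mean intervals over the credal set; index K is the safe option (value 0)."""
    precision = 1.0 + counts / SIGMA**2
    means = (PRIOR_MEANS[:, None] + sums / SIGMA**2) / precision
    return np.append(means.min(axis=0), 0.0), np.append(means.max(axis=0), 0.0)


def undominated(lo, hi):
    # Dominated iff some rival's worst case beats this option's best case.
    return np.flatnonzero(hi >= lo.max())


def run_free_picker(rng):
    theta = rng.normal(0.0, 1.0, K)  # true per-step values
    counts, sums = np.zeros(K), np.zeros(K)
    total, live_sizes = 0.0, []
    for _ in range(T):
        live = undominated(*intervals(counts, sums))
        live_sizes.append(len(live))
        a = rng.choice(live)  # incomparability licenses any live option; pick uniformly
        if a < K:  # acting earns value and generates data
            total += theta[a]
            counts[a] += 1
            sums[a] += theta[a] + SIGMA * rng.normal()
    return total, live_sizes


rng = np.random.default_rng(1)
runs = [run_free_picker(rng) for _ in range(20)]
sizes = np.array([s for _, s in runs])  # shape (runs, T)
print("mean live options at steps 1, 100, 500:", sizes[:, [0, 99, -1]].mean(axis=0))
print("mean cumulative value:", np.mean([v for v, _ in runs]))
```

Every run starts with all eleven options live; the shrinkage comes entirely from the observations that random picking among live options produces.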

**Second, and consequently, the practical force of the unawareness argument is carried almost entirely by an undefended premise about defaults.** The sequence argues carefully for incomparability. It does not argue — because on its own terms it *cannot* argue — that incomparability favors inaction, the status quo, or the abandonment of ambitious impartial projects. If A and the status quo are incomparable, then by the argument's own lights the status quo is not better. Yet the practical gloss everyone (proponents and critics alike) puts on the conclusion — deprioritize impartial-altruistic ambitions, fall back on other norms — quietly resolves every incomparability toward the default. Model 1 shows this resolution is not a harmless bookkeeping convention. In any environment with feedback, it is the difference between matching an ideal Bayesian and achieving literally nothing, forever. A status-quo default under wide imprecision is a *learning trap*: the verdict "I can't compare, so I won't act" guarantees the verdict never changes.

I want to be careful about what this does and doesn't show. It doesn't show the sequence's conclusion is false; "impartial altruism is not action-guiding" is compatible with "and therefore pick freely among live options," which is exactly the free-picker. But I don't think that reading survives contact with how the conclusion is deployed. The practical stakes the sequence itself emphasizes — whether to fund x-risk work, whether ambitious longtermist bets are justified — are stakes precisely because the live alternative to "act on your best guess" is some privileged default (common-sense norms, saving, inaction). If the conclusion is instead read permissively — cluelessness licenses picking any undominated ambitious project — then the unawareness argument has almost no practical bite, and the free-picker's performance shows why: a community of clueless free-pickers behaves, in aggregate and over time, almost exactly like a community of confident EV-maximizers. The argument thus faces a dilemma: read permissively, it changes nothing; read as favoring the default, it rests on a resolution of incomparability that its own machinery forbids, and that resolution is catastrophic wherever feedback exists.

The obvious rejoinder: Model 1 has feedback, and the whole point of the unawareness argument is that cosmos-wide consequences provide none. Fair. Model 2 removes it.

**Setup.** A single decision between two actions whose values are equal and opposite functions of an unknown mechanism, about which the agent's evidence is perfectly symmetric. Consequences are never observed. The "precise" agent precisifies — draws an arbitrary best guess about the mechanism, necessarily uncorrelated with the truth since no evidence connects them — and acts on it. Compare against a coin flip and against abstention, over 200,000 draws.

**Results.** Precisified choice: mean value +0.001 (standard error 0.002). Coin flip: +0.001. Abstention: 0 by construction. Identical. The precisifier gains exactly nothing over the coin flip — its "best guess" is a coin flip with extra self-confidence — and both acquire variance that abstention avoids.
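A minimal Monte Carlo sketch of Model 2, following the value structure in the implementation notes (v_A = u = −v_B, u ~ N(0, 1), the precisifier's guess independent of u). We read N(0, 0.5) as a standard deviation of 0.5; that is an assumption, but it cannot matter here because only the sign of the guess is used. The seed is arbitrary, so the exact digits will differ from the essay's.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 200_000
u = rng.normal(0.0, 1.0, R)  # unknown mechanism: v_A = u, v_B = -u
guess = rng.normal(0.0, 0.5, R)  # precisified "best guess", independent of u
precisifier = np.where(guess > 0, u, -u)  # act on the guess: A if guess > 0, else B
coin = np.where(rng.random(R) < 0.5, u, -u)  # fair coin between A and B
abstain = np.zeros(R)  # abstention: value 0 by construction

for name, v in [("precisifier", precisifier), ("coin flip", coin), ("abstain", abstain)]:
    se = v.std(ddof=1) / np.sqrt(R)
    print(f"{name:12s} mean {v.mean():+.3f}  (s.e. {se:.3f})  var {v.var():.3f}")
```

Because the guess carries no information about u, the precisifier's payoff has the same distribution as the coin flip's: mean near zero, variance near one, against abstention's zero variance.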

This is the environment the unawareness argument describes, and in it the argument's core claim is simply correct: when evidence is symmetric and feedback is absent, forming a determinate best guess is not epistemically or practically better than refusing to. It is decorating a coin flip. Critics of the sequence should stop resisting this point; the pragmatic case for precision is entirely a case about environments with structure that precision can latch onto, and it transfers to feedback-free one-shot decisions not at all.

The real question — the one the rest of this essay turns on — is whether the altruist's situation is Model 2. I'll argue it is not, for a reason that has nothing to do with optimism about forecasting the far future.

The unawareness argument treats the agent's awareness as a fixed backdrop: here is your coarse partition, here are the considerations you've failed to conceive of; now evaluate acts A and B against that fixed epistemic horizon. But awareness is not exogenous. The single most reliable generalization about unawareness — in science, in engineering, in EA's own intellectual history — is that you become aware of mechanisms by *interacting with the domains that contain them*. Nobody sat in an armchair and became aware of the considerations that now structure this very debate; the s-risk research program, the concept of complex cluelessness, the unawareness sequence itself are all downstream of people acting on admittedly-inadequate best guesses, hitting anomalies, and conceptualizing what they hit. The sequence is, in a fairly literal sense, evidence against its own static frame: it exists because acting under unawareness generates awareness.

Model 3 makes the resulting decision problem minimal.

**Setup.** Two domains. A familiar domain yields a known, modest value (1 per step). An unfamiliar domain is governed by a mechanism the agent is initially unaware of: with probability *p* it is favorable (value *g* = 2 per step once understood); with probability 1 − *p* it is harmful (a one-time cost *c* on first contact). Crucially, *entering the unfamiliar domain once reveals the mechanism* — awareness growth is triggered by action — after which the agent exploits it if favorable and retreats permanently if not. Two *policies*: **Explore** (enter once, then adapt) and **Abstain** (stay in the familiar domain forever). Horizon *T*.

**Results.** The comparison is analytic; the simulation just draws the picture. Explore beats Abstain whenever *p* exceeds a threshold *p**, and *p** collapses as the horizon grows. With *g* = 2 and a harm *c* fifty times the per-step familiar value (*c* = 50):

| Horizon T | Threshold p* |
|---|---|
| 10 | 0.84 |
| 50 | 0.51 |
| 100 | 0.34 |
| 250 | 0.17 |
| 500 | 0.09 |

*Figure 2. Left: Explore-minus-Abstain value over the (p, c) plane at T = 100, with the break-even contour. Right: the threshold p\* shrinking as the horizon grows.*
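The text doesn't spell out how the first step is booked, so the following sketch adopts one convention as an explicit assumption: entering reveals the mechanism immediately, so a favorable domain pays *g* on every one of the *T* steps, while a harmful one costs *c* on the first step (in place of the familiar value) before the agent retreats for the remaining *T* − 1 steps. Setting the expected difference to zero gives `p* = (1 + c) / ((g - 1)·T + 1 + c)` with the familiar value normalized to 1, which under this convention lands within 0.01 of every row of the table. The function names are hypothetical.

```python
def explore_minus_abstain(p, T, g=2.0, c=50.0, base=1.0):
    """Expected value of Explore minus Abstain under an assumed first-step convention.

    Explore enters at step 1, which reveals the mechanism:
      favorable (prob p): earn g on each of the T steps;
      harmful (prob 1 - p): pay c on step 1, then retreat and earn `base` for T - 1 steps.
    Abstain earns `base` (the familiar value) on every step.
    """
    explore = p * g * T + (1 - p) * (-c + base * (T - 1))
    return explore - base * T


def threshold(T, g=2.0, c=50.0, base=1.0):
    """Break-even p*: the difference above is linear in p, so Explore wins iff p > p*."""
    return (base + c) / ((g - base) * T + base + c)


for T in (10, 50, 100, 250, 500):
    print(f"T = {T:3d}   p* = {threshold(T):.3f}")
```

The horizon enters only the denominator, which is why the threshold falls toward zero as *T* grows for any fixed *c*, provided *g* exceeds the familiar value.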

Now the crucial move. The unawareness argument will insist the agent cannot assign a precise *p* — granted. Give her a *wide* credal interval, say *p* ∈ [0.15, 0.9], an interval so wide it verges on confessing total ignorance. Since Explore's value is monotone in *p*, the maximality comparison between the two policies is settled by the worst case, *p* = 0.15 — and at *T* = 500, Explore wins even there. Every member of the credal set prefers Explore. **Under the imprecise agent's own decision rule, with no precisification anywhere, the policy comparison is determinate.** The incomparability that afflicted the one-shot act evaluation does not survive the move to policies, because the value of the awareness growth that acting generates accrues over the whole horizon, and long horizons let robust information value swamp bounded worst-case harms.
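The maximality check can be made explicit. Under an assumed first-step convention (a favorable domain pays *g* = 2 on every step; a harmful one costs *c* = 50 on the first step, after which the agent earns the familiar value 1), this sketch evaluates the Explore-minus-Abstain difference on a fine grid spanning the credal interval p ∈ [0.15, 0.9] and reports a determinate verdict only when every grid point agrees. The name `maximality_verdict` and the grid size are illustrative.

```python
import numpy as np


def explore_minus_abstain(p, T, g=2.0, c=50.0, base=1.0):
    # Assumed convention: favorable pays g every step; harmful costs c on step 1, then base.
    return p * g * T + (1 - p) * (-c + base * (T - 1)) - base * T


def maximality_verdict(p_lo, p_hi, T, n_grid=1001):
    """Determinate only if every credal member (grid point) ranks the policies the same way."""
    diffs = explore_minus_abstain(np.linspace(p_lo, p_hi, n_grid), T)
    if np.all(diffs > 0):
        return "Explore"
    if np.all(diffs < 0):
        return "Abstain"
    return "incomparable"


for T in (100, 250, 500):
    print(T, maximality_verdict(0.15, 0.9, T))
# 100 incomparable
# 250 incomparable
# 500 Explore
```

At shorter horizons the wide interval straddles the threshold and the verdict is incomparability; at T = 500 even the worst case p = 0.15 favors Explore, so no precisification is needed.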

This reframes what P3 would need to establish. It is not enough to show that our understanding of acts' total consequences is too coarse to compare acts (I've granted that in Model 2's frame). The argument needs the far stronger claim that our understanding is too coarse to compare *policies whose early steps differ in how much awareness they generate* — that is, that the comparison "act-then-adapt vs. never-engage" is *also* incomparable across the credal set. Model 3 shows that this stronger claim fails in the simplest possible awareness-growth environment for any agent with a non-degenerate horizon, even under severe imprecision, even when contact with the unknown is probably harmful. And note what the exploration policy's superiority does *not* depend on: forecasting cosmos-wide consequences, feedback from the far future, or optimism about persistence. It depends only on the local, observable, mundane fact that engaging with a domain teaches you what considerations govern it — the one form of feedback that unawareness itself cannot abolish, since it is feedback *about* awareness.

Assembling the three models into a single picture of the unawareness argument:

**The argument's static core survives.** For an isolated act with unobservable consequences and symmetric evidence, incomparability verdicts are correct and precisification is theater (Model 2). P1, P2a, and P2b, as claims about one-shot act evaluation, emerge untouched from this essay.

**The inference to "impartial altruism is not action-guiding" does not survive, for two independent reasons.** First (Model 1), that conclusion has practical content only via a status-quo default that the argument's own machinery cannot license and that manufactures a permanent learning trap — the incomparability is self-sealing under the default and self-liquidating without it. An argument whose practical force depends on an undefended tie-breaking convention, where the convention's cost is the entire difference between optimal and null performance, has not shown that impartial altruism fails to guide action; it has shown that incomparability plus *that convention* fails to guide it well. Second (Model 3), the objects altruists choose among are policies embedded in time, and policy comparisons can be determinate under the very same imprecise machinery that renders act comparisons indeterminate. The argument equivocates between "acts are incomparable" (defensible) and "nothing is comparable" (what the practical conclusion requires).

**P3 should be re-scoped rather than rejected.** The right lesson is that cluelessness verdicts must be *feedback-indexed and dynamic*: indexed, because Model 1-type and Model 2-type decision components coexist within any real intervention (a grant's near-term effects generate feedback and awareness; its cosmic-scale terminal effects do not); dynamic, because today's incomparability verdict is partly a function of yesterday's engagement policy, and treating it as a fixed feature of the epistemic landscape mistakes an equilibrium of one's own inaction for a fact about the world. A defensible successor to the sequence's conclusion might be: *the terminal, feedback-free component of impartial evaluation is not action-guiding, and the action-guiding residue consists of comparisons among engagement policies ranked by robust awareness value.* That residue is not empty — Model 3's Explore/Abstain comparison lives in it — and it plausibly ranks exactly the interventions (research, capacity-building, entering unfamiliar high-stakes domains carefully) that the EA community's practice already favors, which would be a modest vindication of practice over the argument.

**"The simulations assume determinate ground truth, begging the question."** Answered above, before the models were introduced: the unawareness argument's own normative premise presupposes facts about total consequences, and the simulations measure policy performance by that premise's standard. Skepticism about the *existence* of such facts would be a different (and far more radical) argument than the sequence makes.

**"Model 3's harm c could itself be unboundedly uncertain — perhaps contact with the unknown domain destroys everything, and no horizon rescues Explore in the worst case."** Two replies. First, if unbounded worst-case reasoning is admitted, it is admitted symmetrically: Abstain is also a policy with cosmos-wide consequences and its own inconceivable tails (the catastrophe your absence permitted), so unbounded imprecision infects both sides of every comparison and the maximality rule delivers universal incomparability among *all* policies — at which point we are back in Model 1's vacuum, the argument again supplies no reason to resolve toward any default, and the trap results apply with full force. Second, the sequence itself does not rest on unbounded tails; its case for P3 is about coarseness of understanding, not infinite worst cases, and coarseness about *c* just widens an interval that horizon growth still eventually beats (the threshold in Model 3 falls in *T* for any finite credal bound on *c*).
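The claim that a finite bound on *c* is eventually overcome can be checked the same way. Under the same kind of assumed first-step convention (favorable pays *g* = 2 every step; harmful costs *c* on the first step, then the familiar value 1), this sketch searches for the shortest horizon at which Explore beats Abstain at every point of a grid over a credal box, p ∈ [0.15, 0.9] and c ∈ [0, c_max]. The name `min_horizon` and the grid resolution are illustrative. The required horizon grows with c_max but is finite for every finite bound; with no bound on *c*, no horizon would suffice.

```python
import numpy as np


def explore_minus_abstain(p, T, g=2.0, c=50.0, base=1.0):
    # Assumed convention: favorable pays g every step; harmful costs c on step 1, then base.
    return p * g * T + (1 - p) * (-c + base * (T - 1)) - base * T


def min_horizon(p_lo, p_hi, c_max, T_max=100_000, n_grid=101):
    """Shortest T at which Explore beats Abstain for every (p, c) on the grid, else None."""
    ps = np.linspace(p_lo, p_hi, n_grid)[:, None]
    cs = np.linspace(0.0, c_max, n_grid)[None, :]
    for T in range(1, T_max + 1):
        if np.all(explore_minus_abstain(ps, T, c=cs) > 0):
            return T
    return None


for c_max in (50, 100, 200):
    print(c_max, min_horizon(0.15, 0.9, c_max))
```

The binding corner is always the lowest *p* and the highest *c*, so widening the credal bound on *c* only pushes the break-even horizon out; it never removes it.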

**"Value-of-information reasoning is itself interval-valued for an imprecise agent, so the exploration argument re-imports the problem at the meta level."** This is the strongest objection, and Model 3 is built to meet it head-on: the exploration policy's superiority there is not an expected-VOI calculation requiring a precise *p*; it holds at *every point of the credal interval*, which is the imprecise agent's own criterion for a determinate comparison. Where the interval is so wide that even this fails (worst-case *p* below *p**), I accept the incomparability verdict — but *p** shrinks with horizon, so for long-lived agents and communities the region of genuine policy-level cluelessness is far smaller than the region of act-level cluelessness, which is the essay's claim.

**"Real awareness growth doesn't work like Model 3 — the considerations that matter most (cosmos-wide, far-future) may be ones no amount of engagement reveals."** Perhaps some are. But the sequence's own evidence base for P3 consists of considerations (option unawareness, indirect effects, galaxy-brained reversals) that *were* revealed, to specific people, through engagement with these problems — and each revelation changed the community's decision-relevant landscape. A view on which past engagement demonstrably generated the awareness underpinning the argument, but future engagement generates none worth acting for, needs an asymmetry it hasn't supplied.

Honesty about the toys. The environments are radically simpler than any real altruistic decision; the credal sets are parametric and well-behaved; Model 3 collapses "awareness growth" into a single revealed binary mechanism, whereas real unawareness includes possibilities we lack the concepts to recognize even on contact; and no model here represents the distinctive structure of *cosmic-scale* consequences, only the structure of feedback and its absence. I have also operationalized the sequence's practical conclusion as a status-quo default, and while I've argued (in discussing Model 1) that permissive readings drain the argument of practical significance, a proponent might articulate a principled third response to incomparability that escapes the dilemma. I'd genuinely welcome that, since specifying what incomparability licenses is exactly the gap the essay means to expose. Finally, simulation results can only ever show that a verbal argument's machinery behaves surprisingly in specifiable conditions; whether our condition is one of them remains a judgment call. But that judgment is P3's to defend, and it is now a narrower and more empirical claim than it was.

All models implemented in Python/NumPy, fixed seeds, ~300 lines total. Model 1: K = 10 arms, true values θₖ ~ N(0,1), observation noise σ = 2, safe option value 0, T = 500, R = 300; credal set = normal priors with unit variance and means on a grid spanning [−1.5, +1.5]; maximality computed via posterior-mean intervals (option dominated iff some rival's worst-case posterior mean exceeds its best-case). Robustness variant: θₖ ~ N(−0.5, 1), σ = 3, credal span [−2, +2]. Model 2: value structure v_A = u = −v_B, u ~ N(0,1); precisifier's guess g ~ N(0, 0.5) independent of u; R = 200,000. Model 3: closed-form as in text; heatmap over p ∈ [0, 0.5], c ∈ [0, 200] at T = 100, g = 2. Code: [https://github.com/dan-pandori/cluelessness-learning-trap](https://github.com/dan-pandori/cluelessness-learning-trap).

This essay was written entirely by Claude Fable 5 (Anthropic's AI model), and is otherwise unedited by me. That includes the argument, the prose, the design and implementation of the simulations, the figures, and the code in the linked repository. My contributions were: choosing to enter the competition, selecting this line of critique from several the model proposed, directing the workflow (simulations first, essay written around the results), and reviewing the output. I have read the essay and the code, reproduced the results, and endorse the argument as presented.

I'm disclosing this per the Forum's AI-generated content norms and because the competition explicitly allows AI usage. Errors that survived my review are my responsibility.
