World-modeling the US vs. Anthropic Standoff on Claude Fable

An AI forecaster predicts the U.S. government will force Anthropic to restrict Claude Fable to non-Americans, setting a major precedent for AI regulation. The analysis, using a proprietary world-modeling method, estimates a 70% probability of restriction within six months and a 50% chance of full restoration within a year.

I spent the last two days doing a deep dive into forecasting the outcomes of the US forcing Anthropic to take down Claude Fable. I did this for two reasons: (a) I want to know when I'll get Fable back for my research, and (b) the outcome will set a major precedent for US AI regulation.

For those who want background, the most up-to-date and comprehensive summary I could find is @Zvi https://www.lesswrong.com/users/zvi?mention=user 's post from June 17 https://thezvi.substack.com/p/the-once-and-future-fable-3-fix-this . I'll assume here you know the basic details of the situation.

My world model's conclusions were interesting, but I'm writing this up here mostly because of the epistemic process, and what I learned about managing a large amount of AI research without spinning into unreasonableness.

My central challenge was that I ended up with a large combination of unconditional and conditional forecasting questions, 33 in total that I consider critical. This is too many for a human, or a crowd of humans, to do at high quality. And if they did, it would take weeks and we'd miss the window for the information to be useful to people planning to use Fable, or people working on US AI policy.

It's worth stating that prediction markets cover the major outcomes, so we have a crowd of humans to compare against the results of this world-modeling method. It also means that, if you ultimately don't trust this process, there is some basic information to fall back on for likely timelines.

Disclaimer: I used FutureSearch's proprietary forecaster https://futuresearch.ai/docs/reference/FORECAST/ for this, which I help build.
Our evals https://evals.futuresearch.ai/ indicate this is much more accurate than just prompting a high-effort frontier model with each forecasting question, but the world-modeling process I lay out in this piece should work well enough with any $20/mo chatbot.

tl;dr: the unconditional forecasts I did about what's going on (Claude-generated graphics over the results):

and this leads to this CDF for when Fable is available to Americans again:

Building a causal graph of critical policy decisions has been a dream of the forecasting community for a while. You list out all the events, then draw arrows between them, and then crowd forecast the outputs conditioned on the inputs. You might think this is bottlenecked on the forecasting part. But actually I rarely see anyone get to the first step: listing out the options, outcomes, and what influences what.

The AI Futures https://www.aifutures.org/ team is doing this now, and it looks promising. Metaculus, where I used to work, is also trying this in a service called Radiant https://www.metaculus.com/notebooks/42293/map-the-future-before-you-build-it/ . One challenge with using human crowd forecasting is that the humans first have to agree on these events and outcome metrics before they can get to forecasting. Slow progress in this type of modeling might be a major reason behind this recent LW takedown of forecasting https://www.lesswrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-stop-funding-it by @mabramov https://www.lesswrong.com/users/mabramov?mention=user .

This Anthropic situation actually presented me with a simplified version. There was a single central event, the requirement to restrict Fable to non-Americans, with an unclear cause. Then there was a series of moves the US government could make, and that Anthropic could make. Then there were a few broad outcomes that could be forecasted against all these combinations.
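
To make that structure concrete, here is a minimal Python sketch of how forecasts conditioned on a hidden cause roll up into one unconditional number via the law of total probability. The function name, the scenario labels, and every probability below are hypothetical placeholders for illustration, not the actual forecasts.

```python
from typing import Dict


def marginalize(p_scenario: Dict[str, float],
                p_outcome_given: Dict[str, float]) -> float:
    """Law of total probability: P(outcome) = sum over s of P(s) * P(outcome | s)."""
    if set(p_scenario) != set(p_outcome_given):
        raise ValueError("need exactly one conditional forecast per scenario")
    if abs(sum(p_scenario.values()) - 1.0) > 1e-9:
        raise ValueError("now-casts must sum to 1 (mutually exclusive, exhaustive)")
    return sum(p_scenario[s] * p_outcome_given[s] for s in p_scenario)


# HYPOTHETICAL numbers, for illustration only.
# P(outcome | cause), e.g. "restored by some fixed date" under each hidden cause.
P_OUTCOME_GIVEN = {"cause_1": 0.5, "cause_2": 0.8, "cause_3": 0.2, "cause_4": 0.6}

# Two different now-casts over the same four hidden causes.
SPREAD_NOWCAST = {"cause_1": 0.4, "cause_2": 0.3, "cause_3": 0.2, "cause_4": 0.1}
CONFIDENT_NOWCAST = {"cause_1": 0.85, "cause_2": 0.05, "cause_3": 0.05, "cause_4": 0.05}

p_spread = marginalize(SPREAD_NOWCAST, P_OUTCOME_GIVEN)
p_confident = marginalize(CONFIDENT_NOWCAST, P_OUTCOME_GIVEN)
print(f"spread now-cast:    {p_spread:.3f}")
print(f"confident now-cast: {p_confident:.3f}")
```

The same conditional forecasts give different unconditional answers depending only on the now-cast over causes, so the weights on the hidden causes deserve as much scrutiny as the conditional forecasts themselves.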
So this put the situation at "more complicated than a normal forecasting question" and "less complicated than a full causal graph". And since I was going to use AI for all the actual forecasting, this made the project tractable in 2 days, i.e. fast enough to get results that wouldn't be made obsolete by passing news too quickly.

Here's a Claude-generated view of what part of my graph looks like:

One thing that jumps out is that it is very sensitive to (a) the set of mutually exclusive hidden causes of the June 12 order, and (b) the unconditional now-casts that we're in each world. The above article from Zvi, for example, seems to be very sure we're in the "political leverage" scenario, and while that was my modal forecast too, it had less than 50% of the total probability.

I really tried hard to reject the other 3 scenarios for what was behind the June 12 order. I originally had only 3 in total, and ended up splitting "danger" into "capability danger" and "foreigner danger" to get 4. This made things harder, but I couldn't find a way to convince myself to rule any of those 4 out, and they all ended with at least 10% probability.

Once you accept that scenario uncertainty, you get to research and forecasting uncertainty.

Short answer: I don't know. I have studied forecast accuracy for years now, but AFAICT, nobody has ever evaluated conditional forecast accuracy, let alone when done with AIs. Take a question like "Conditioned on the June 12 order being primarily motivated by political leverage from the March Department of War saga, will Anthropic use Know-Your-Customer to try to quickly re-launch to US citizens?"

One major issue is the one raised in @dynomight https://www.lesswrong.com/users/dynomight?mention=user 's LW piece on Futarchy's fundamental flaw https://www.lesswrong.com/posts/vqzarZEczxiFdLE39/futarchy-s-fundamental-flaw . Are we asking about causation or correlation?
For example, the statement "Conditioned on the June 12 order being primarily motivated by political leverage" implies other things about the state of the world. Are those being taken into account, or are we saying something like "holding all other things equal"?

The good news for this world model is that conditioning on something that has already happened dissolves most of this issue. It's a much more serious problem for conditioning on future decisions people might take, where you're entangled with other future developments.

Another piece of good news is that AIs seem to be less confused about this than humans. Human forecasters get very tripped up on this (including me). But AI forecasters, at least the one I used for this project, state their assumptions pretty clearly, and reading the rationales, they appear, in this case, to be really reasoning from the four different partitioned worlds. I had one Superforecaster check this too, and he agreed that the conditional forecasts looked very good to him.

A future project is to verify this with properly scored evals. FutureSearch has the data to do this, since we can pair existing past-casts, forecast them, and score them immediately. Until then, at least for the causality worry, I think AI conditional forecasting should follow Dynomight's warning and only forecast things conditional on (a) the present, or (b) decisions that you yourself can make right now, e.g. "If I take this job, what will my salary be in 2 years?"

Here's a snippet of one of the final outcome forecasts, to give a sense of how these read:

Conditional on the assumption that the security rationale is substantially pretextual and the but-for driver is White House political leverage tied to the Department of War feud and Anthropic's impending IPO (Scenario A3), this dispute must be analyzed as a power negotiation rather than a technical remediation problem.
Consequently, technical patches will not independently unlock restoration; the resolution will turn on leverage and face-saving concessions...

As for overall quality, I think the overall evidence from forecasting benchmarks (e.g. see ForecastBench's most up-to-date leaderboard https://www.forecastbench.org/leaderboards/ ) preliminarily suggests max effort AI forecasts are about as good as a crowd of humans. One could argue that's not good enough, and that even crowds of elite humans (Metaculus pros, Good Judgment superforecasters) aren't known to be able to properly build a reasonable causal graph. But reading the forecasts makes me think this approach is well into the "useful" territory. And in fact, I'm starting to think the main usefulness of forecasting is in exactly these types of research projects, which are infeasible for humans due to the constraints of crowd forecasting.

The big practical issue is that dozens of forecasts, each done by multiple AI agents, will almost certainly lead to an inconsistent view of the future. How can you get a coherent world model out of it?

I have some sketches for ways to reconcile ~33 high effort forecasts from teams of AI agents. One preliminary approach involved finding and refining latent claims across forecasts, and then updating the forecasts on each one until they are consistent. This increased accuracy on Bench to the Future https://evals.futuresearch.ai/ btf3 evals. The FutureSearch team shared this at Manifest on June 13, so note to future self: link that talk if it becomes publicly available.

But I didn't trust it enough for this project. So I did this by hand, using my own judgment as a human forecaster and my own intuitions about the situation. By hand here meant running lengthy Claude Code sessions, asking it to pair up various forecasts and point out inconsistencies, and then making tweaks.
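
Some of these inconsistencies are mechanical enough to check in code. Below is a minimal Python sketch of two such checks. The function names, dates, and probabilities are all hypothetical, and this is not the process described above, which relied on Claude Code sessions and human judgment. The first check says "event happens by deadline" probabilities cannot fall as the deadline moves later. The second says an unconditional forecast must lie within the range of the forecasts conditioned on a mutually exclusive, exhaustive set of scenarios.

```python
from datetime import date
from typing import Dict, List, Tuple


def non_monotone_pairs(p_by_deadline: Dict[date, float]) -> List[Tuple[date, date]]:
    """Return adjacent deadline pairs where the later deadline got a LOWER probability.

    For "event happens by deadline" questions, the probability cannot
    decrease as the deadline moves later.
    """
    ordered = sorted(p_by_deadline.items())  # sorts by deadline
    return [(d1, d2) for (d1, p1), (d2, p2) in zip(ordered, ordered[1:]) if p2 < p1]


def outside_conditional_range(p_unconditional: float,
                              p_given_scenario: Dict[str, float]) -> bool:
    """True if the unconditional forecast is outside [min, max] of the conditionals.

    With mutually exclusive, exhaustive scenarios, the unconditional probability
    is a weighted average of the conditionals, so it must lie in that range.
    """
    lo = min(p_given_scenario.values())
    hi = max(p_given_scenario.values())
    return not (lo <= p_unconditional <= hi)


# HYPOTHETICAL forecasts, for illustration only.
restored_by = {
    date(2026, 7, 1): 0.45,
    date(2026, 7, 15): 0.40,  # lower than the earlier deadline: incoherent
    date(2026, 8, 1): 0.70,
}
print(non_monotone_pairs(restored_by))
print(outside_conditional_range(0.90, {"cause_1": 0.5, "cause_2": 0.8}))
```

Checks like these only catch numerical incoherence. Conflicting lines of reasoning in the rationales still need a reader, human or AI.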
I found both inconsistent forecast outcomes (probabilities and dates) and inconsistent lines of reasoning in the rationales.

I can't really say how well this worked, other than that I can no longer easily see any inconsistencies when I spot check the 33 core forecasts. And even if I could, when I re-run many of them based on new evidence as Anthropic and White House negotiations proceed, such as the Congressional letter, I can't deeply check them each time. AI is really necessary here to make this tractable.

This is probably where the "garbage in, garbage out" risk is highest. Using AI agents to identify flaws in the outputs of other AI agents risks compounding errors rather than fixing them. Given the time constraints of publishing forecasts while they would still be useful to people who want to know when Fable will come out, this felt good enough in this case.

And it's not like human crowd forecasts are consistent. There is a lot of money to be made arbitraging various markets across Polymarket and Kalshi, for example. I don't mean strict arbitrage, more like "world modeling" arbitrage.

The easiest way to resolve these potentially AI-compounded errors is finding out which of the 4 scenarios we're in. Decisive evidence could come out on this any day. As Zvi wrote in the linked piece above, he thinks the Lutnick-Amodei exchange, "This means we can't have the model out" / "That's the point", was nearly decisive evidence, but it only moved me a little bit. Even removing 2 of the 4 scenarios would cut out a lot of the world modeling.

Given these caveats, I think keeping the scope low, at 33 hard forecasts, was a reasonable call. But one sacrifice is that I had to group all sorts of outcomes into coarse buckets. The full analysis of each scenario, and the charts with forecasts of specific outcomes in each scenario, are in https://futuresearch.ai/claude-fable-ban-forecast/ .
It concludes with this summary graphic, again made by Claude:

The problem here is, what does "Compromise" mean? There are a number of possible compromises:

Splitting these out, especially conditioning on intermediate negotiating moves, adds enough complexity that this goes way beyond a 2-day, 1-person project. For important, slow-enough-moving topics, having a team work on such a world model for a few weeks could be worth it. I do worry that such an effort would spiral out of control too. Maybe the clearest evidence on this will come from the AI Futures team's successor model to AI 2027, which I hear is coming out soon.

I wrote above that we had prediction markets to ground this against. My model gives a 50th percentile forecast of Fable opening up to Americans on July 12. The primary market on Polymarket https://polymarket.com/event/claude-fable-5-restored-for-us-customers-by-20260613193753196 has $1M in volume, and is still fluctuating wildly, on apparently no news, or possibly insider trading https://x.com/TheZvi/status/2067766083449245828 (this is one of the cases where I think insider trading has nice positive effects, even if it is net harmful).

So while the markets are generally giving a 50% chance of the re-launch by July 1, I still think those markets are close enough to this model that they provide some sort of grounding for my 50th percentile date of July 12. If prediction markets were giving a really high chance of release in the next 1-2 weeks, or a really high chance of no release through August, that would cause me to doubt my model and wonder if I'm missing something major.

My guess at this point is that my model is probably better than the human crowd here. The chatter I see on Reddit, Hacker News, etc. seems to indicate a lot of people feel very strongly they know what is going on, e.g. that this is definitely a misunderstanding caused by misinterpreting the Amazon report, or that this is definitely political in nature.
I am generally a believer in efficient prediction markets, but this time... well, we'll see.