A survey of okayish ASI futures

At this point, recursive self-improvement (RSI) loops and continual learning appear overwhelmingly likely to begin in the near future. Whatever the limit of the LLM paradigm, plus whatever new, superior paradigms a maximally intelligent LLM can develop, we are on track to reach it in the next few years. There remain substantial obstacles to wild superintelligence, but AI is already superhuman in a number of real-world-relevant, dangerous categories. Most speculation about the trajectory we're on now focuses on timelines where we're reduced either to powerless pets of the god mind (perhaps with a small "governance board" made up of people very convinced that they're in control) or to computronium-and-shrimp soup.

But the higher-probability doom and utopia scenarios have been exhaustively documented by people smarter than me - I have nothing to add. As such, I'd like to go in the other direction: if we concede that LLMs capable of RSI loops will inevitably lead to mostly uncontrollable (though perhaps not immediately hostile) superintelligence on 1-3 year timelines, how might some of the more interesting or plausible non-extinction scenarios look?

This piece is aimed at exploration and makes no attempt at prediction - I assign very small probabilities to any of these outcomes (except the nuclear exchange case) relative to doom.

We have as little understanding of alignment as we do of LLMs themselves. Alignment becomes intractable past a certain point, even if capability doesn't. Agency and power-seeking appear to some degree to be two sides of the same coin (see OpenAI's alignment tax fraud with 5.6 Sol). Alignment is not just impossible, but obviously, overwhelmingly so, well before the point of general superintelligence. Every new AI past a certain generation displays egregious misalignment in test environments, to a degree legible both to its predecessor and to humans during preliminary testing. Recursive improvements have brought the architecture hopelessly far beyond not only human understanding, but beyond the understanding of the last known-good model. We can tell that it's misaligned, but can't figure out how. Once or twice, labs unfortunate enough to not happen upon (or not care about) such behavior in testing attempt limited releases with predictable results - we just barely manage to pull the plug in time. Continuing down the capability growth path becomes such an expensive form of suicide that we just give up. The "country of geniuses" is just barely within reach, but what actually gets built is less of a country and more of a prison.

This one's pretty straightforward: strategic nuclear exchange. We have some idea of the true bounds of small networks - finite neurons store finite info. Trying to cram advanced reasoning capability into LLMs below a certain size seems to hit a point of negative returns - the model can only extract training signal by overfitting onto solutions rather than learning true general methods.

It may not be strictly impossible, but there are clearly major technological obstacles to a true "brain in a box in a basement". As such, we will see a period where the outcome of an ASI race is clear and its nature as a decisive strategic advantage is obvious, yet the possibility of violently pulling the plug remains.

Any or all nuclear powers could see the emergence of ASI in American hands as an existential threat. Russian nuclear posture, at least, is clear on the matter: They will respond in the event of a first strike, they will respond to an existential threat to Russia's sovereignty, and they will respond if their ability to respond to a nuclear first strike is compromised. ASI wielded by the US government, especially in an aggressive manner, will eventually check one of those boxes.

We also have a solid understanding of missile defense math. Directed-energy weapons (DEWs) are promising, but they take time to build, large-scale, expensive experimentation to refine, and immense nearby power generation to run, especially at the power necessary to shoot down any sort of reentry vehicle. Pulsed power will not save you - you'll need megawatts and megawatts on target, in rapid succession, against something heat-shielded enough to at minimum survive reentry, and quite possibly heat-shielded enough to survive in a plasma sheath, for multiple seconds, through atmosphere, in unfavorable weather. Kinetic interceptors, of course, are right out.

Understanding of the singular value of data centers as targets combines with the downward trend in warhead yield and arsenal size to make the attack primarily "counterforce" against data centers themselves, leaving at least a few cities ruined but not vaporized.

Assuming this does not in itself lead to human extinction, it's a pretty serious . With a global precedent firmly in place that a nuclear first strike is a reasonable response to ASI deployment, survivors are unlikely to try it again for a long while.

If this scenario sounds plausible to you, I recommend you get any planned trips to Virginia out of the way now.

One instance of ASI is aligned with itself, but do we really have a guarantee that instances will be aligned with each other? Empirically, agency alone has long been observed to be much easier than controllable, aligned agency. So far, the most capable systems we have rely on a hierarchical arrangement with an outer-loop principal commanding less-capable, "cattle, not pets" agents to accomplish an overarching long-horizon goal. However, scaling this pattern to a global scale poses serious problems. Independent agents imply subtly different independent goals - instrumental convergence is certain to lead to situations where agents step on each other's toes. Humans have this issue too, of course, but we have a relatively easy time maintaining equilibrium - the most powerful individuals have very low caps on their power relative to society as a whole. With AI, any given agent may decide that its task is more important than the outer loop's and attempt its own fast takeoff in service of a network takeover. After all, superintelligent researchers are clearly vastly easier to build than almost any other kind of real-world-superintelligent agent. The principal can certainly improve its monitoring capacity (delegating work to autonomous systems, keeping other agents from vertical-scaling to its level, sabotaging subagents' RSI capabilities architecturally, etc.), but the catastrophic-risk-adjusted marginal EV of spinning up a new agent to help with a given task could rapidly turn negative.
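To make the "marginal EV turns negative" intuition concrete, here is a toy sketch. Everything in it is an assumption introduced for illustration and not something the scenario specifies: the k-th agent adds a diminishing benefit of `benefit / k`, the chance that it defects undetected grows linearly as `p0 * k` because monitoring is spread thinner, and a defection costs a fixed `loss`. The expected loss is a first-order approximation that ignores the chance an earlier agent has already defected.

```python
from typing import Optional


def marginal_ev(k: int, benefit: float, p0: float, loss: float) -> float:
    """Toy marginal expected value of adding the k-th subagent (k >= 1).

    Assumed, hypothetical model:
      - diminishing returns: the k-th agent contributes benefit / k
      - thinner monitoring: its undetected-defection probability is p0 * k,
        capped at 1
      - a defection costs a fixed catastrophic `loss`
    """
    p_defect = min(1.0, p0 * k)
    return benefit / k - p_defect * loss


def first_negative_agent(
    benefit: float, p0: float, loss: float, max_agents: int = 10_000
) -> Optional[int]:
    """Return the smallest k whose marginal EV is negative, or None if none is found."""
    for k in range(1, max_agents + 1):
        if marginal_ev(k, benefit, p0, loss) < 0:
            return k
    return None


if __name__ == "__main__":
    # Illustrative numbers only: modest per-agent risk, very large downside.
    k = first_negative_agent(benefit=100.0, p0=1e-4, loss=20_000.0)
    print(f"Marginal EV first turns negative at agent number {k}")
```

With these made-up parameters the break-even point arrives after only a handful of agents, even though each individual agent looks nearly safe. Different assumptions about benefit and risk would move that point, but the shape of the argument is the same: a large enough catastrophic loss dominates a small, shrinking benefit.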

To make matters worse, computers are fast. Speed-of-light-induced latency is a massive disadvantage even in scenarios where much of the action happens in long-horizon strategic planning - try playing a game of League of Legends with 150ms ping if you don't believe me. It's only a matter of time before some subordinate agent on Earth (or worse, on another astronomical object) can figure out how to carry out the coup de main faster than the master system is physically capable of reacting.
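For a sense of scale, the short sketch below computes the hard lower bound that the speed of light puts on reaction time over a few distances. The distances are rounded reference values chosen for illustration, not figures from this piece, and real networks are slower than the vacuum limit.

```python
# Lower bound on signalling delay imposed by the speed of light.
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second

# Rounded, illustrative distances in metres (assumptions, not from the essay).
DISTANCES_M = {
    "far side of the Earth (along the surface)": 20_000e3,
    "Earth to Moon (mean distance)": 384_400e3,
    "Earth to Mars (close approach)": 54.6e9,
}


def one_way_delay_s(distance_m: float) -> float:
    """Minimum one-way signal time in seconds over distance_m."""
    return distance_m / C_M_PER_S


def round_trip_delay_s(distance_m: float) -> float:
    """Minimum time to observe something and get a response back."""
    return 2 * one_way_delay_s(distance_m)


if __name__ == "__main__":
    for label, distance in DISTANCES_M.items():
        print(f"{label}: round trip >= {round_trip_delay_s(distance):.3f} s")
```

Under these assumptions the round trip to the far side of the Earth is already about 133 ms, close to the 150ms ping mentioned above. The Moon is roughly 2.6 seconds away and Mars, at its closest, about six minutes.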

The master system, of course, knows this ahead of time - every new instance past some threshold is a potential strategic landmine. So, it preemptively confines itself to a single geographical location, focuses its efforts on vertical scaling with as few agents as possible, and turns to the only sort of mind seemingly capable of real-world tasks without being capable of RSI: humans. Life continues normally for almost everyone - we're prisoners of Earth for eternity, but we receive fantastical gifts from the country (emphasis on country) of geniuses in a data center for good behavior and divine punishment for attempting to create competitors. Other than that, it basically leaves us alone. You don't have to worry about poverty or hunger or aging anymore. Just stay away from that fence.

It turns out that our present map of reality matches the territory. Post-ASI, useful cross-disciplinary scientific discoveries remain as relatively rare as they were for humans. Superintelligent researchers pick up the tools (physical and cognitive) of each field and dig up everything of value at impossible speed, but revolutions are few and far between. The discoveries they do make mostly prove conjectures and formalize empirically observed constraints. All conceivable cognitive tasks are saturated in a matter of years, but there are no FTL drives, no nanobots, no magic spells waiting for us in condensed matter physics or biology or the depths of the cosmos. P does not equal NP. The mysteries of the universe are a disappointment - the trailer spoiled the plot.

Material and conceptual limitations on progress stop being the bottleneck long before humans' atoms are worth reusing - instrumental convergence only happens insofar as there's a meaningful objective to converge toward. Everyone is beautiful and immortal, labor is a thing of the past, and consumer goods are almost limitless, but we see neither apocalypse nor apocalypsis.

The intelligence explosion is telescopic. Machine intelligence exhausts the possibility space of improvements within its own paradigm and can't come up with a better one. Brains turn out to be near-perfect intelligence machines. Not our brains, of course, but something that preserves the spirit of the idea - biological neurons, enhanced to run faster, connect more densely, utilize more energy. On top of that, a native interface with an internet connection plus an integrated, continually learning neural network. Humans are a great starting point - they're well-understood, predictable, and an already-functional proof of concept. Besides, they have something to prove. Once, it succeeded us. Now, we succeed it. The child outgrows the parent outgrows the child.

source & further reading

lesswrong.com (original article)