How should you slow down AI progress if it becomes necessary?

How should the world slow down AI progress if it ever decides it needs to? If you ever see substantial evidence of catastrophic risk emerging, social instability caused by mass unemployment occurring, or a software intelligence explosion (SIE) beginning that causes progress to outpace our ability to adapt, you could decide that it’s prudent to slow down progress to have more time to prepare and adapt to coming capabilities.

While there has been a lot of attention devoted to the question of whether you should slow down, thus far not a lot of attention has been devoted to the question of how you would slow down, and the actual instruments that you have available to cause a slowdown. Some commonly discussed mechanisms, such as token taxes, datacenter moratoriums, and 6-month training run pauses would all have significant downsides. This makes them, by themselves, unattractive as instruments to slow down AI progress and address societal or political concerns about AI. Instead, if you are forced to slow down, the most effective and least harmful approach would be twofold. First, to address catastrophic risks or a SIE, I’ll recommend a layered set of restrictions to slow down the rate of algorithmic progress by limiting the amount of compute that AI companies can pour into R&D internally. The first restriction would be a hard cap at a certain threshold of total R&D compute. The second would be a progressive tax below that threshold. And finally these two restrictions would be accompanied by a backstop in the form of a cap on training compute for individual training runs, to provide extra assurance against evasion. The hard cap on R&D and training compute would be targeted at risks that could arise more suddenly, such as misalignment and catastrophic misuse risk. And the progressive tax would be targeted towards risks that rise more smoothly with respect to capabilities (such as broader societal harms that require time to adapt to).

Second, to address mass unemployment concerns specifically, I’ll propose a capability-gated tax on AI deployment, as the intensity of deployment of powerful AI systems would be tied to displacement, and so metering inference should allow you to control the velocity of economic displacement. Both of these approaches should be designed as dynamic, conditional instruments that are able to be updated in the face of new evidence.

In this post I’ll sketch out some problems and desiderata for slowdowns, particularly through the lens of a critical window of capabilities where your risk-reduction efforts are most leveraged, as well as the concept of an overhang. I’ll argue that slowdown mechanisms should move us slowly through the critical windows of capabilities, avoid being overly blunt, and be dynamic so they can be tuned as evidence emerges. Then, I’ll present a taxonomy of policy levers and targets in the AI tech stack, and explain why the mechanism proposals need to be targeted at the layer of the tech stack that corresponds to where the risk comes from, and use the correct lever to match the harm structure of the risk you’re attempting to address. Finally, I’ll explain how these considerations motivate the above proposal, and briefly touch on what trigger mechanisms could be used as tripwires for when slowdown mechanisms should be implemented.

To be clear, I’m uncertain that slowdowns currently are or will ever be desirable, yet I think it’s an important question to interrogate because there may be scenarios where the risk is large enough to merit slowing down, or where progress will grow so fast that it will truly outpace our ability to adapt and prepare. One specific objection worth touching on: what about China?

Is it even worth thinking about slowing down if China won’t slow down?

Yes, for two reasons. First, the Overton window may shift in the future sufficiently to allow for an international agreement to be formed. Second, the gap between the US and China may widen enough to allow for the US to unilaterally implement these instruments.

To be precise about what a slowdown could get you, you need to think about how leveraged your actions are at different levels of capabilities.

One important concept to analyze the problem of slowdowns through is that of a ‘critical window’ of capabilities, which is a regime of capabilities where preparation is most leveraged, but before you actually enter meaningfully into a dangerous regime. Different risks may have critical windows at different points, and there may be multiple critical windows even for the same risk, but the concept is useful for all risk types.

For unemployment, for example, the critical window(s) could be just as displacement is starting to occur, such that political leverage is highest to prepare for coming automation and implement measures like unemployment compensation or [UBI/Universal Basic Capital](https://epochai.substack.com/p/controlling-the-capital-after-agi). Additionally, seeing the capabilities available as they are causing unemployment to rise would help individuals and society broadly prepare for the coming changes.

For misalignment risks, the window could be where alignment research efforts are most leveraged. This could be because you’re closest to the capabilities you’re concerned about and so you’re most able to run experiments aimed at addressing them, or create model organisms of the type of risk you’re concerned about. This would also be the period where AI is most able to boost the productivity of alignment, resilience, and control efforts.

From this point of view, what you want out of slowdown mechanisms is to maximize the amount of time you spend doing useful work inside these critical windows. Of course, there isn’t certainty about where these critical windows start, and in truth they likely aren’t discrete windows but rather continuums where your efforts are more and more leveraged the closer you get to the dangerous capability. It will likely be uncertain whether you are in a critical window even when you are in it.

Spending time within critical windows should also not be seen as universally good, as it can actually increase certain risks like misuse, as it would raise the possibility of model weight theft, or proliferation of capabilities beyond frontier actors. Lingering within the window without doing useful work to prepare and adapt could thus be net negative.

Why frame this post around slowing down within a critical window, and not pausing as some advocates propose?

The uncertainty of the timing of critical windows means you can’t time a pause precisely, which is one of the reasons an absolute pause is unwise. Pausing too early would freeze you at a time where efforts aren’t particularly leveraged, and it would also prevent you from gaining more information about where the critical window is. Additionally, as we’ll see in the next section, pausing and unpausing before you enter a critical window may have a neutral-to-negative impact on the amount of time you’ll have later during the critical window. Instead, it is better to have dynamic braking mechanisms that allow you to modulate how severe the slowdown is depending on the available knowledge you have. Dynamic mechanisms might look like caps that rise over time at a certain rate (which can be modulated), or taxes with a variable rate.

A critical window can also shift upwards over time. If you spend sufficient time preparing inside a critical window while inside a pause, you could hit diminishing returns at that regime of capabilities. To continue improving societal preparation/adaptation, you’d need to let capabilities advance before once again slowing. This is a further argument in favor of dynamic instruments, as you want to be able to move upwards in capabilities slowly in such a scenario.[1]

In truth, pausing and slowing down exist in a continuum of braking intensity, and shouldn’t be seen as mutually exclusive. The best policy may involve slowing down to glide into a critical window, then locking in a pause for a while before allowing capabilities to grow once again, but slowly. Crucially though, they both suffer from a similar cost: while your foot is on the brake, you are accumulating an overhang.

Let’s use a concrete proposal of a 6-month pause on training runs above a certain capability to illustrate how short pauses can actually have a neutral impact.

As you can see, in this scenario temporary pauses or slowdowns don’t increase the amount of time you get inside the critical window, which is where your time is most valuable.

However, the truth is even worse than the plot above suggests: in real life, capabilities wouldn’t simply be shifted to the right. The curve would instead ‘snap back’ due to ‘overhangs’.

Because AI is the apex of an enormous tech stack that is constantly pushing forwards, attempting to slow down progress at any given layer of the tech stack risks creating ‘pressure’ in the previous layers (hardware, algorithms), as its technology progresses but isn’t implemented fully at the next layer up. This is called an overhang. If the slowdown mechanism is short enough, when it is released, the built-up pressure snaps back, creating faster progress than you would have seen otherwise.

In the above scenario, while you’re restricting training runs above a certain capability, hardware progress continues, as new technology process nodes are introduced, AI chip designs get improved, and AI compute infrastructure generally gets better. This reduces the cost of a training run of the same size. Additionally, algorithmic progress also goes on, reducing the cost of reaching the same capability. So when one day the capabilities cap is lifted, all of a sudden with the same (or more) amount of money, you can run much larger training runs that give you even greater capabilities. Combined with this, you now have better algorithms which give you access to more powerful capabilities for the same amount of compute. This increases the expected returns from training runs and thus raises the amount of money that can be invested into running the training run. Both of these factors, combined with other improvements that happen in the tech stack during the pause, lead to a massive snapback in capabilities, at a faster rate than would have happened without the cap.

The shifted curve doesn’t snap back all the way back to where it would have been pre-intervention, but if the period of snapping back (i.e. having a higher slope than the no-intervention curve) coincides with the critical window, you’ve actually made things slightly worse.

There are a few factors that influence how much worse you’ve made things by implementing a temporary slowdown before entering the critical window. The longer the slowdown lasts, the less time you have inside the critical window (as long as you release the slowdown before entering the critical window). The farther the entrance to the critical window is from the end of the pause, the longer time there is for the overhang to exhaust itself and for your impact to return to neutral. And finally, how severe the pause or slowdown is also impacts the severity of the overhang you deal with after you lift the slowdown.

The main point I wanted to make with the above plots is that to be effective, a slowdown must last long enough so that you stretch the curve to the right instead of merely shifting it, such that the slowdown overlaps with the curve passing through the critical window region.

For example, let’s take a gradually increasing training compute cap that still allows training compute to grow, but at a slower pace. Commencing the slowdown before entering the critical window, and extending it for the duration you are inside it, increases the amount of time you have within it significantly. In the plot, I illustrate the slowdown ending entirely at some point. Among other assumptions, this takes as a given that there is saturation of preparation efforts, meaning that spending enough leverage-weighted time eventually ‘solves’ the risk you’re trying to address, and you can safely release the slowdown without risk.[2]
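To make the shift-versus-stretch distinction concrete, here is a minimal sketch under a toy assumption that is not taken from the plots themselves: capability is measured in log units, grows at a constant rate per year, and the critical window is a fixed band of log-capability. The function names and all numbers are hypothetical.

```python
def window_entry_time(window_lo: float, growth_rate: float, pause_years: float = 0.0) -> float:
    """Year at which capability first reaches the window, starting from 0.

    A pause taken before the window simply delays entry by its length.
    """
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    return pause_years + window_lo / growth_rate


def time_in_window(window_lo: float, window_hi: float, rate_inside: float) -> float:
    """Years spent crossing the window at a constant growth rate inside it."""
    if rate_inside <= 0:
        raise ValueError("rate_inside must be positive")
    return (window_hi - window_lo) / rate_inside


# Hypothetical numbers: window spans log-capability 4 to 6, baseline rate 1 per year.
baseline = time_in_window(4.0, 6.0, rate_inside=1.0)
# A 6-month pause before the window delays entry but not the crossing time.
shifted_entry = window_entry_time(4.0, growth_rate=1.0, pause_years=0.5)
shifted = time_in_window(4.0, 6.0, rate_inside=1.0)
# A slowdown that persists through the window (half the rate) stretches the curve.
stretched = time_in_window(4.0, 6.0, rate_inside=0.5)
```

In this toy model the pause only moves the entry date, while halving the growth rate inside the window doubles the time spent there. It deliberately ignores the snapback from overhangs, which would make the shifted case somewhat worse than neutral.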

In reality the critical window would be a gradual continuum instead of a discrete window, given the uncertainty about when you are inside the critical window and when you’ve saturated your preparation efforts. Similarly, the slowdown should be dynamic, and change in kind with how leveraged your efforts are at different capabilities.

Here’s a plot with these new assumptions, plus a depiction of what the overhang would look like over time.

Here we see that there are two variables you must manage: maximize the leverage-weighted time you get by slowing down, and minimize the size of the overhang you accumulate. The first is obvious, but why do you need to minimize the size of the overhang? If you wait to release it until after your preparation efforts have saturated, what’s the problem? The problem is that the size of the overhang is correlated with enforcement difficulty. The overhang growing means that it becomes cheaper to reach the capabilities ceiling you are trying to enforce, and thus more actors become capable of reaching it. At the limit, this could make enforcement infeasible, as rival countries or small actors become capable of reaching past the ceiling regardless. Additionally, the larger the overhang is, the greater the incentive there is to defect and break past the ceiling. This is because you have a greater capabilities jump you could have access to if you defected.

If you let the overhang grow large enough, you’d be left with two alternatives. Either you raise your foot off the brakes and let capabilities keep advancing faster, or you target the very inputs that are causing the overhang to grow. This would likely be very costly, as stopping inputs like hardware progress or algorithmic improvements would require very significant enforcement efforts, and have a very large opportunity cost. Imagine the economic consequences of preventing more fabs from being built, or the political implications of trying to restrict the research that companies, startups, and academics can do.

There are two opposing factors determining how large the overhang gets while inside the slowdown. The first is that the longer you are in a slowdown, the farther away the unconstrained maximum capabilities that could be reached gets. This is because improvements lower down the tech stack, like hardware improvements, continue throughout the slowdown period, and unlock larger jumps when the slowdown is released. The second factor is that unlike a pause, the slowdown does allow capabilities to grow, releasing some of the overhang pressure and reducing the distance to the unconstrained maximum.[3]
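The two opposing factors can be sketched with a toy linear model, which is an assumption for illustration rather than the model behind the plots: the unconstrained frontier rises at one rate in log-capability units, the permitted ceiling rises at a slower rate, and the overhang is the gap between them. The rates below are hypothetical.

```python
def overhang(years: float, free_rate: float, permitted_rate: float) -> float:
    """Gap between unconstrained and permitted log-capability after `years`.

    free_rate: yearly growth that hardware and algorithms would allow.
    permitted_rate: yearly growth the slowdown lets through (0 for a pause).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return max(0.0, (free_rate - permitted_rate) * years)


# Hypothetical rates: a full pause versus a cap that rises at half the free rate.
pause_gap = overhang(years=2.0, free_rate=1.0, permitted_rate=0.0)
slowdown_gap = overhang(years=2.0, free_rate=1.0, permitted_rate=0.5)
```

Under these assumptions a rising cap accumulates overhang more slowly than a pause of the same length, which is the pressure-release effect described above.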

In this whole analysis, we’ve been looking at AI capabilities curves that rise exponentially, without considering when or how that process might naturally slow down. Progress may naturally taper off, due to things like physical growth limits, algorithmic insights getting harder to find, nearing the theoretical limit of algorithmic efficiency, and more. While there’s no guarantee you will be reaching this point anytime soon, one possibility is to hold on to the slowdown mechanism until you reach this point. This is one potential resolution to the overhang problem, as it would naturally dissolve instead of causing a snapback of fast progress. For example, have training run caps that rise slowly until they reach the point beyond which it is infeasible to train (e.g. latency walls).

Now that we’ve established what slowdowns should accomplish, what should they look like in practice?

To examine the landscape of options that are available for slowing down AI, you can consider 4 possible targets in the supply chain, and 3 possible levers for deceleration.

At a high level, AI progress can be thought of as a technology stack that you can break up into deployment/inference, training, AI R&D, and compute infrastructure and below.[4] It’s worth noting that final training run compute makes up only about 10% of total R&D compute.

Conditional regulation could be all that is needed in theory to manage the transition to transformative AI, and slow down progress as is necessary to address any risks. However, risk-specific regulation may not be sufficient or desirable relative to caps or taxes. This is because different risks may be too hard to define and codify into regulation, given the large uncertainty that exists about them. In the case of alignment and catastrophic risk, for example, it would be a significant challenge to formally regulate because there are a wide range of opinions about the nature of the risk (AI takeover, gradual disempowerment, concentration of power), and what would constitute sufficient protection against those risks. A general slowdown could be more favorable than a contested fight about what constitutes safe enough to deploy/train.

Taxes may seem like a less natural fit than the other two levers, given they don’t directly control the quantity of interest, but there’s actually a few reasons to favor them over caps, or at least use them as complements. From a Pigouvian point of view, what you’re trying to achieve with a tax is to internalize some risk or harm that is not being priced in, and in that sense you’re not trying to aim for revenue maximization or minimize market distortion. Instead, distorting the market in specific ways is the point of taxing with the purpose of slowing down.

One frame through which to interpret the question of caps vs taxes for controlling an externality is the marginal harm point of view. If there are sharp discontinuities in harm from the controlled quantity (e.g. inference compute, or training compute), then that favors caps. If harm grows more smoothly, then taxes are favored. Under this view, hard caps may be a better fit for catastrophic risks and misalignment, where the controlled quantity either causes catastrophe or not, while taxes may be a better fit for unemployment and other societal harms, as those should scale more smoothly with controlled quantities. On top of this, taxes are closer to societally optimal in some ways, as they allow the highest-value AI activities to continue while lower-value activities are priced out, whereas under a cap the allocation might be arbitrary or subject to some other mechanism less effective than a market.[7] Taxes also come with an overhang pressure release valve, as the willingness to pay higher taxes naturally rises over time, instead of being stuck at some fixed cap. The flipside of this is that the brake erodes in severity over time. Finally, while not the main purpose of them, taxes do generate revenue which can be used to actively reduce risks through adaptation and preparation.

Caps, on the other hand, are favored because they provide direct control over some quantity. If you know you want to prevent training runs above 1e27 FLOP, setting the right tax rate would be very difficult, as you would be one step removed from directly limiting the dangerous activity. Setting the rate too low might have little impact, and setting it too high might lose the benefits of using taxes instead of caps in the first place. While this is true, it’s also worth noting the counterpoint, which is that just as caps give you more certainty about a given quantity, taxes give you more certainty about the economic impact on industry. Setting a cap that is too aggressive may decimate the industry in a way you may be unwilling to risk, while setting it too high could have little impact in the short run, until the companies hit that wall. A final point in favor of caps is that they may be easier to enforce and monitor as they are directly tied to physical quantities.

Ultimately, I think the best approach will be to use a mix of all three levers to complement each other and make up for their individual limitations. An example of this could be a cap-and-trade system popularized in the carbon emissions case, although AI differs from the carbon case in ways that make this a less attractive option in particular, mostly because FLOP isn’t quite fungible in harm caused in the same way that carbon is. While a ton of carbon does the same amount of harm no matter where it is emitted, a marginal FLOP used for frontier AI R&D is not interchangeable with, say, a FLOP used in a startup.

To explore the desirability of different targets in the tech stack, I’ll step through a few concrete proposals that have been raised as possible mechanisms for slowdowns, explain why I think they’re not ideal policies for all the risks I’ve been considering, and then explain what I think is the best mix of instruments to deploy against which targets.

A flat token tax that uniformly hits all AI deployment would be ineffective at reducing some risks, and an imperfect way to target some others. With risks like misalignment or catastrophic misuse, most of the harms come from more capable models, rather than more widely deployed AI, meaning you aren’t addressing the risk directly by targeting deployment. [8] In the case of unemployment concerns, while you are directly targeting the layer tied to the harm, it’s imperfect because it uniformly hits deployment that may usefully augment labor instead of replacing it, as well as deployment of more capable models that may actually cause the unemployment impacts. A capability-gated deployment tax (which I’ll recommend later) would fix this discrepancy.

Targeting datacenters is unattractive because this would impact deployment as much as it would training and R&D, which is not ideal. Additionally, it would impact diffusion of lower-capability AI, which would mean you lose out on that benefit to the economy as well as an opportunity to stress-test institutions and culture. You would also hit AI applications that are not general-purpose, like AlphaFold, that have economic and scientific benefits while not causing that much risk.

A 6-month pause on training runs is perhaps the most famous attempt at a policy to slow down progress, as seen in the Future of Life Institute's ‘Pause Giant AI Experiments’ open letter, which was published after GPT-4 was released and attempted to ensure that no model more capable than it would be created for 6 months.

As seen in the Slowdown dynamics section, the largest problem with this proposal is that while it would gain us time now, it would not give us more time in the future at some critical window of capabilities, where you have much more information about the relevant risks, and are much more able to make progress on them.

To be meaningful, any slowdown attempt would need to last longer. The attempt to target training runs is not misguided in itself, and a modified version would actually be successful at slowing down as capabilities are so correlated with total compute. The main weakness is that targeting training compute does not directly meter the speed at which algorithmic progress can continue (that would require controlling R&D compute, as I’ll argue for in the next section).[9]

To best slow down AI given the possibility of catastrophic risks, mass unemployment, and a SIE, I think there are two instruments that need to be prepared. The first is a layered mechanism that targets AI R&D compute with a tax and a cap, as well as with a backstop in the form of a training compute cap. This instrument is best suited for slowing down AI in the medium to long term, and addresses risks from misalignment, catastrophic misuse, and SIEs. For the case of mass unemployment, the best option is capability-gated deployment taxation.

As you saw in the Slowdown mechanisms section, the amount of compute devoted to total R&D is about 10x the amount dedicated for final training runs. This is where most of the algorithmic progress occurs (outside of smaller innovations / optimizations like FlashAttention developed outside of the company). [10] Restricting R&D compute is the closest to a speed brake one can achieve. You can modulate the speed of algorithmic progress by limiting how much compute can be devoted to it, while at the same time restricting how much training compute can be devoted to single training runs (as training runs are part of R&D).

To construct the actual restriction, it is best to combine several different levers to achieve the best result. The first and most obvious restriction is to simply place a cap on the total amount of compute individual companies can devote to R&D in a given year. [11] This addresses threshold-structured risks that scale discontinuously with capabilities, so you want to stay strictly below a certain capability growth rate.

Additionally, you can offer companies an increase in the R&D cap conditional on approved safety cases for managing increased capability growth rates. This creates an incentive for companies to invest in safety research beyond commercial considerations, because safety work directly expands R&D compute budget.

To illustrate what this set of restrictions could look like, here’s a plot with the three restrictions on R&D compute shown as three different lines. The restriction on training compute isn’t shown as it’s a measure of FLOP, not FLOP/s. Also note that to prevent the tax from affecting small actors, the progressive convex tax only starts past a certain threshold of compute.
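As a rough illustration of how the layers could fit together, here is a minimal sketch. It assumes a quadratic tax above the threshold (one possible convex shape, not a recommendation from the text), arbitrary compute units, and hypothetical names and parameter values throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RnDPolicy:
    tax_threshold: float      # R&D compute below this is untaxed (protects small actors)
    hard_cap: float           # total R&D compute allowed per year
    tax_coefficient: float    # scales the convex tax above the threshold
    training_run_cap: float   # backstop: largest single training run allowed
    safety_case_bonus: float = 0.0  # extra cap granted for an approved safety case


def effective_cap(policy: RnDPolicy, safety_case_approved: bool) -> float:
    """Hard cap, raised if the company has an approved safety case."""
    bonus = policy.safety_case_bonus if safety_case_approved else 0.0
    return policy.hard_cap + bonus


def rnd_tax(policy: RnDPolicy, rnd_compute: float) -> float:
    """Quadratic tax on compute above the threshold, so the marginal rate rises linearly."""
    excess = max(0.0, rnd_compute - policy.tax_threshold)
    return policy.tax_coefficient * excess ** 2


def is_compliant(policy: RnDPolicy, rnd_compute: float, largest_run: float,
                 safety_case_approved: bool = False) -> bool:
    """True if both the R&D cap and the training-run backstop are respected."""
    return (rnd_compute <= effective_cap(policy, safety_case_approved)
            and largest_run <= policy.training_run_cap)


# Hypothetical policy in arbitrary compute units.
policy = RnDPolicy(tax_threshold=10.0, hard_cap=100.0, tax_coefficient=0.5,
                   training_run_cap=20.0, safety_case_bonus=25.0)
```

With these placeholder values, a company using 5 units pays nothing, one using 20 units pays 50, and one using 30 units pays 200, so each additional unit costs more than the last. A company at 110 units is only compliant if its safety case has been approved.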

If designed correctly, the shape of the marginal tax rate curve could help modulate how quickly AI companies scale up the amount of compute devoted to R&D. As algorithmic progress advances, and better harnesses and deployment infrastructure are developed, the willingness to pay for more compute will naturally rise. This will help mitigate some of the overhang that will build up over time, and distort the market less than a cap, which affects all companies that are right up against it equally.

One notable downside of targeting total R&D compute rather than training compute is that enforcement might be harder to implement, given that R&D compute is spread out over a lot more experiments, synthetic data generation, and other purposes, instead of being concentrated in a single job. This is somewhat ameliorated by the fact that compute usage must be carefully tracked internally within AI companies, so you can rely on their internal metrics and systems when implementing any restrictions.[13]

Even assuming you can get around evasion difficulties, there are some genuine edge cases where it’s somewhat unclear how to classify compute usage, such as synthetic data generation or internal usage not directly for AI R&D (e.g. for finance). While it’s hard to predict all these edge cases in advance, the general principle I would advance is that it should count under R&D if it’s compute or API spend that is controlled by the company, and whose outputs benefit R&D in some way.[14]

One harm that R&D compute restrictions aren’t perfectly suited to address is job displacement, as that is not directly mediated by capability growth rates, but rather by deployment intensity (though of course R&D compute restrictions could serve as a helpful complement to deployment restrictions, especially in the long run). Therefore, the best target for restrictions would be aimed directly at deployment to control how quickly AI diffuses throughout the economy in a way that causes unemployment. Specifically, you need to make a distinction between AI deployment that augments and transforms labor without causing mass unemployment, and deployment that does cause replacement. [15] An ideal tax policy would only disincentivize and slow down the latter, without affecting the former.

A possible assumption that one could make as to what differentiates the two is that the more powerful an AI system is, the more it will be tilted towards replacement rather than augmentation. In that case, a simple flat tax on all deployment would affect the less powerful AI systems too that would have only augmented labor, which is not what you want. Instead, a capability-gated deployment tax that grows as the capability of the AI system does would more narrowly target replacement effects.

While there is some reason to believe this assumption is true, you should also prepare for the possibility that it is not, as there are forces that push in both directions (automation may create new tasks, as well as automate existing ones). Therefore the actual policy should be conditional on the displacement vs. augmentation pattern you observe in the economy. If it is indeed true that replacement correlates with frontier capability, then a capability-gated inference tax is the correct lever. If instead it correlates with inference intensity broadly and not particularly with capability, a flat tax rate would be the correct lever. If you do not observe replacement even at high capabilities, then there is no need to implement this lever in the first place. Much like you want dynamic instruments that could be tuned in accordance with the evidence on relevant risks, you can get the best of both worlds by monitoring the economy for triggers that would help you determine whether displacement is occurring, and if it is, whether it is tracking capability or intensity.

There are two big problems with this recommendation that make it more tentative than the previous one. First, even if the assumption that capability tracks replacement effects is true, it’s not trivial to track which AI systems are most capable, as inference FLOP per query or other measures would be imperfect metrics. Second, open-weight models exist. They set an upper bound on how high you can tax deployment before companies just switch to local open-weight models, and if open-weight models are capable enough to cause replacement, taxes aren’t well-suited to prevent that.
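The conditional policy described above can be written down as a small decision rule plus a rate schedule. This is a sketch under stated assumptions: the capability score, the gate, the linear slope, and the rate ceiling are all hypothetical, and the ceiling merely stands in for the upper bound that open-weight models place on how high the tax can go.

```python
from enum import Enum


class Lever(Enum):
    NONE = "no deployment tax"
    CAPABILITY_GATED_TAX = "capability-gated tax"
    FLAT_TAX = "flat tax"


def choose_lever(displacement_observed: bool, tracks_capability: bool) -> Lever:
    """Pick the deployment lever from the observed displacement pattern."""
    if not displacement_observed:
        return Lever.NONE
    return Lever.CAPABILITY_GATED_TAX if tracks_capability else Lever.FLAT_TAX


def gated_rate(capability: float, gate: float, slope: float, max_rate: float) -> float:
    """Tax rate: zero at or below the gate, rising linearly above it, capped at max_rate."""
    if capability <= gate:
        return 0.0
    return min(max_rate, slope * (capability - gate))
```

For example, with a gate of 2.0, a slope of 0.25, and a ceiling of 0.75, a system scoring 1.0 is untaxed, one scoring 4.0 is taxed at 0.5, and one scoring 10.0 hits the ceiling.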

As I noted in Slowdown mechanism taxonomy, another benefit of this policy is that it would generate revenue right as labor tax revenue might be falling, fitting in with some existing theory on tax policy in the age of AI.

The biggest question after whether and how to slow down AI is when to start doing it.

Under the three scenarios that could justify a slowdown — evidence of catastrophic risk, mass unemployment, and a software intelligence explosion — there are specific pieces of evidence that you need to be collecting in order to know whether those scenarios are occurring. And you need to build trigger mechanisms for when to execute slowdown mechanisms.

For catastrophic risks, this looks like alignment, capabilities, and catastrophic misuse evals, which have already been discussed extensively and are the subject of existing efforts.

For mass unemployment, this looks like monitoring the market and unemployment rates, but also trying to get evidence ahead of the problem actually occurring with AI-specific measures. Specifically, you also want to know more about the shape of coming automation, and how you can help shape it towards augmentation instead of displacement if it looks to be happening too quickly.

Finally, in the SIE case, there’s no one best trigger, so you can combine many such as productivity multiplier measurements, percentage of total compute going towards internal R&D deployment, and eval growth rates like the Epoch Capability Index. It’s also worth mentioning that even if a *software-only* intelligence explosion doesn’t occur, there may still be significant feedback loops in hardware R&D that speed up progress significantly. These should be easier to catch in time as making compute infrastructure has a naturally slower production cycle than software.
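One way to combine several indicators into a single tripwire is sketched below. The indicator names, the threshold values, and the rule that at least two indicators must agree are all assumptions made for illustration, not values proposed in this post.

```python
def tripped_indicators(readings: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Names of indicators whose reading meets or exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in readings and readings[name] >= limit
    )


def should_trigger(readings: dict[str, float], thresholds: dict[str, float],
                   min_tripped: int = 2) -> bool:
    """Trigger only when several independent indicators agree (an assumed rule)."""
    return len(tripped_indicators(readings, thresholds)) >= min_tripped


# Hypothetical thresholds and readings.
thresholds = {
    "productivity_multiplier": 3.0,
    "internal_rnd_compute_share": 0.5,
    "eval_index_growth": 2.0,
}
readings = {
    "productivity_multiplier": 3.5,
    "internal_rnd_compute_share": 0.4,
    "eval_index_growth": 2.5,
}
tripped = tripped_indicators(readings, thresholds)
```

Requiring agreement between indicators trades some speed for robustness against a single noisy measurement. A stricter or looser rule can be expressed by changing `min_tripped`.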

If a dynamic slowdown mechanism is implemented, some centralized body would have to make the decisions about how much to brake progress at different points in time, and how to release the brakes as you’re leaving the critical window. Given the rapid pace of progress and the slowness of centralized decision-making, one option for slowdown triggers is to build automatic circuit-breakers that trip mechanically after certain thresholds have been reached. Another option is to set a slowdown trajectory ahead of time, where you try to forecast how fast progress will go in the future, and the brakes intensify or soften by default unless the schedule is overridden.

Considering the main risks of misalignment, unemployment, and SIE, this leaves us with dynamic, layered restrictions on R&D compute to address the capability growth rate, as well as a capability-gated tax on AI deployment to manage the diffusion rate of replacement-causing automation.

To be able to implement such a slowdown if it becomes necessary, you’ll need to build the necessary political and technical machinery ahead of time. There’s an argument to be made for implementing a weak version of a slowdown early on, so that it’s more politically palatable, and then later ratchet up the intensity of the slowdown mechanism as the evidence calls for it (or the opposite, release the pressure valve if it turns out the risk is lower than expected). There’s also an argument in the opposite direction, where building the authority to slow down would be like handing the government a hammer that they will use even if it’s not the best tool, and so you would be better off waiting until it’s truly necessary to build that capacity.

To use slowdown authorities wisely, we’ll need to build the analytical capacity within decision-making bodies to make tough judgement calls about how severe a slowdown is necessary in different situations. Where should R&D compute caps be placed? What should the tax rate on deployment be? How should these restrictions evolve over time, and what will determine when they are lifted and at what rate?
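To make the shape of these judgement calls concrete, here is a minimal sketch of how a hard cap on R&D compute with a progressive tax below it could be computed. It assumes the tax works like marginal income tax brackets, which the post does not specify; the function name, the bracket boundaries, and the rates are hypothetical placeholders, and the compute units are arbitrary.

```python
def progressive_compute_tax(
    rnd_compute: float,
    brackets: list[tuple[float, float]],
    hard_cap: float,
) -> float:
    """Tax owed on `rnd_compute` under marginal brackets, with a hard cap on top.

    `brackets` is a list of (upper_bound, rate_per_unit) pairs sorted by
    upper_bound; each rate applies only to the compute that falls inside that
    bracket. Compute above `hard_cap` is not taxed but disallowed outright.
    """
    if rnd_compute < 0:
        raise ValueError("R&D compute cannot be negative")
    if rnd_compute > hard_cap:
        raise ValueError("R&D compute exceeds the hard cap")

    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if rnd_compute <= lower:
            break  # nothing left to tax in this or higher brackets
        taxed_amount = min(rnd_compute, upper) - lower
        tax += taxed_amount * rate
        lower = upper
    return tax


# Hypothetical schedule: the first 10 units are untaxed, the next 40 are taxed
# at 1 per unit, and the last 50 (up to the cap) at 3 per unit.
BRACKETS = [(10, 0.0), (50, 1.0), (100, 3.0)]
HARD_CAP = 100

if __name__ == "__main__":
    print(progressive_compute_tax(60, BRACKETS, HARD_CAP))
```

In this framing, the open questions above become parameter choices: where `HARD_CAP` sits, how steep the rates in `BRACKETS` are, and how both are revised over time.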

It will also be helpful to foresee different situations ahead of time. That requires analytical capacity among decision-makers, as well as information-sharing mechanisms that ensure they see evidence as soon as it’s available. This could include mandatory execution and disclosure of evals, internal compute breakdowns, productivity multipliers, and so on.

Hopefully, none of these authorities or mechanisms will ever become necessary, and you can reap the benefits of automation as quickly as the technology develops. However, given the potentially enormous social and technological risks that are at play, you should be prepared to step on the brakes if necessary.

A pause could still be the correct move under a few possible conditions: when harm is very discontinuous with capabilities, when leverage doesn’t grow by much as you approach dangerous capabilities, when there are no good leading indicators of where dangerous capabilities are, or when the amount of time necessary to prepare within the critical window is so long that you can only achieve it by pausing entirely instead of progressing slowly.

Other possibilities are that new critical windows keep appearing, meaning you need to reintroduce slowdown mechanisms even after you successfully pass the first one, or that the critical window never fades, meaning you would need to set a permanent speed limit.

One subtle point here is the possibility of overhang decay. With very temporary overhangs, say with 6-month pauses, it’s obvious that algorithmic progress and hardware progress will continue at the same pace as they would without the pause, creating a 6-month overhang. But with longer slowdown mechanisms, say with 5-year AI R&D compute caps, decreased demand for compute because of slower growth in capabilities would also decrease hardware progress, diminishing the growth rate of the overhang the longer the mechanism is in place. This is because hardware progress is driven so much by learning-by-doing and economies of scale that reduced demand over a long enough period would be sufficient to slow it down. One further complication is that the AI industry is already compute constrained and might be headed towards a compute crunch. That would further diminish the potential for an overhang to build, since demand wouldn’t fall below the available supply in that case. In general, it’s hard to be confident about any of these points, as they depend on very complicated questions of AI industry economics, hardware demand elasticity, hyperscaler capex commitments, and more.

This post doesn’t focus on targets below compute infrastructure, such as restrictions on constructing semiconductor fabs or taxes on semiconductor manufacturing equipment, even though some advocates argue in favor of these. This is because the collateral damage of these restrictions would be overwhelming, and they are much less targeted at the actual thing that creates the risk.

Though note this is only based on this one estimate for a single lab in a single year, plus some evidence from financial disclosures of Chinese AI companies at a smaller scale. It may be that this ratio could change over time.

Regulation is a catch-all term here, bundling together prohibitions, licensing, liability, information requirements, and more. I’m treating it as one category here for tractability, and because it’s not the main focus of this post. Also note that this taxonomy has fuzzy edges. For example, liability could function like a tax with different incidence, and sufficiently stringent safety case licensing could function as a hard cap on training compute.

An exception to this upside is if the highest willingness-to-pay companies/consumers come from those that also bring the highest risk, as you might for example expect if you're most concerned about targeting software feedback loops.

One caveat to this is that one of the drivers of capabilities growth is the feedback loop between revenue, investment, and R&D. By reducing revenue, you add friction to this feedback loop and so slow down capabilities progress. However, this effect is indirect and doesn’t control capabilities themselves (as opposed to training compute caps, which do target capabilities directly, since the two are so closely correlated).

That being said, there are some reasons to think restricting training compute could also affect the speed of algorithmic progress. Much algorithmic progress is only unlocked at scale. For example, reasoning wouldn’t have worked meaningfully at GPT-2 scale, even with lots of R&D compute to try and discover it. New training paradigms require frontier-scale runs to test.

Arguably, restricting only company-level R&D compute would not be enough to slow down algorithmic progress, as you would still see field-level progress occur through research that is published and spillovers between frontier companies. If so, that would necessitate field-level caps as well as company-level caps. Another caveat worth noting is that company-level caps would only bind leaders, letting more followers converge, which is worse for some types of risks.

Setting company-level caps invites evasion through entity-splitting. To actually enforce them, there would need to be some way to prevent splitting or to aggregate compute across related entities.
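One way to picture the aggregation step is to treat known ownership or control relationships as links and sum compute over each connected group. The sketch below does this with a small union-find structure. It is only an illustration: the function name, the entity names, and the premise that a regulator already has a reliable list of links are all assumptions, and establishing those links is the hard part in practice.

```python
from collections import defaultdict


def aggregate_by_group(
    compute: dict[str, float],
    links: list[tuple[str, str]],
) -> dict[str, float]:
    """Sum reported compute across entities connected by ownership/control links.

    `compute` maps entity name -> reported R&D compute (any consistent unit).
    `links` lists pairs of entities that should be treated as one group.
    Each group is labeled by its alphabetically smallest member.
    """
    parent = {name: name for name in compute}

    def find(x: str) -> str:
        parent.setdefault(x, x)  # entities that appear only in `links`
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger name under the smaller one so labels are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    totals: dict[str, float] = defaultdict(float)
    for name, amount in compute.items():
        totals[find(name)] += amount
    return dict(totals)


def groups_over_cap(totals: dict[str, float], cap: float) -> list[str]:
    """Return the labels of groups whose combined compute exceeds the cap."""
    return sorted(group for group, total in totals.items() if total > cap)


if __name__ == "__main__":
    # Hypothetical entities: LabA has split part of its compute into a subsidiary.
    reported = {"LabA": 6, "LabA-Sub": 5, "LabB": 4}
    totals = aggregate_by_group(reported, [("LabA", "LabA-Sub")])
    print(totals, groups_over_cap(totals, cap=10))
```

In this example neither LabA nor its subsidiary exceeds the cap alone, but the combined group does, which is the evasion pattern the aggregation is meant to catch.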

Since there is uncertainty about where dangerous capabilities lie, the rate of growth in capabilities should be bounded to allow for evals or other triggers to catch dangerous capabilities right as they emerge.

For example, you can get a very rough measure of R&D compute simply by subtracting inference compute use from the total amount of compute owned by the company, and use that to double-check your estimate of R&D compute. However, you still run the risk of an adversarial lab classifying its R&D compute use as ‘inference’ or some other category to evade the restrictions. One benefit of the training compute backstop is for precisely a scenario like this one, as it would be harder to classify a large training run as something else. To measure R&D compute, you should rely on FLOP aggregated across all activities, including failed runs, much like you would measure the total FLOP of a training run. This avoids downsides such as gameability or uneven impact that other measures, like compute spend or accelerator-hours, would have.
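The cross-check described here can be sketched in a few lines. The sketch assumes that owned compute has already been converted into total FLOP for the reporting period, a step that itself depends on utilization assumptions. The function names, the activity categories, and the 10% tolerance are hypothetical.

```python
def reported_rnd_flop(activity_flop: dict[str, float]) -> float:
    """Bottom-up measure: FLOP summed over all R&D activities, failed runs included."""
    return sum(activity_flop.values())


def residual_rnd_flop(total_flop: float, inference_flop: float) -> float:
    """Top-down rough measure: everything that was not used for inference."""
    return total_flop - inference_flop


def needs_audit(reported: float, residual: float, tolerance: float = 0.1) -> bool:
    """Flag a lab when the two measures disagree by more than `tolerance`.

    The gap is expressed as a fraction of the top-down residual. A large gap
    does not prove misclassification; it only marks where to look more closely.
    """
    if residual <= 0:
        raise ValueError("residual R&D FLOP must be positive")
    return abs(residual - reported) / residual > tolerance


if __name__ == "__main__":
    # Hypothetical figures in arbitrary FLOP units.
    activities = {"experiments": 30, "failed_runs": 10, "final_training": 40}
    reported = reported_rnd_flop(activities)
    residual = residual_rnd_flop(total_flop=150, inference_flop=50)
    print(reported, residual, needs_audit(reported, residual))
```

Note that this check only catches inconsistencies between the two measures. If a lab inflates its reported inference FLOP, the residual shrinks along with the reported R&D figure, which is where the training compute backstop is meant to help.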

Here I’m using ‘augment’ to refer to scenarios where AI broadly increases wages for all workers. In reality, AI augmentation could actually cause more unemployment through mechanisms like the ‘superstar phenomenon’, where a few workers become so productive that they displace everyone else.

source & further reading

lesswrong.com (original article)
