The benefit of intervening sooner
AGI is likely to come this century. Say you have a plan X that would prevent AGI from destroying the world. How beneficial is it to set plan X in motion sooner by n years?
A very short answer: it reduces the probability of AGI ruin by something like n/2 or n/3 percent. Which is a lot.
A slightly longer answer: it reduces the probability of AGI ruin by roughly $f \cdot n$, where $f$ is the per-year probability of AGI ruin around the $n$-year window between when plan X would have been completed with the intervention and when it would have been completed without it. So if X would take a very long time either way, or if the chances of AGI ruin are very spread out through time, then the intervention doesn't matter that much; otherwise the intervention probably makes a noticeable difference.
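For example (just plugging numbers into the rough formula above): if $f = 1\%$ per year around the relevant window and the intervention moves things up by $n = 5$ years, the benefit is roughly $f \cdot n = 0.05$, i.e. about 5 percentage points of existential risk.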
A fuller answer:
This post assumes that the arrival of AGI ruin and the workings of X are independent. It also assumes that the probability of AGI ruin without X is 1, so results should be scaled down by the probability that AGI ruin would be averted for some other reason anyway. These are simplifying assumptions that make the analysis easy and useful for building intuition; they aren't necessarily realistic.
The benefit of setting X in motion at time s1 rather than time s2 is:
$$\int_0^\infty P(\text{X takes } t \text{ to finish})\; P(\text{AGI comes in } (t+s_1,\, t+s_2))\, dt$$
That is, let $f_A$ be the PDF of the year of AGI ruin (and $F_A(t)=\int_{-\infty}^{t} f_A(u)\,du$ its CDF), and let $f_X$ be the PDF over how many years plan X will take to complete after it's been set in motion. Then the benefit of setting X in motion at time $s_1$ rather than time $s_2$ is:
$$\int_0^\infty f_X(t)\left(\int_{t+s_1}^{t+s_2} f_A(u)\,du\right)dt \;=\; \int_0^\infty f_X(t)\,\big(F_A(t+s_2)-F_A(t+s_1)\big)\,dt$$
Example: Suppose that you think (implausibly) that AGI ruin has a 1% chance of happening in each of the next 100 years. Then $F_A$ is linear from 0 at year 0 to 1 at year 100, and
$$\int_0^\infty f_X(t)\,\big(F_A(t+s_2)-F_A(t+s_1)\big)\,dt \;\approx\; \int_0^{100-s_1} f_X(t)\cdot 0.01\,(s_2-s_1)\,dt$$
So starting X sooner by $s_2-s_1$ years reduces the risk of AGI ruin by about $s_2-s_1$ percentage points, times the probability that X will take less than 100 years.
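As a quick numerical check (my own sketch, not from the post), here is this example computed with scipy. The choice of $f_X$ is an arbitrary illustration, the same Erlang that appears in the next example:

```python
# Numerical check of the uniform-hazard example. The f_X here (Erlang with a
# 20-year floor and a 40-year mean) is an illustrative assumption of mine.
import numpy as np
from scipy.integrate import quad
from scipy.stats import erlang, uniform

s1, s2 = 5, 10
f_A = uniform(loc=0, scale=100)    # AGI ruin: 1%/year over the next 100 years
f_X = erlang(10, scale=2, loc=20)  # plan X's completion time (illustrative)

exact, _ = quad(lambda t: f_X.pdf(t) * (f_A.cdf(t + s2) - f_A.cdf(t + s1)),
                0, np.inf)
approx = 0.01 * (s2 - s1) * f_X.cdf(100)  # (s2-s1) points x P(X under 100y)
print(exact, approx)                      # both come out near 0.05
```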
A somewhat more realistic example:
Say the plan is to use some sort of intelligence amplification (IA) method, and then work on the AGI alignment problem using amplified intelligence. Say we model this as: for 20 years after the IA method is created, there's no chance the amplified people solve alignment; after that, they have to get through 10 serial stages of inquiry, where each stage independently takes 2 years on average (exponentially distributed). A specific distribution like this is the Erlang distribution with shape 10 and rate 1/2, shifted by 20 years.
Say our beliefs involve pretty aggressive AGI timelines; e.g. say that at each point in time, we put a 5% probability on AGI ruin in the next year. This corresponds to the exponential distribution $f_A(t) \approx 0.05 \times 0.95^t$, which has mean just under 20 years and median just under 14 years.
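As a sanity check on these two ingredients (a minimal sketch; the variable names are mine), the implied summary statistics:

```python
# Sanity check of the model's two distributions (a sketch; names are mine).
import numpy as np
from scipy.stats import erlang

# Plan X after the intervention: 20-year floor, then 10 stages of ~2 years each.
f_X = erlang(10, scale=2, loc=20)
print(f_X.mean())       # 40.0: expected years to finish once started

# AGI ruin: 5%/year hazard, i.e. exponential with yearly decay 0.95.
lam = -np.log(0.95)
print(1 / lam)          # mean ~19.5 years
print(np.log(2) / lam)  # median ~13.5 years
```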
On this model, here's the counterfactual reduction in existential risk from starting the plan in:
- 5 rather than 10 years: 0.023
- 5 rather than 15 years: 0.042
- 10 rather than 15 years: 0.018
- 10 rather than 20 years: 0.032
With less aggressive timeline beliefs these numbers can go up; e.g. if AGI ruin is always 3% likely in the next year, then moving the plan from 15 to 10 years from now saves a 0.031 chance of AGI ruin (versus 0.018 above).
This model is arguably optimistic in that it says a group of amplified humans would very likely solve AGI alignment within 50 years, but I don't think that's a crazy assumption. These numbers should be corrected down for the probability that such a group wouldn't solve alignment at all, and for the probability that AGI ruin wouldn't have happened anyway. Also, one might think it's not plausible to accelerate this sort of plan by 5 or 10 years.
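To make that correction concrete, here's a minimal sketch; the two discount probabilities are hypothetical placeholders of mine, not estimates from this post:

```python
# Hypothetical downward correction of the raw gains (illustrative numbers).
p_group_solves_alignment = 0.8  # assumption: chance the amplified group succeeds
p_ruin_by_default = 0.7         # assumption: chance AGI ruin happens without X

raw_gain = 0.042  # the 5-vs-15-years figure from the list above
print(raw_gain * p_group_solves_alignment * p_ruin_by_default)  # ~0.024
```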
If you want to play with the model, e.g. to see sensitivity to the parameters, here's the code:
```python
# exec(open('timelines.py').read())
import numpy as np
from scipy.integrate import quad
from scipy.stats import erlang

def integrate(fun):
    return quad(fun, 0, np.inf)[0]

def mean(distribution):
    return integrate(lambda x: x * distribution(x))

# decay = e^-λ, so λ = -ln(decay). pdf: λe^-λt. indefinite integral: -e^-λt
def exponential_pdf(decay, time):
    return 0 if time < 0 else (-np.log(decay)) * (decay ** time)

def exponential_interval(decay, a, b):
    l = -np.log(decay)
    return -np.exp(-l * b) + np.exp(-l * a)

# cdf is 1-e^-λb; median at 1-e^-λb = 1/2, so b = -ln(1/2)/λ
def exponential_median(decay):
    return np.log(1 / 2) / np.log(decay)

pAGI_this_year = 0.05
print('probability of AGI in the next year:', pAGI_this_year)
AGI_decay = 1 - pAGI_this_year

def AGI_pdf(time):
    return exponential_pdf(AGI_decay, time)

def AGI_interval_probability(a, b):
    return exponential_interval(AGI_decay, a, b)

print('AGI mean:', mean(AGI_pdf))
print('AGI median (exponential):', exponential_median(AGI_decay))

# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.erlang.html
align_time_minimum = 20
time_per_stage = 2
number_of_stages = 10

def alignment_after_intervention_pdf(time):
    return erlang.pdf(time, number_of_stages,
                      scale=time_per_stage, loc=align_time_minimum)

print('align after intervention mean:', mean(alignment_after_intervention_pdf))
print('align after intervention minimum:', align_time_minimum)

def expected_gain(s1, s2):
    return integrate(lambda t: alignment_after_intervention_pdf(t)
                     * AGI_interval_probability(t + s1, t + s2))

for s1, s2 in [[5, 10], [5, 15], [10, 15], [10, 20]]:
    print('counterfactual x-risk reduction of starting the plan in', s1,
          'rather than', s2, 'years:', expected_gain(s1, s2))
```
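For instance, a small sweep like the following (my addition; it reuses the functions defined in the script above, so append it there) varies the yearly hazard and recomputes one of the gains. One thing it surfaces: in this model the gain need not be monotonic in the hazard, since if ruin is very likely to come soon, the plan rarely finishes in time and acceleration buys less.

```python
# Optional sensitivity sweep (my addition): vary the yearly hazard of AGI ruin
# and recompute the 10-vs-15-years gain. Append to the script above.
for p in [0.01, 0.03, 0.05, 0.10]:
    decay = 1 - p
    gain = integrate(lambda t: alignment_after_intervention_pdf(t)
                     * exponential_interval(decay, t + 10, t + 15))
    print('pAGI_this_year =', p, '-> gain (10 vs 15):', round(gain, 3))
```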