HIA and X-risk part 2: Why it hurts

1 Context

Previously, in “HIA and X-risk part 1: Why it helps”, I laid out the reasons I think human intelligence amplification would decrease existential risk from AGI. Here I’ll give all the reasons I can currently think of that HIA might plausibly increase AGI X-risk.

1.1 Questions for the reader

  • Did I miss any important reasons to think that HIA would increase existential risk from AGI?
  • Which reasons seem most worrisome to you (e.g. demand more investigation, demand efforts to avert)?
  • Which reasons, if any, are cruxy for you, i.e. they might make you think human intelligence amplification is net negative in expectation? Up for a live discussion / debate?

1.2 Caveats

The world is very complicated and chaotic, and I can’t plausibly predict the answers to even important questions like “what actual effect would such-and-such have?”. I can’t even plausibly resolve much uncertainty, and the world is full of agents who will adaptively do surprising things. So the actual search procedure is something like: What is a way, or a reason to think, that HIA might increase AGI X-risk, and that could plausibly, hypothetically convince me that HIA is bad to do? This is mostly a breadth-first search, with a bit of deeper thinking.

In particular, many of the reasons listed below, as presented, are (according to my actual beliefs) untrue, misrepresented, or misemphasized. That said, this is an attempt at True Doubt, and it partly succeeded: some of the reasons listed do give me real pause.

This is a similar project to “Potential perils of germline genomic engineering”. As in that case, keep in mind that part of the reason for this exploration is not just to answer “Should we do this, yes or no?”, but also to answer “As we do this, how do we do it in a beneficial way?”. See “To be sharpened by true criticisms” in “Genomic emancipation”.

2 What is HIA?

I’ll generally leave it fairly undefined what intelligence is, and what human intelligence amplification is. See “Overview of strong human intelligence amplification methods” for some concrete methods that might be used to implement HIA; those methods suggest (various different) specific functional meanings of intelligence and HIA.

Because we are being imprecise:

  • Critiques of HIA can bring up many possibilities—e.g. they could claim that HIA would tend to also affect some other trait for the worse.
  • Defenses of HIA can also bring up many possibilities—e.g. they could say “HIA is good if done in such-and-such specific way that falls under the general category”.

2.1 Vague definitions of intelligence and HIA

Vaguely speaking, by HIA I mean any method for increasing a living human’s intelligence, or for making some future people who in expectation have a higher intelligence than they would have otherwise had by default. Generally, we’re discussing strong HIA, meaning that the increase in intelligence is large—imagine 30 or 50 IQ points, so going from average to genius or genius to world-class genius.

Vaguely speaking, by intelligence I mean someone’s ability to solve problems that are bottlenecked on cognition (as opposed to physical strength or stamina, or financial resources, etc.). A priori, this could include the whole range of cognitive problem-solving. So, we include stereotypically IQ-style cognitive problems, like math or engineering. But, we also include for example the ability and inclination towards political charisma, wisdom, good judgement, philosophical ability, learning, questioning and attending to something steadily, creativity, good performance under stress, empathy, contributing well to teams, memory, taste, and speed.

On the other hand, we do not include other cognitive traits, such as kindness, agreeableness, emotional valence, emotional regulation, determination, conscientiousness, and so on. These are important traits in general, and it might be good to also give people the tools to influence themselves on those traits (though that might also be fraught due to coercion risks). But this article is focused more narrowly on intelligence, rather than all cognitive traits.

In practice, intelligence refers to whatever we can reasonably easily measure. If a trait is hard to measure, it’s hard to increase. (This is indeed a cause for concern, in that only increasing traits that are easily measured could be distortive somehow; the claim under discussion is whether HIA is good even under this restriction.) More specifically, intelligence refers to IQ, because IQ is fairly easy to measure. IQ is far from capturing everything about someone’s ability to solve problems that are bottlenecked on cognition. But in this article we take it for granted that IQ is a significant factor in those abilities, and we presume that IQ can be increased.

2.2 HIA as a general access good

One dimension we will fix is distribution: we will assume that HIA comes in an open-access way. In other words, defenses of HIA can’t say “we’ll only give HIA to the people who are morally good, and therefore there will be a bunch more brainpower directed in a morally good way”. That’s because a restricted-access implementation of HIA seems largely infeasible, and also morally and ethically very fraught.

I don’t think it’s an absolute principle that if you come up with an effective HIA method you have to immediately share it with everyone. But I do think there’s a strong moral weight towards doing so; and there’s separately a politico-ethical weight (meaning roughly “it’s not the sort of thing you should do as a member in good standing of society, even if it’s moral, because it would justifiably cause a lot of conflict”). Because of the politico-ethical weight especially, in many scenarios it seems logistically infeasible to do very much selection of who gets access.

This ethical weight towards general access is strongly increased in the case of reprogenetics. Reprogenetics is inherently a multi-use technology, and is already being used by polygenic embryo screening companies to enable parents to decrease disease risks in their future children. This means that society has a very strong justified interest in reprogenetics being equal-access (in the medium term, once the initial expensive development stages have been completed). Since reprogenetics is likely to be the most feasible HIA method (see “Overview of strong human intelligence amplification methods”), open access seems like a reasonable mainline assumption.

Finally, open-access HIA might be harder to defend as helpful for decreasing AGI existential risk, compared to some sort of hypothetical restricted-access HIA. So, defending the claim that even open-access HIA decreases X-risk is a stricter test; if passed, it should provide stronger evidence that HIA is good to pursue.

2.3 HIA and reprogenetics

Since reprogenetics is likely to be the most feasible strong HIA method, it’s hard to discuss HIA in general completely separately from reprogenetics. The type of HIA available, the timing of its advent, what other traits can be influenced, and how society will react are all potentially heavily affected if the method is reprogenetics specifically.

Still, as much as possible, this article aims to discuss the impact of HIA in general, factoring out impacts from any specific HIA methods. For thoughts on the downside risks of reprogenetics, see “Potential perils of germline genomic engineering”.

3 AGI X-risk

3.1 Background assumptions

This exploration assumes:

  • If anyone builds genuine AGI, everyone dies, unless AGI alignment has been solved.
  • AGI alignment is extremely technically difficult to solve.
  • People will continue pursuing AGI research, for various reasons, unless those reasons are removed and/or there are very strong reasons to not pursue AGI research.
  • There’s a substantial probability of AGI coming in the next 10 years, and also a substantial probability of AGI not coming for many decades (and anything in between).
  • The top strategic priority is to avoid building unaligned AGI.
  • There’s something called “AGI capabilities research”, meaning “research that adds to the technical understanding of humanity about how to make AGI”.
  • AGI capabilities research is always bad, because it ticks the global clock forward towards AGI.

3.2 Red vs. Blue AGI capabilities research

I want to introduce a piece of vague terminology to help with discussing the strategic landscape. Very vaguely speaking, there’s a spectrum of AGI capabilities research from “Red” (near-term, big training runs, lots of attention) to “Blue” (blue-sky research). It’s of course far from actually one-dimensional, and some entries in the table below are quite debatable (i.e. maybe the two entries should be swapped). Still, I want to use this one dimension as a rough-and-ready way to divide up the space by one degree. To give more flavor of the dimension:

Red | Blue
hot, active, fast-paced | cool, gradual, slow
happens at companies | happens in academia
has a big pile of resources (large amounts of money, compute, research talent, software engineers) | doesn’t have a big pile of resources
can effectively deploy a big pile of resources | can’t very effectively deploy a big pile of resources
requires a big pile of resources | can continue with a small pile of resources
does PR | doesn’t do PR
seeks and gets lots of attention | doesn’t seek or get much attention
exploit | explore
concentrated and siloed in a few large organizations and a few large projects within those organizations | diffuse; lots of small labs and individual researchers sharing ideas more openly and piecemeal
visible, legible | hidden, illegible (happens in colleague discussions, obliquely discussed in math/CS journals)
compute-based; practical; experimental | understanding-based; conceptual
make products | make publishable ideas
weakly or mediumly contributes to deep AGI capabilities progress | strongly contributes to deep AGI capabilities progress
gathers steam more quickly | gathers steam more slowly
likely to complete the last mile of research to an intelligence explosion | less likely to complete the last mile of AGI research

A plausible (AFAIK) first-approximation model is that at any given time, Red research is the most likely to set off an intelligence explosion. Red research takes existing ideas that have already been somewhat proven, and then cranks them up to 11 to see what happens. On the other hand, Blue research is most likely to contribute to getting to AGI in the longer run.

Red research is easier to regulate than Blue research. That’s because Red research requires big piles of resources and is generally more visible (PR, products, large salaries, brand recognition). In particular, the physical requirements and signatures of a large datacenter (energy, heat, chips) can be detected and regulated. Blue research can be carried out with consumer computers and via intellectual discourse, and it uses more specialized theoretical ideas, so it is harder to detect or even define.

4 An ontology of effects of interventions on world processes

In general, with some strategic intervention, the question arises: What processes in the world does this intervention speed up / support, and what processes does the intervention slow down / disrupt?

To a rough first approximation, the intervention is good if and only if the expected net change in all the speeds of the affected processes is a good change. So, we can get a rough guess for the value of an intervention by making guesses at how it affects each separate world process. Then, an argument that HIA is bad takes the form “This process is bad and is especially accelerated by HIA” or “This process is good and is especially decelerated by HIA”, or “Process X is worse than process Y and process X is accelerated by HIA more than process Y is accelerated”.
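
As a toy formalization of this first approximation (my own notation, purely a sketch, not a claim of precision): write \(r_i\) for the rate of world process \(i\), \(\Delta r_i\) for how much the intervention changes that rate, and \(v_i\) for how valuable marginal acceleration of process \(i\) is (negative for bad processes). Then the rough value of the intervention is

\[ \Delta V \;\approx\; \sum_i v_i \, \Delta r_i \,. \]

The arguments below amount to claims that some term with negative \(v_i\) (e.g. capabilities research) gets an especially large \(\Delta r_i\) from HIA, or that some term with positive \(v_i\) (e.g. alignment research, regulation) gets a relatively small \(\Delta r_i\).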

The next subsection will make some general remarks about the meaning of “acceleration”. The following two subsections will give a list of categories of ways that HIA could affect the speed of some process. (They don’t try to present a comprehensive ontology; I just think dividing up the space somewhat, even a bit arbitrarily, is helpful because it makes it easier to think in terms of specifics while also searching broadly through much of the whole space.)

4.1 The meaning of “acceleration”

To get some kind of handle on the menagerie of plausible effects of HIA, I’ll give a list of categories of ways that HIA could affect the speed of some process. These will be phrased as which processes “are accelerated” by HIA. This is vague, for convenience, but some notes to clarify a bit:

  • The way that HIA affects processes might change over time, so for each of these categories, we could ask how it will change by the time HIA starts affecting processes.
  • The basic point of comparison is the world without HIA.
  • But we might also want to discuss whether a process is accelerated by HIA more than “all processes” are accelerated by HIA “on average”.
  • We might also want to discuss which processes are accelerated more than we might have expected according to some simple rule.
  • A comparison between rates (in the HIA world vs. the non-HIA world, or between different processes, or in anticipation vs. reality) is vague. But the comparison could for example mean that the timeline until some important event within a process moves sooner in absolute terms or proportionally relative to other landmark events in that process; or the comparison could mean races between different processes tilting in favor of one or another.
  • If there’s some feature of a process that implies the process will be accelerated by HIA, then a weaker form or negated form of that feature might make a process tend to be accelerated less or decelerated.

4.2 Effects of HIA on a single process

  • Some processes are accelerated because added per-capita brainpower directly solves more problems within that process. E.g. because…
    • …the process makes good use of brainpower (e.g. onboards it well, allocates it well, heeds it well, supports it well);
    • …the process is bottlenecked on brainpower (as opposed to legwork, etc.);
    • …the process is bottlenecked on high-caliber brainpower (e.g. math research);
    • …the process has a structure of problems and solutions that lends itself to more brainpower (e.g. currently just below some threshold; more parallelizable).
  • Some processes are accelerated more because people who benefit from HIA tend to be personally inclined to contribute to those processes. E.g. because…
    • …intelligence in general causes people to have those inclinations (e.g. being especially interested in or capable for certain kinds of activities);
    • …the specific form of HIA affects interests or values (e.g. by emphasizing some aspects of cognitive performance over others, or by directly affecting interests);
    • …HIA tends to be applied to people with certain characteristics or who are around other people with certain characteristics (e.g. people who choose HIA for themselves or for their children having certain interests, personality traits, or orientations to themselves or their children);
    • …if someone benefits from HIA and knows it, then that causes them to think of themselves differently, including in terms of what processes to contribute to.
  • Some processes are accelerated more because society tends to cause people to contribute disproportionately much to those processes. E.g. because…
    • …society rewards contributing to some processes (with money, social approval (lack of punishment), legal permission (lack of punishment), etc.);
    • …society betrays or harms people in some ways, which affects their behavior (interests, hopes, ethics);
    • …society is inadequate regarding some process, creating an incentive to contribute to that process;
    • …society more indirectly shapes people, e.g. by instilling values.
  • For some processes, some degree of acceleration due to other first-order reasons will further compound into more (second-order) acceleration. E.g. because…
    • …the process uses first-order acceleration to attract more resources (money, people, brainpower, political will), e.g. because it becomes more lucrative, interesting, worthwhile, or socially desirable as it makes faster progress;
    • …the process has network effects, i.e. increasing returns to more people;
    • …the process self-improves, as a community.
  • Some processes are stimulated as responses to the existence of HIA. E.g. because…
    • …people want to intervene on the use of HIA itself (e.g. prevent or constrain it, gain access to it, impose it on others);
    • …people want to intervene on people who get HIA (e.g. recruit them to work on a process, persecute them);
    • …people want to intervene on the results of HIA (e.g. race to complete some project before HIA people intervene).

4.3 Effects of HIA involving multiple processes

  • One process might be directly causally downstream of another process. E.g.:
    • One process directly inhibits another process (e.g. by punishing it, removing rewards for it, or persuading people to not contribute to it).
    • One process directly activates another process (e.g. by recruiting for it, rewarding it, or persuading people to contribute to it).
  • Two processes interact indirectly. E.g.:
    • They compete over resources.
    • They push in opposite directions (e.g. on social opinion or regulation).
  • The relationship between two processes is altered. E.g.:
    • Direct relationships (activation, inhibition) are broken or amplified (e.g. regulatory escape).
    • One gains the upper hand over the other, winning out in competitions or conflicts.
    • Race dynamics are shifted, where one process gains the lead in time over the other.
  • Shifts in many processes cause follow-on shifts. E.g.:
    • Cumulative strain on a system, from many processes accelerating, causes it to tip over a threshold of collapse.
    • An especially nimble, fast-adapting process is able to cope exceptionally well with general multi-process acceleration, gaining a relative advantage over other processes.

5 Processes

This is a list indicating some of the processes relevant to AGI X-risk:

  • Red research (see the subsection “Red vs. Blue AGI capabilities research”)
  • Blue research (see the subsection “Red vs. Blue AGI capabilities research”)
  • Alignment research
  • Society in general, or more narrow bodies:
    • Doing well / poorly; abundance / scarcity
    • Being stable / unstable
    • Being wise / unwise; sane / insane
  • Making progress on X, for various X (medical research, technology, morals, etc.)
  • Conflicts (between various bodies)
  • For various X:
    • Cognitive empathy from people opposed to X toward people in favor of X, or vice versa
    • Political will in favor of X or against X
    • Convincing people of X
    • For various bodies B (states, international coalitions, professional bodies, social strata):
      • Support of X by B (desire for, capacity for, or actual)
      • Regulation / stigma of X by B (desire for, capacity for, or actual)

6 Some plausible bad effects of HIA on processes

The following subsections list reasons to think that HIA would speed up / support risky processes more than it speeds up / supports derisking processes.

6.1 Speeding up Blue research

HIA would (a priori, in expectation) speed up all research. If progress on some research problem is more bottlenecked on very difficult ideas (compared to e.g. money, legwork, regulatory approval, etc.), then it will tend to be sped up by HIA more than another research problem that’s less bottlenecked on ideas. Therefore, at a guess, HIA would directly speed up Blue research relatively more than many other kinds of research (including Red research).

Smart people might tend to be most interested in endeavors that are individualistic, technical, ambitious, computer-y, and puzzle-y. So they’d tend to be drawn to AGI research. This effect might currently be amplified by society’s tendency to not naturally offer ideal social and economic niches for very smart people.

As a basic note, we observe people already being directed to Blue research, so by default we expect that to continue.

Blue research might also have some second-order self-acceleration effects. E.g. there would be intellectual network effects, and maybe some self-improvement effects via better credit assignment and resource allocation internal to the field. These effects might be relatively weak because Blue research is fairly diffuse, but still substantive. On the other hand, there might be significant “coordination overhang”: there could be a threshold effect where, given some difficult new ideas, a large number of small, siloed Blue research groups could suddenly begin coordinating with each other. Since there’s far more Blue research than alignment research in absolute terms, there’s more such overhang for Blue research.

Blue research is especially bad, because:

  • It’s hard to regulate.
  • It’s what ticks the world closer to AGI in the long run.

6.2 Speeding up Red research

Relative to Blue research, Red research is less bottlenecked on very difficult ideas, so it gets less of a relative direct speedup.

However, Red is likely to have strong indirect acceleration. Because of money and status incentives, Red research attracts people. Red research is likely to keep attracting outsized piles of resources to AGI capabilities. It will probably continue attracting investment as it gets applied to more sectors of the economy, and it gets applied more as it progresses more. It also gains social cachet.

As with Blue research, we observe people being directed to Red research. This observation is even more indicative of trends for Red research in particular, because Red research has upticked a lot recently. That means people are still being directed to Red research even in a memetic environment that already includes a lot of warnings about AGI X-risk. In particular, this suggests that kids who benefit from HIA, growing up in a memetic environment with X-risk warnings but also a very prominent money incentive to do AGI research, might tend to work on Red research.

Red researchers might be especially prone and able to take agentic, conflictual stances towards efforts to avert AGI X-risk. That’s because they are more concentrated, have more resources at hand, and tend to be more anti-social and greedy. For example:

  • Red is logistically easier to regulate than Blue because it involves large, concentrated piles of resources. However, it’s harder to prevent socially through stigma, because it has large money incentives (which tend to overpower weak or medium stigma). It may also be harder to get regulation of Red passed, because Red is especially concentrated and can therefore apply larger point-forces to push on legislation.
  • Compared to most other processes, Red may be more likely to strategically and effectively target HIA people for recruitment, thus capturing more of the gains.
  • The existence of HIA might spur especially Red research to have more sense of urgency and go faster, out of a fear of being replaced as AGI leaders by HIA people, or out of a fear of being prevented from doing AGI research by strategies from HIA people. Similarly, regimes might pursue AGI more urgently if other regimes are pursuing HIA and not sharing it, in a bid to not be overtaken.

6.3 Less speeding up social and legal pushback against AGI research

People (the public at large; policymakers) could socially and legally push against AGI research. They’d first have to be convinced to do so. That process may be less bottlenecked on ideas, compared to AGI research. Instead it may be more bottlenecked on, for example:

  • Legwork explaining the danger of AGI, which we know how to do but takes a lot of work.
  • Time for people to orient to the danger of AGI (e.g. understand the danger, deal with feelings), and how to push against it (e.g. policymakers negotiating regulations). That process is mostly governed by people’s internal thoughts, rather than by new very difficult ideas, and most or all already living people won’t get much HIA and therefore won’t do this process faster.
  • Noisy interference from orthogonal processes. E.g. policymakers may be quite preoccupied with other concerns, or might be unable to coordinate with each other.
  • Targeted interference from opposed processes, e.g. concentrated lobbying by people with an ideological or financial motive to have AGI unregulated. This may tend to be advantaged by HIA, compared to concentrated lobbying from those in favor of regulation, since the latter get relatively less acceleration from HIA.

6.4 Nonlinear / race-condition regulatory escape

In general, processes that regulate AGI research are in some conflict with AGI researchers. The results of this conflict could be quite nonlinear, with a soft threshold effect where the ability of AGI researchers to carry on dangerous research could overpower the ability of regulators to prevent it.

Similar things happen with tax evasion and with the regulation of media piracy.

Since AGI research is likely to be accelerated relatively more than regulation of AGI research, HIA would increase the likelihood of regulatory escape.
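
One toy way to picture this (my own illustrative formalization, not anything from the literature): let \(C_{\text{res}}\) stand for the capacity of AGI researchers to carry on dangerous research and \(C_{\text{reg}}\) for the capacity of regulators to prevent it, and model the chance of regulatory escape as a steep sigmoid in their difference:

\[ P(\text{escape}) \;\approx\; \sigma\big(\beta\,(C_{\text{res}} - C_{\text{reg}})\big), \qquad \sigma(x) = \frac{1}{1+e^{-x}}, \]

with \(\beta\) large, so the outcome behaves almost like a step function. If HIA adds more to \(C_{\text{res}}\) than to \(C_{\text{reg}}\), then even a modest differential speedup can push the system across the soft threshold.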

6.5 Alignment loses the race anyway

Even if HIA speeds up alignment, the plan of making an aligned pivotal AI still probably requires making AGI-potent capabilities advances. Since that plan needs both the capabilities advances and the alignment work, it is strictly harder than just building AGI; so, a fortiori, aligned pivotal AI would still probably lose the race against (unaligned, omnicidal) AGI. So the current trajectory is bad, and HIA doesn’t change that.

What HIA does do is speed up that trajectory. So even if alignment and capabilities research got the same speedup from HIA, the overall effect would not benefit the chances of alignment beating AGI.

6.6 Intrinsic regulatory escape

HIA people, especially extremely smart ones, would in general be out of distribution. That could be because of selection effects, the specific form of HIA, or just because of the high intelligence itself.

Because HIA people are out of distribution, society would tend to be less good at regulating them in general, e.g.:

  • by being able to convince them that AGI is dangerous, using our current crystallized wisdom on that topic;
  • by instilling values and ethics;
  • by providing support (e.g. empathy, peers, good niches);
  • by understanding what they’re doing;
  • by detecting when they are mistaken / lying / deceiving / overconfident;
  • by being logistically able to carry out punishments for bad behavior;
  • by living up to their standards for “being a sane and good world that doesn’t need to be urgently, recklessly shaken up”.

6.7 Disrupting regulatory systems

In general, HIA could cause conflict. Conflict could destabilize systems. If systems are destabilized, they might be less able to regulate in general. Therefore, HIA could make it easier for AGI capabilities research to evade regulation. Examples:

  • HIA could cause deep social/political conflict over the use of HIA…
    • …thus causing people to not pay attention to AGI X-risk or gather political will to stop it.
    • …thus causing people to not relate sanely to their smart friends who are considering doing AGI capabilities research.
    • …thus weakening group sense-making in general.
    • …thus making it harder for us to convince people to do something about AGI X-risk, because we might only have skill / knowledge about how to do that given the current structure of group sense-making.
  • HIA could cause countries or groups of countries to fight with each other about the use of HIA…
    • …thus preventing single countries or groups from attending to AGI X-risk.
    • …thus preventing single countries or groups from coordinating across groups on international agreements to regulate AGI research.
  • HIA could cause shifts or disruption in group-valuing-systems.
    • E.g. people who didn’t benefit from HIA might lose faith in their own future, their say in their future, or their ability to influence HIA people; and by feeling helpless, they may stop having values or change their values because their values don’t know how to exert themselves in the new context.
    • So the group-values that would motivate regulating AGI might be weakened.

In general, many aspects of the current state of affairs will be somewhat at equilibrium, and in particular will be somewhat adapted to the current state of affairs. To the extent that the current state of affairs includes some ability to regulate dangerous technologies, that ability would be disrupted by fast shifts that move out of the regime of adaptation. [H/t so and so for this point.] Further, this adaptation would tend to be poor at benefiting from HIA acceleration, so it would tend to fall even further behind, leading to even more escape.

Note that this argument is a response to the reversal test, because it argues that the status quo is best.

6.8 Social values favor following local incentives

Generally, given society’s current set of values (the ones it instills in people), actions with long-term altruistic payoffs aren’t incentivized. So in general, processes that only have long-term altruistic payoffs will receive less benefit from HIA. In particular, alignment research, the decision to stop doing AGI research, and the decision to regulate AGI research are not incentivized.

That is a first-order effect: rewards and punishments don’t directly incentivize long-term thinking. There’s also a second-order, indirect effect: the fact that society is like this further erodes the reasonable faith someone might have in society being good long-term. Since building a good long-term society is a stag hunt, this further disincentivizes long-term thinking; long-term thinking is partly incentivized by the expectation that others are also incentivized to think long-term, and if they aren’t, that incentive is gone. E.g. if there’s a lot of fraud and injustice, that diminishes your expectation that being honest and just will pay off, because others won’t collaborate with you on your honest and just endeavors. This directly interferes with good endeavors. It might also interfere indirectly by distorting people’s values more generally: an environment of bad incentives lowers the expectation of a good long-term future, which makes people care less about, for example, omnicide. So AGI X-risk would seem less bad. In that mindset, the thrill and money from AGI research would be more tempting on net.

6.9 Less speeding up change towards better values

In general, humanity responding better to AGI is to some extent a question of values, broadly construed to include wisdom, sanity, calmness, patience, coherence, goodness, long-term thinking, altruism, and empathy. Policymakers and the public would have to care about long-term global outcomes rather than short-term ones; AGI researchers would have to care about not harming others more than they care about a small chance of large personal gain, and would have to have hope in a future without AGI.

Rather than being bottlenecked on ideas, value change may be relatively more bottlenecked on e.g.:

  • Time, legwork, and skill for persuading AGI researchers to stop. E.g. the skill of confrontation-worthy empathy is probably bottlenecked on several traits / abilities, some of which are not very IQ-related.
  • Time for people to process (e.g. propagating stated values into actions; investigating conflicts between stated values and implicit values; working out lines of retreat).
  • Decisions that people have to make about what they care about.
  • Time for attentional cycles / OODA loops / network effects to run their course.

6.10 Alignment harnesses added brainpower much less effectively than capabilities research does

In addition to just being more difficult, the problem of AGI alignment has a conceptual structure with some specific unfavorable properties compared to capabilities, which are salient in this context. Alignment progress is less parallelizable, less cascading, less tractionful, and less purely technical than capabilities progress. In more detail:

  • There’s generally much less traction in alignment research.
    • In other words, there’s less surface area to make progress.
    • In capabilities research, there are many experiments to run and many ideas to try that might work. There are partially-working systems which can be refined. You have fairly direct access to the problem frontier, because the frontier is always “what current systems can’t do well”. You can tell what works and what doesn’t.
    • In alignment research, most important problems mostly only show up in actual AGIs, so you don’t have access to the relevant objects and problems. Experiments don’t give much relevant information, and we don’t have the concepts to think about or deal with actual AGI. The problems are more philosophical (where we don’t know what questions to ask, and don’t have the ideas we’d need in order to ask the questions) rather than technical (where the problem is well-defined using well-understood ideas).
    • Because there’s more surface area, capabilities research is also more likely compared to alignment research to be able to incorporate advances in nearby fields. Sources like hardware (faster, cheaper computers), computer science (faster algorithms), neuroscience (ideas for functional algorithm pieces), and mathematics (medium-depth understanding of conceptually-thin aspects of minds) are more likely to have “a bit of a dot product with general intelligence” rather than “a large dot product”, and therefore are more likely to contribute to building [a functional mind, any sort] than to contribute to building [a specific sort of mind, a safe / corrigible / honorable / humanity-aligned mind]. Those nearby fields would tend to also accelerate from HIA.
  • Ideas in capabilities more easily cascade into more ideas.
    • For example, a new training method can be combined with many architectures and datasets; different systems can be combined as mixtures or pipelines; and so on.
    • So, one new idea unlocks a bunch more traction.
    • In alignment on the other hand, ideas often come in the form of understanding the problem better, e.g. understanding constraints on possible solutions. These don’t combine with each other as productively. So, new ideas don’t necessarily cascade into much more traction.
  • Capabilities is more parallelizable.
    • Since in capabilities research there’s more traction, surface area, combination, and cascading, it’s easier and more productive for many people to work in parallel on different projects.
    • In alignment, on the other hand, you have to understand each constraint that’s known in order to even direct your attention to the relevant areas. This is analogous to the situation with \(\textsf{P}\) vs. \(\textsf{NP}\), where whole classes of plausible proof strategies have been proven not to work. You have to understand most of those constraints; otherwise, by default you’ll probably be working on e.g. a proof that relativizes and therefore cannot show \(\textsf{P} \ne \textsf{NP}\). Progress is made by narrowing the space, and then looking into the narrowed space. (I’m not sure this story is quite true in the \(\textsf{P}\) vs. \(\textsf{NP}\) case; e.g. were the natural proofs and relativization constraints discovered with serial dependence?)
    • So alignment has more serial dependence in its ideas, i.e. it’s less parallelizable. (It can still benefit from more researchers to do more searching; but they’ll tend to duplicate efforts more.)
  • Alignment depends more on cognitive traits that are less IQ-correlated than raw technical problem-solving.
    • E.g. alignment takes more:
      • wisdom (tracking many constraints, taste in attention),
      • patience / persistence / attentiveness,
      • humility (e.g. finding flaws in your own reasoning and ideas),
      • sanity (e.g. being able to ground ideas when reality doesn’t ground them for you; not going crazy from thinking about minds and weird self-referential things and scary things),
      • security mindset,
      • and urgency / agenticness (e.g. discarding interesting / lucrative threads that don’t contribute to solving the problem).
    • So, the gains in alignment from HIA would be somewhat attenuated (e.g. by being multiplied by other cognitive traits, or by being Amdahl-gated by other cognitive skills; see the sketch after this list).
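
As a toy illustration of the Amdahl-style gating (purely illustrative numbers of my own choosing): if only a fraction \(p\) of the alignment work is gated on the trait that HIA boosts, and that fraction is sped up by a factor \(s\) while the rest is unchanged, then the overall speedup is

\[ S \;=\; \frac{1}{(1-p) + p/s} \,. \]

For example, with \(p = 0.5\) and \(s = 3\), the overall speedup is only \(S = 1/(0.5 + 0.5/3) = 1.5\): a threefold boost to the gated half yields just a 1.5× speedup overall.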

For these reasons, alignment harnesses the gains from HIA much less effectively than capabilities research does.

A related point / another way to say this is that alignment benefits the most from HIA that makes there be more extremely-smart people, but does not benefit differentially from HIA that makes there be more somewhat-smart people, whereas capabilities research does benefit from more somewhat-smart people.
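
To illustrate the asymmetry with a crude model (assume IQ \(\sim \mathcal{N}(100, 15^2)\), a uniform +30 shift among HIA recipients, and arbitrary thresholds of 115 for “somewhat smart” and 160 for “extremely smart”; none of this is meant literally):

\[ P(\text{IQ} > 115) \approx 0.16 \;\to\; 0.84, \qquad P(\text{IQ} > 160) \approx 0.00003 \;\to\; 0.023. \]

The extreme tail is multiplied far more in relative terms, but in absolute headcount the shift produces roughly 30 times as many people newly above 115 (about 68% of recipients) as newly above 160 (about 2.3% of recipients). On this argument, the large new pool of somewhat-smart people differentially feeds capabilities research, while alignment mostly needs the much smaller new pool of extremely smart people.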

6.11 HIA people may tend to be transgressive

In general, there are several reasons to be cautious about using HIA. Therefore, people who make use of HIA technology might tend to be especially transgressive, i.e. prone to ignoring reasons not to use it. (Cf. “Transgression” in “Potential perils of germline genomic engineering”.) Also, extremely smart people might tend to be transgressive in some ways, and transgressiveness might correlate with other troublesome traits. Through these channels, HIA people might disproportionately end up with transgression-related traits, which would tend to make them do more bad things, including pursuing AGI research.

Examples:

  • If there’s international pressure against using HIA, regimes that allow HIA would tend to also buck AGI regulation.
  • Regimes that coerce people to use HIA might also coerce them to do other things, such as AGI research.
  • People who use HIA on themselves might tend to be especially non-humble, anti-consensus, reckless, overly technooptimistic, selfish, or overconfident. Those traits would cut against them being convinced or pressured to not do AGI research.
  • Parents or subcultures who use HIA for their future children might tend to be especially non-humble, norm-bucking, technooptimistic, anti-regulation, overconfident, unscrupulous, or inclined to enlist their future children into their ideologies or causes. So their children might be attracted to or pressured into reckless technology development such as AGI.
  • If HIA is controversial, HIA people might be targeted for persecution. In response, they may become anti-social, individualistic, nihilistic, selfish, or reckless.
  • If HIA makes extremely smart people, those people might tend to be enamored with their own abilities and become overconfident due to being the smartest people around in a relative sense. In particular they might become overconfident in their absolute ability to navigate AGI X-risk, overconfident in their judgement about when norm-breaking is ok, or overconfident in general such that they aren’t open to advice/correction/perspective/epistemic-assistance from other people. They might therefore tend to choose to advance AGI capabilities recklessly.

7 Other arguments

This section gives other arguments that HIA increases AGI X-risk.

7.1 Concentration of power

If there’s a large early cohort of people who benefit a lot from HIA, they might form a somewhat cohesive community. This could have bad effects, including exacerbating some of the dynamics mentioned above. For example:

  • By syncing up with each other somewhat, they could have more correlated failures with each other, e.g. about epistemics around AGI. For example, rather than this cohort ending up distributed through various processes, they could end up lopsidedly concentrated into AGI research. This outcome might be deliberately targeted by Red research organizations.
  • By being more univocal (I mean, speaking in unison / unanimously), they might check each other’s flaws less.
  • Society might be overconfident in them, and they might be overconfident in themselves.
  • They might be especially able to incorrectly persuade society to allow them to pursue AGI.

7.2 HIA is unpredictable and therefore risky

Generally, we don’t understand the tails of cognitive performance, so we don’t understand what HIA would be like. If there’s some strong tendency in HIA people, that tendency would have a large effect in a world with HIA. Most changes are bad, so a priori large effects are bad. Since HIA is unpredictable, we don’t have a good reason to expect it to have good effects.

7.3 More capable but not wiser

As a very general argument, we might expect HIA that targets IQ or IQ-like traits specifically to be bad because it’s imbalanced. Specifically, it makes people who are more capable but not necessarily wiser, to the extent that wisdom is orthogonal to IQ. Since we’re in a regime where the unwise competent pursuit of technology is an existential risk, this implies HIA would be bad.

As an analogy, consider a 3-year-old. Suppose they suddenly gained the strength of an adult and the self-control of an adult. Let us ignore, for the sake of the hypothetical, all the probable badness that would entail for the child’s experience and development, and just ask, what are the direct kinetic consequences of that change? It wouldn’t be so bad: they have the self-control of an adult so they won’t do too much harm. But what if the child suddenly gained the strength of an adult, but not the self-control? This disproportionate change in abilities would be disastrous.

8 Acknowledgements

Thanks to many people for conversations about this, especially RK, DB, MS, VP, SE, TY.