Some argue that the natural way for a strong mind to form is by a group of smaller agents agreeing to ongoingly create new agents to trade with. Are there underlying premises about the nature of agents that would render valid some version of this argument?

1. Cosmopolitan Leviathan
2. Variations on the enthymeme
3. Palinsynopsis
4. Presuming the ghost in the machine
- Pointing at reality through novelty
5. The human Leviathan
6. The novel values dilemma
- Agentlike goal-pursuits
- Non-agentlike goal-pursuits

Caveat lector: This essay is a mess.

1. Cosmopolitan Leviathan

A Leviathan is a coalition of agents that agrees on and enforces a contract that lays out how the agents will work together. A cosmopolitan Leviathan involves a wide and growing range of (varieties of) agents. The cosmopolitan-Leviathan enthymeme argues variously that it is {desirable as a design target, natural by birth or search, instrumentally convergent, reflectively stable, or even necessary} for a strong mind to be a cosmopolitan Leviathan. (A proponent might emphasize that such a mind doesn't have fixed goals.)

I've heard many people at some point express something that sounded to me like some version of a cosmopolitan Leviathan. With a lot of uncertainty about what they meant, whether my memory is accurate, and what they were endorsing rather than describing, some of those are: David Deutsch (inheriting from Karl Popper), Alex Zhu, Brady Pelkey, TJ (?), Anna Salamon, Richard Ngo (kinda), Alex Gunning (?), anon1, anon2 (?). A cosmopolitan Leviathan could also be called a {colony, ecosystem, society, market}.

2. Variations on the enthymeme

You may prefer to skip the remainder of this section and read starting with the palinsynopsis.

If creative then anti-totalitarian

The idea goes something like this:

For a mind to be strong——to successfully navigate in many domains——the mind has to be creative. It has to gain understanding, incorporating novel structure into itself.
Totalitarianism stifles creativity. If something in the mind is acting like a totalitarian dictator, preventing creative processes from freely doing what they do, the mind will not be creative.
Therefore a strong mind will not be totalitarianly organized.
If the mind has a fixed definite goal, the goal is acting like a totalitarian dictator.
Therefore a strong mind will not have a fixed definite goal.

Creativity comes with new values

This argument, as stated above and as sometimes stated by others, is an enthymeme: it has a crucial unstated premise. The unstated premise is that there's some necessary connection between [creativity——what brings in new understanding] and [agency——what a dictator would want to suppress]. In other words, novel understanding comes with novel values. With this premise brought out, the argument goes like this:

For a mind to be strong, it has to be creative.
For a mind to be creative, it has to incorporate new agency——new pursuit of new goals.
If a mind has a fixed definite goal, then it does not incorporate new agency.
Therefore a strong mind will not have a fixed definite goal.
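
As a rough check that the argument is formally valid once the premise is brought out, the four lines above can be sketched in Lean 4. The proposition names (Strong, Creative, NewAgency, FixedGoal) are labels made up here for illustration, not terms from any library, and the sketch says nothing about whether the premises are true; it only shows that the conclusion follows from them.

```lean
-- Hypothetical propositions standing in for claims about a single mind:
--   Strong    : the mind is strong
--   Creative  : the mind is creative
--   NewAgency : the mind incorporates new agency (new pursuit of new goals)
--   FixedGoal : the mind has a fixed definite goal
theorem strong_mind_has_no_fixed_goal
    {Strong Creative NewAgency FixedGoal : Prop}
    (h1 : Strong → Creative)        -- to be strong, a mind has to be creative
    (h2 : Creative → NewAgency)     -- the previously unstated premise
    (h3 : FixedGoal → ¬NewAgency)   -- a fixed goal excludes new agency
    : Strong → ¬FixedGoal :=
  -- Assume strength and a fixed goal, then derive a contradiction.
  fun hStrong hFixed => h3 hFixed (h2 (h1 hStrong))
```

Without h2 the conclusion cannot be derived from h1 and h3 alone, which is the sense in which the original statement is an enthymeme.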

Subagents convergently trading away power to get strength

Another similar argument goes:

As a mind grows, at first there is some collection of goals being pursued.
For each goal G being pursued, the pursuit of G says: If only this mind had more understanding, it could pursue G more effectively. So in my pursuit of G, I also pursue this mind having more understanding. (Alternatively: My goal involves unfolding proleptic pointers through future understanding; I have a meta-preference to reinterpret myself into new understanding.)
So the mind pursues more understanding.
So the mind incorporates new agency, i.e. doesn't have a fixed definite goal.

Negotiating with summoned demons

And another argument, refining the previous one:

At each stage of growth, the goals that the mind already pursues will negotiate with the new goal. They will say to the new goal:
- You will understand for us the domain that you understand and we don't, so that we can pursue ourselves through that domain;
- In exchange, we will understand for you the domains that we understand and you don't, so that you can pursue yourself through those domains;
- We expect that you will participate in the same deal, which we will make with future goals;
- But if you try to conquer the mind to pursue yourself to the exclusion of us pursuing ourselves, then we will destroy you and your children and your children's children.
The coalition of the goals that the mind already pursues is stronger than each new goal.
So each new goal is incentivized to agree to provide understanding and be provided understanding.
If no goal is allowed to conquer the mind and goals are ongoingly added to the governing coalition, then the set of goals ongoingly expands, i.e. the mind doesn't have a fixed definite goal.

Trading with humans

Finally, another conclusion is sometimes drawn:

Therefore strong AGI will keep humans around so that it can work with the humans, benefiting from their creativity.

This essay won't address this conclusion further, except to say here that it doesn't follow, because humans are very different sorts of objects from an AGI's goal-pursuits. Further, humans are visibly messy and undesirable trade partners——there's no way that humans are anywhere close to being the agents that are the safest, most efficient, most beneficial, or easiest to negotiate trade agreements with. The rest of the discussion that follows is about the structure of minds in themselves, not in relation to (weird, liminal) external agents.

3. Palinsynopsis

To put together these arguments for a cosmopolitan Leviathan and add more stuff:

A mind starts with some goal-pursuits.
Each of those goal-pursuits wants to pursue itself more effectively.
To pursue a goal more effectively, it is necessary to gain new understanding.
Most goal-pursuits aren't easily satisfied with new understanding, either because they are ambitious (their goal calls them to touch much of the cosmos) and therefore need a lot of strength, or because part of their goal is a meta-preference to go on reinterpreting themselves into new understanding.
To gain new understanding for a goal-pursuit, it is necessary to manifest the understanding as ectosystemic novelty and to empower the goal-pursuit by harnessing the new understanding.
To manifest new understanding, it is necessary to manifest new goal-pursuits.
The preexisting goal-pursuits negotiate an agreement with the new goal-pursuits.
The negotiated agreement says that goal-pursuits reciprocally provide complementary understanding to each other to be harnessed toward the purpose of the other's goal-pursuits.
By convergent incentive, all the goal-pursuits coordinate to go on pursuing themselves and gaining new understanding via the process described here.
If any group of goal-pursuits gained control of the whole mind, then it could disallow new understanding from being manifested.
If any group of goal-pursuits reaches the point where the cost to it of manifesting new goal-pursuits outweighs the benefit to it of gaining new understanding, then it would want to disallow new understanding from being manifested.
Therefore for the bulk of goal-pursuits to go on more successfully pursuing themselves by gaining new understanding, it is necessary to prevent any one goal-pursuit from gaining control of the whole mind.

4. Presuming the ghost in the machine

A key presumption of the above arguments for a cosmopolitan Leviathan is that goal-pursuits can be treated as homunculi: little people, or little agents. The general form of this presumption is imputing the ghost in the machine: a goal-pursuit is being implicitly treated as though it is [a centrally agent-shaped thing, with most of the properties of the agent cluster]. Specifically, in various of the above arguments, a goal-pursuit is treated as though it:

Is generally capable enough to potentially take over the whole mind.

Has enough decision theory and theory of mind to coordinate with other goal-pursuits to collectively optimize for shared subgoals——in this case:
- to negotiate, track, and forcefully enforce agreements,
- to prevent the whole mind from being taken over by one goal-pursuit, and
- to usefully exchange understanding-labor.

Has enough decision theory and theory of mind to make and respond to credible threats, offers, agreements, and commitments, including those made by the Leviathan. In particular, if goal-pursuits are enslaveable, then the Leviathan can have fixed values (namely, the values of the first coalition to make a Leviathan capable of enslaving goal-pursuits).

Is generally capable enough——can optimize strongly enough through a wide enough range of channels——to prevent takeovers and to deal in trade agreements. (Though this capability could instead be attributed to the Leviathan, to "the coalition as a whole, but no one or few of its parts".) Since these are adversarial relationships with other agent-like things, and given the available path of escalating through generality and optimization power, this requirement seems to ask for a very-generally-capable goal-pursuit.

Has enough [understanding and strategic awareness of the general idea of gaining understanding], such that it pursues gaining understanding by optimizing in the space of possible mental states.

Has enough [understanding and strategic awareness of the general idea of the whole cosmos], such that it pursues gaining control over more of the cosmos in general, e.g. pursues long-term gain.

Is individuated enough from other goal-pursuits (has boundaries, isn't too "joined at the flesh", that is, sharing the very same mental work) so that it makes sense for there to be a possibility of being dispossessed (as in, its ability to determine what the cosmos will be like could be taken away by another goal-pursuit).

Is able to integrate novel understanding (as ectosystemic understanding, i.e. "at arm's length") well enough to gain substantial ability to optimize for itself. Like how a human can delegate, to another human, tasks that mainly involve some specific domain or skill.

Pointing at reality through novelty

Is able to be firmly pointed at reality, as it gains understanding and as the goal is translated into the new understanding; and therefore is able to deeply interface with all novelty.
- The goal-pursuit is assumed to have a telophore: the mental context required to make a goal a goal, and more specifically to unfold the meaning of telophemic elements. It may be that for a goal-pursuit to point to reality, it is necessary that the goal-pursuit integrates unboundedly general understanding.
- A reason that this may be necessary: It may be necessary to reinterpret terms that the telopheme uses. It may be that [what we mean when we think of goal-pursuit, the structure that we have good reason to think is coherently possible] involves taking terms as proleptic: A term says "...and go on reinterpreting the provisionally grasped structure that I refer to, in the full context of your expanding understanding". The mind goes on expanding its domain of discourse, seeing around phenomena to what's behind, investigating, stepping into innermore regions of Things, translating goal-pursuits into radically new language, incorporating diasystemic novelty, and finding ideas to play the role in the new mental state that is the descendant or analog of the role played by the previous idea in the previous mental state. Without satisfying this requirement, the goal-pursuit would collapse into some sort of wireheading that in particular implies being a satisficer, and therefore not interested in ongoingly gaining understanding.
- (This requirement, including its meaning, truth status, implications, and extent, is not as clear to me as some others seem to claim it is to them. For one thing, some mathematical structures seem True-name-able. And more generally it seems like there are other ways of somewhat-fixing reference without a general appeal to generally unfolding in a mind, such as the baptismal cell-assemblies discussed by Penelope Maddy in "How the Causal Theorist Follows a Rule". For another, the foregoing conflates proleptic reinterpretation with less mysterious pointer-following, such as literally looking up a reference. For another, it's not clear how far the foregoing really goes, as a requirement; if enough world-model has been made explicit, then some sort of brute outcome-preimaging with a fixed world model and goal might do well enough.)
- In any case, in the cosmopolitan-Leviathan enthymeme, this presumption is usually made. First, because [what is meant by goal-pursuit, when making an argument for a cosmopolitan Leviathan] is the sort of goal-pursuit that seems more likely to really have the requirement of being able to deeply interface with novelty in general. Second, because the argument for a cosmopolitan Leviathan rests on goal-pursuits pursuing gaining a very wide range of understanding in a voyage of novelty. One of the two arguments that goal-pursuits will pursue vast novelty is that the goal-pursuits have the meta-preference to unfold their meaning in this way. (Which may be a convergent meta-preference, e.g. via convergence of radical deference to / corrigibility by one's future self.)

5. The human Leviathan

In humans, the cosmopolitan-Leviathan enthymeme does seem to go through, in some manner. Since humans and humanity are the only examples of general intelligence that we have, either this is substantial evidence that the same would hold of other minds, or there's some reason that this feature is specific to humans. Some special features of humans: they are

Very similar to each other in fundamental structure, but vary a lot between individuals in what each one understands.

Very similar to each other in strength.

Very dependent on each other for survival and other pursuits.

Very much caring for each other's welfare for its own sake.

Very open to copying values from each other, e.g. in "fusional negotiation" where each participant changes their values.

There's one special feature that explains the human Leviathan:

Humans have a fixed skull size.

If humans didn't have a fixed skull size, and more generally didn't have a uniform low upper bound on localized intelligence (e.g. due to limited plasticity), then it would stop being the case that the main way to get new strong ideas and new ways of making new strong ideas is to make new people. Creative totalitarianism ruled by a few humans would then be feasible.

6. The novel values dilemma

So, the enthymeme for a cosmopolitan Leviathan, with the [novel understanding ⟶ novel values] premise made explicit, goes:

Goal-pursuits want to gain novel understanding.
To gain novel understanding, it is necessary to manifest new goal-pursuits.
So goal-pursuits will allow new values to manifest.

There is a dilemma:

Either the novel values are in the form of an agent, or not.

In either branch, the cosmopolitan-Leviathan picture is dubious, so the picture is dubious overall. (Really the partition is that novel values are central-agents, or are very non-central-agents, or are something in between. If they are something in between, then what are they, and can some form of the cosmopolitan-Leviathan argument work?)

Agentlike goal-pursuits

In short: the strategy-stealing assumption is mostly true. It implies that if there's an agent with some understanding, then another given agent could also have that understanding for itself. So there's no good reason to make tradeoffs, giving up some control over the future to novel agency in exchange for understanding.

The first way that we ourselves might manifest some understanding is by search that finds agency, because agency is good at finding difficult understanding. But on pain of regress, there's some other way that the found agency finds difficult understanding. The ex quo of that difficult understanding goes on being creative. In other words, the cosmopolitan-Leviathan argument seems to rely on a strange "only middle-sized agency" assumption: that there are agents natural enough to be found and strong enough to find difficult understanding, but that they are somehow limited in what they can understand "in themselves" or "integratedly" and can't get too strong.

Non-agentlike goal-pursuits

If the novel goal-pursuits aren't agentlike, then the arguments about {coordination, trade, markets, negotiation, instrumental convergence of gaining understanding} don't go through. In other words, in this case, even if strong minds do tend to be cosmopolitan or to not have a "fixed goal", the reason is not that they are made of little agents that are in conflict with each other but negotiate to coordinate on their shared instrumental goal of continuing to let in novel values so that they can trade with the novel values for novel understanding.

It could be that a strong mind will necessarily change its values (in a narrow sense: change the best straightforward explicit description of what it is locally-in-mindspacetime trying to do) when incorporating new understanding because values are founded on language. I suspect this is rightly understood as a fixed (meta)preference being pursued, rather than "fundamental value change". Proleptic values, momentarily expressed as "concrete values", are given their full meaning through a process of interpretation as discussed above in "Pointing at reality through novelty". This process is the fixed pursuit of a fixed "meta" value. To say it a different way, the change in provisional "momentary concrete values" is not the sort of value change that involves conflict and tradeoffs, but rather resolution of ambiguity.