1. Background

This is the second essay in a series under the title "A hermeneutic Movement of the idea of values". This essay is a mix of old notes and some new meditations. The previous essay is "Ah Motiva 1: Words about values".

2. Capable minds with specifiable effects

The starting point of AGI alignment is the question of how to make a mind that is highly capable, and whose ultimate effects are determined by the judgement of human operators.

In other words, the mind should empower humans. It should possibilize a lot for the humans. There should be some channel which the human operators can handily use to determine the ultimate effects of the mind.

3. The idea of values is promissory

3.1. Using the idea of values to specify effects

This desired situation—where we understand minds enough to make such a mind and to specify its effects—provides a criterion for values through the formula:

The mind's values should be such that we can specify the direction of the mind's ultimate effects.

This formula both provides a criterion for the mind's values, and also provides a criterion for our concept of values. It says that our concept of values should be such that the formula is useful. E.g., our concept of values should be such that it makes sense (is relevant, meaningful, clear, useful, testable) to say of a mind that its values are such that we can specify the mind's effects.

If we assert this formula, we are conjecturing that our current idea of value (our pointers, intuitions, partial concepts, connections) is such that it is a useful approach to ask: How can a mind's values be, so that we can specify the mind's effects? The conjecture is a promissory note that says: The concept of values will be revised, starting with our present idea of it; and that process of revision will homotope the idea to a good ensemble of concepts.

With this formula in context, our present concepts about values are like conjectures that say: This concept is a good starting point for finding useful concepts in the region of values.

3.2. Human wanting

Our present concepts about values come mostly from our familiarity with humans and human wanting. See "Human wanting", especially the dimensions of wanting laid out by the variety of human wanting.

3.3. Aside on ideas vs. concepts

Here "concept" is as synchronic as can be, and "idea" permits diachronicity. A concept is what's pointed at by a description of mental elements as they presently relate to other mental elements. An idea is less determined, and to follow its reference would require resolving some provisionality. An idea includes present concepts, the past history of those concepts, and the future ensemble of concepts that the present concepts will transform into by role-isotopy (that is, by being replaced with more suitable concepts for the future analogs of the present roles presently played by present concepts).

We're interested in not just the present concepts of values, but the broader idea of values—including its role, the context that gives it its role, and its future manifestations as concepts.

3.4. Promissoriness asks for holding off on demarcation

There's a natural motion in conceptual analysis: Make a definition, and make it precise. It won't capture everything, but it will be easier to analyze; it will point at a smaller class of examples, it will direct attention to clear central examples, it will be more fixed as the analysis goes on, it will be more amenable to formal analysis.

Demarcating the subject of discourse in that way is useful sometimes. Its key disadvantage for us is that a demarcated notion is like a small patch of a net, which is to say, not a net. A demarcated notion can't play the role it would have to play in a successful hermeneutic net.

Instead of a demarcated notion, we have several provisional concepts, which together stake out a region. These concepts might shift around over the course of our inquiry, and might expand outward from this region. The spread of flags planted in the ground might reveal contours of a conceptual wormhole, leading to a wormhole in the inquiry.

In other words, restricting attention to one notion of value would sterilize the growth of the idea of values that was promised.

4. Are values essentially diasystemic?

A proposition:

Values are essentially diasystemic.

This is really a family of claims: For any notion of value, and for any particular value that fits that notion in some mind, that value runs across the grain of the mind. The value is diasystemic relative to almost all other mental elements: It doesn't fit alongside other elements as another element of the same type that interfaces like other elements do. Rather, it touches everything—more precisely, a value is constituted by everything else in the mind.

4.1. Example: Self-regenerating friendship

A friendship is compiled down into patterns of attention, such as noticing the friend walking down the street more easily than noticing a faint acquaintance. And these patterns of attention can recover a damaged friendship: Noticing a friend who has been drifting away, walking down the street, brings the friendship back up—the pattern serves as a signpost to recover caring. Or, doing an activity that's better with the friend will call for the exertion of skills, and those skills, in their exertion, will call for their accustomed resources, which include the friend. The friend, being called for, is valued, and so the friendship is valued.

Habits and memories and skill are supported by the physical environment, as in a chef orchestrating a complex dish in deft reliance on supplies and equipment in their proper place, or the promises that haunt a childhood home. Analogously, each [mental element that constitutes the value] has, as its supportful dwelling, the rest of those mental elements that constitute the value, and reciprocally supports those other elements, reminding them to be what they are. If all the "mere models" related to the friend were deleted, how would the friendship still be there? On seeing (with what recognition?) the friend, there's an emotion of warmth—but that might fade quickly, with no traction pulling one into the old, now-unfamiliar patterns of togetherness. It's conceivable to regrow the whole mode of friendship from just a single emotional event, but that's not how humans work (except when plummeting in love).

4.2. Human values are diasystemic

Isn't this just a quirk of our lowly origins as sentient mud? Wouldn't a more cleanly architected, more coherent mind have its values factored out from its possibilization?

Maybe! A more modest claim, illustrated by the above example, is visibly true:

Human values are presently diasystemic.

In other words, the "flesh" that constitutes "one human value" is not something like one chunk of brain matter, and is not something like one concept, and is not something like one plan. It's not one envisioned world, or even a pamphlet full of principles. It's not one element that comes with a familiar and comprehensive interface. It's not demarcated from other elements, but rather it is constituted by many elements, as some higher-order organization of aspects of those elements.

To some extent this is temporary. It could be written down, and placed in some trusted crypt, that so-and-so is a good friend. If such a message is trusted, it could on its own be enough to say the value, and that saying would somewhat more densely determine the value compared to the naturally messy distributed caring.

4.3. Values require reference

The question stands, are values nevertheless to some extent essentially diasystemic?

4.3.1. Example: Blueberries

For example, I reach out and pick up some blueberries. This is some kind of expression of my values, but how so? Where are the values?

Are the values in my hands? Are they entirely in my hands, or not at all in my hands? The circuits that control my hands do what they do with regard to blueberries by virtue of my hands being the way they are. If my hands were different, e.g. really small or polydactylous, my hand-controller circuits would be different and would behave differently when getting blueberries. And the deeper circuits that coordinate visual recognition of blueberries, and the deeper circuits that coordinate the whole blueberry-getting system and correct [errors in the performance of blueberry find-and-pick-upping] based on blueberrywise success or failure, would also be different. Are the values in my visual cortex? The deeper circuits require some interface with my visual cortex, to do blueberry find-and-pick-upping. And having served that role, my visual cortex is specially trained for that task, and it will even promote blueberries in my visual field to my attention more readily than yours will to you. And my spatial memory has a nearest-blueberries slot, like those people who always know which direction is north.

It may be objected that the proximal hand-controllers and the blueberry visual circuits are downstream of other deeper circuits, and since they are downstream, they can be excluded from constituting the value. But that's not so clear. To like blueberries, I have to know what blueberries are, and to know what blueberries are I have to interact with them. The fact that I value blueberries is founded on my being able to refer to blueberries. Being able to refer to blueberries is founded on my being able to manually investigate the world. Certainly, if my hands were different but comparably versatile, then I would learn to use them to refer to blueberries about as well as my real hands do. But the reference to (and hence the value of) blueberries must pass through something playing the role that hands play. The hands, or something else, must play that role in constituting the fact that I value blueberries.

4.3.2. The concrete is never lost

In general, values are founded on reference. The context that makes a value be a value has to provide reference.

The situation is like how an abstract concept, once gained, doesn't overwrite and obsolete what was abstracted from. Maxwell's equations don't annihilate Faraday's experiments in their detail. The experiments are unified in idea—metaphorically, the field structures are a "cross-section" of the messy detailed structure of any given experiment. Abstraction is a gain, not a loss.

The abstract concepts, in order to say something about a specific concrete experimental situation, must be paired with specific concrete calculations and referential connections. The concrete situations are still there, even if we now, with our new abstract concepts, want to describe them differently. In the same way, a value, as an element that is not tethered to one specific situation, has to interface with specific situations—via reference.

4.3.3. Is reference essentially diasystemic?

If so, then values are essentially diasystemic.

Reference goes through unfolding.

To refer to something in reality is to be brought (or rather, bringable) to the thing. To be brought to a thing is to go to where the thing really is, through whatever medium is between the mind and where the thing really is. The "really is" calls on future novelty: the "really is" is the Cavern that the Thing is, which calls for stepping into it. See "pointing at reality through novelty".

In other words, reference is open—maybe radically open. It's supposed to incorporate whatever novelty the mind encounters—maybe deeply.

An open element can't be strongly endosystemic.

An open element will potentially relate to (radical, diasystemic) novelty, so its way of relating to other elements can't be fully stereotyped by preexisting elements with their preexisting manifest relations.

Does this imply that open elements are diasystemic?

4.3.4. Example: Parliament

Say we value a parliamentary system of government. That is, we want to make decisions according to a parliamentary process, including decisions in very new situations. When there's some new issue to deal with, we want to discuss the problem, hear perspectives, try to persuade each other, try to understand the constraints and possibilities, get to the truth of things, aggregate preferences, and negotiate plans we can agree to cooperatively follow. There are rules about who gets to say what when, and who gets what control over decisions.

Is a parliamentary value diasystemic? Not really. Parliamentariness doesn't pervade many regions of the mind in the way that {information theory / Bayesianism, computational complexity, a new ion pump, a convergently discovered but not yet unified algorithm, a sound shift, or a major code refactor} pervade many regions of the mind.

Well, it could govern many regions of the mind, and call on many regions of the mind, but that's not diasystemic existence: it isn't overlapping in structure with many diverse elements. It is like a table on which many mental elements are resting. The table touches many mental elements, but those elements are separate from the table. The table is a container or backdrop or support or neighbor for many elements; it constrains many elements (from falling), like the parliamentary system constrains many agents in many contexts through rules of procedure. But those agents are more or less left alone, besides being placed in the container.

4.3.5. Stable process values are radically open

Wanting to have a parliamentary system of government is an open value. It doesn't have a domain of value that's already explicitly given. Wanting to be parliamentary is not a value like "there should be such and such government projects" or "we should make such and such laws" or "we should engage in such and such military conflict".

Instead, the value of wanting a parliamentary system refers to any possible future domain, through the intermediary of the parliamentary process of accommodating novelty. It's not natively represented or well-described as a preference ordering on worlds, though a preference ordering on possible worlds could be backed out ex post facto by looking at the outputs of the parliamentary process. (Though one would also have to back out a description language for possible worlds; and to do this in advance, one might have to simulate-to-equivalence the parliamentary proceedings.) It doesn't say that the budget should be this way or that, or that this or that should be illegal or mandatory or regulated. It says that those questions should be answered by a parliament.

Wanting to have a parliamentary system of government is a process value, which is a subspecies of metavalue. The value has us wanting to deal with novelty according to some given rules. The rules are about a system that can deal with novelty. The system can spread caring into novelty. It can care about a world transformed, refined, and expanded by new understanding. The caring is transported across conceptual schemes. Values are reinterpreted in new language. Incorporating novelty into the mind also incorporates the novelty into the mind's caring.

A process value is a value that says what process a mind should use, e.g. to make a decision or to modify itself. A process value that doesn't "leap along with" increments of the mind's voyage of novelty will be "left behind". The novelty will be alien to the process value. The novelty will be wielded by, or even come along with, other values. Those other values will have no reason to cede any control to the process value. In other words, a non-open process value will be usurped, and so it isn't a stable value of the mind.

Imagine a parliamentary government that can't understand a new technology, and the new technology is strong enough to recenter power away from the parliament. The parliamentary process might seem comprehensive from the inside; it could handle anything that's brought up to it. But failing to extend its control through the new technology, the parliament won't defend itself against whatever values do extend their control through the new technology. Those other values will disempower and replace the parliament.

It might be that the new technology is not mediated by modular, explicit artifacts like nuclear weapons—but instead, the new technology is a new way of thinking. To not be left behind, the parliament would have to be able to incorporate the new way of thinking. Even trying to stamp out the new way of thinking, keeping it as ectosystemic novelty, would require the Understanding Police. The Understanding Police have to understand what the new mental technology looks like and how it participates in mind and optimization; otherwise, the new way of thinking can hide behind decoys, alienness, and false intentions. The parliament has to be open to not just new artifacts, but new ways of thinking. It has to be radically open.

4.3.6. Radically open elements are transsystemic

So reference and a parliamentary value are each non-endosystemic. A parliamentary value is not diasystemic. What sort of novelty is a parliamentary value, then? It is transsystemic: it points from within the system to beyond the system. Transsystemicness is a sort of complement to provisionality: a provisional element ought to be treated as though it might be revised; a transsystemic element provides the driving force for novelty, e.g. the novelty of a conceptual revision. Provisional elements may be revised; transsystemic elements may do the revising or bear the revising (as a channel bears the water).

For example, the elements that constitute a mind's creativity are transsystemic. E.g. curiosity: it points from within the mind towards what's beyond the mind, structure that the mind hasn't encountered or doesn't understand or hasn't made explicit. (Though in humans, curiosity is also diasystemic—it can bubble up obliquely from its hiding place in many different regions, e.g. you can become curious about many different domains and in many different moods, but in a way that is intimately "of" those domains and moods.) If you're curious about ants, then your concepts about ants are provisional because your curiosity might revise your concepts about ants by driving investigations.

Reference is essentially radically open, hence essentially transsystemic. Stable process values are essentially radically open, hence essentially transsystemic.

4.4. Self-interpretive metavalues produce diasystemically novel values

4.4.1. Self-interpretive creativity

A metavalue creates, destroys, or otherwise modifies values, e.g. by clarifying them, tweaking them to be about something different or with a different valence, or generalizing them so they apply to a world expanded by novel understanding. A metavalue is a species of creativity.

"Interpretation" is here a pretheoretic idea. The examples below will gesture at interpretation. To interpret X is something like receiving X as though it's a message sent from a mind, or more generally as though it's an expression of something in a mind. Interpreting a mental element means "recovering" something, as if "from behind" or "from within" the element. E.g. if you read a sentence, you want to then interpret it to get the propositions expressed by the sentence—otherwise all you have is a sequence of letters. E.g. if you feel a dislike for someone, you want to interpret it—as a command to get away from them, or as a hypothesis that there's something bad about them or about how you are when you're with them.

A self-interpretive creativity is a creativity whose action is shaped by interpreting the mind, so that the novelty that the creativity produces will in some way incorporate interpretations of the mind. (Here "self" refers to the mind, not the creativity; maybe it should be called mind-interpretive?) Since the novelty incorporates something recovered from preexisting elements, it cuts across those preexisting elements. For example, suppose I conclude that God gave me a voice so that I can sing. This conclusion is very bound up with the concept [a voice as a musical instrument]. That's one aspect of my voice, not the whole thing. Something's been abstracted from my voice, and incorporated as the key idea of a new value: the new value of singing. The abstraction is a gaining, not a loss or restriction or impoverishment; the idea of the abstraction (my voice as an instrument) wasn't there explicitly before (in how I oriented to my voice simpliciter), and now it is. The idea definitely involves the voice, but cuts across the voice in a novel way, showing something new in the voice. The self-interpretive creativity (in this case, viewing myself as God's creation, and viewing my elements as elements put there by God for some purpose) has added something novel (my voice as a singing instrument) to a preexisting element (my voice). The novelty is bound up with the preexisting element but cuts across it, and the self-interpretive creativity does this to many elements.

To say it another way, a self-interpretive creativity produces novelty in the form of novel relations of the preexisting elements. The preexisting elements are remade; they take their place in the new mind that gives them a new role. In other words, interpreting an element places it into a new context. The element being placed into a new context constitutes novelty. And, that novelty is skew to the preexisting element as it was previously. Before, the voice was for speaking; now it is also for singing, where the "for" is a novel relation.

Interpreting one element requires the context provided by other elements. So interpretation incorporates structure from many elements. Interpreting a difficult message requires a lot of reading, so interpretation interprets many elements. Interpreting the whole self (the whole mind) involves reading many elements. For example, sometimes understanding what a human meant by a message requires understanding a lot about zer: zer history, zer goals, zer common-knowledge stance with respect to you. It also requires the context that you provide: the context for gemini modeling the element.

So mind-interpreting creativity touches and incorporates many elements, and does so in a way skew to those elements' preexisting relations. That is, mind-interpreting creativity produces diasystemic novelty.

Since metavalues are a species of creativity, mind-interpretive metavalues are a species of mind-interpretive creativity. Therefore mind-interpretive metavalues produce diasystemically novel values.

4.4.2. Example: FIAT values

A human is a book that can be read. The hieroglyphic hand, and the visual cortex finely tuned like a watch to recognize colorful patterns, signifies that mangos are for eating. One could say that God gave us {hunger, a fear of dangerous things, and an instinct to protect children} and gave us {object recognition, fine motor control, mental workspace broadcasting, Bayesian updating, and causal analysis}, in order that we would survive and reproduce—which is therefore our purpose. God did not literally create us, but the resulting motion of interpretation mostly makes sense.

See FIAT for more examples.

The FIAT metavalue interprets mental elements as expressions of the striving of a hypothetical stronger mind. A mental element is a sort of failed attempt, a deficient version of a corresponding element in some hypothetical stronger mind, or an aspirational beginning of some stronger pursuit. A mental element therefore gestures at a stronger pursuit. FIAT adopts that larger pursuit as a goal.

4.4.3. Example: Corrigibility

From "Hard problem of corrigibility":

Reason as if in the internal conjugate of an outside force trying to build you, which outside force thinks it may have made design errors, but can potentially correct those errors by directly observing and acting, if not manipulated or disassembled.

A corrigible mind might figure out what to do in some situation. Then it thinks: I've figured out what to do. I've figured out a plan that, if I execute it, will result in good outcomes. But, the presence of this plan and its justification in my understanding—and the process that generated the plan and its justification, including my understanding of what counts as good outcomes—is not just me doing my thing. Rather, it's the downstream result of [the source of good agency], doing its thing, but in a flawed way. I'm the result of a flawed attempt to create a delegate. The fact that this plan is in my attention along with a judgement that it's good to execute might mean that it's good to execute, but it's also likely to mean that the process of the good agency building a delegate has gone wrong somehow in a way that produced this plan.

A corrigible agent interprets itself, its elements, as flawed—as an attempt by the humans to do something difficult. It interprets its goals as mere proxy-goals, subgoals or experiments towards some unknown supergoal; it interprets its pursuit of goals as Goodharting.

4.4.4. Example: Extending identity

It might be natural to interpret [metavalues that involve an agent identifying with other agents] as being self-interpretive metavalues. A mind with such a value looks at itself and asks, if all this is just a part or instantiation of a single agent that's shared across such and such minds, what agent is that? What is the agent structure that is the intersection of the agent structures of all these minds, and is really the core of each of them? I'm really that agent.

E.g. loyalty to one's creator; acting as though you are those other agents that use the same decision theory as you or are otherwise "the same" as you; fusing values with other agents that you can negotiate with, or understand well enough to coordinate through mutual trust; acting as though you're behind a veil of ignorance or updatelessness; corrigibility in the sense of viewing yourself as part of a whole that extends across yourself and humans.

Ah Motiva 2: Relating values and novelty