Does novel understanding imply novel agency / values?
To have large relevant effects on the world, a mind has to understand a lot about the world. The mind has to have a lot of the structure of the cosmos (the entirety of the world, in any aspect or abstraction) highly accessible to itself for use in skillful action. To understand a lot about the world, the mind has to gain a lot of understanding that it didn't have previously. When a mind gains understanding, that's a change in the mind. Does that change have to include a change to the values of the mind?
- 1. Terms
- 2. Reasons that novel understanding implies novel values
- Understanding involves internal control
- Understanding involves outgoing control
- Understanding requires agency to...
- Understanding potentiates agency
- If novel understanding is found by indirect search, that search finds agency
- Novel integrated understanding involves new language and values are founded on language
- 3. Dilemma: value drift or conflict
Thanks to Sam Eisenstat for related conversations and ideas, e.g. provisionality.
1. Terms
This essay uses terms less than perfectly carefully, and makes a lot of very broad statements. Besides laziness, a hope is that this will expose, by a sort of parallax, what meanings would have to be provided by a better set of concepts gestured at by the terms used in the broad statements. Inconsistencies in how words are used in statements should be more prominent if the statements are more absolute, and prominent inconsistencies in statements that one cares about might spur questioning that is on the way to better concepts.
In particular, here "values" is a pre-theoretic term, and refers to a very broad, unrefined idea. Something like "control": any way that elements of a mind en-structure the mind, or other elements of the mind, or the world. Anything that's usually called "values" is also some kind of "control". A thermostat controlling the temperature of the room "has values" only ambiguously and at a stretch, but it's definitely exerting control. What's usually called "values" has specific salience beyond just being some kind of control, because part of what "value" means is "that sort of control which is exerted by minds that have large relevant effects on the world", and that sort of control is stereotyped (for example, it can't be "just like a thermostat", as thermostats do not have large effects on the world) and so probably has some understandable structure.
This essay also doesn't carefully distinguish mind from agency. "Mind" is about intelligence, thought, concepts, understanding, structure, investigation, truth; "agency" is about coherent action, making things happen in the world, goals, strategy, organizing towards a purpose, coordinating, deciding. Agents have values. Mind comes from agency; an agent has a mind.
2. Reasons that novel understanding implies novel values
(These items aren't exhaustive or mutually exclusive.)
Understanding involves internal control
An idea has some internal structure: parts or aspects or something, which relate to each other not completely arbitrarily. The not-completely-arbitrary-ness of the internal relationships of the idea constitutes some sort of control. This internal control could be as simple and maybe value-free as the control exerted by the CPU on an array stored in memory when executing a sorting algorithm, or as complex and value-laden as the relationships between members in a research collaboration that understands something no one else understands.
In the latter case, with humans, the relationships are usually not very related to the idea itself, but sometimes they are. For example, think of a strategy in a team game that's embodied as behavior patterns distributed across team members specialized to play different roles, where the specialized adaptive interplay between members is an integral aspect of the strategy. Or, think of the idea of GAN training; an implementation or embodiment of that idea, or of the idea of adversarial training in general, involves not just internal control, but internal conflict as a necessary aspect.
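As a minimal sketch of that last point (the toy data, architectures, and hyperparameters below are arbitrary placeholders, not anyone's recommended setup): the idea of GAN training, when embodied as a running program, consists of two components whose training objectives directly oppose each other, so the embodiment contains internal conflict by construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two components of one "idea": a generator trying to imitate the data, and a
# discriminator trying to expose the imitation.
generator = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # toy "real" data, roughly N(2.0, 0.5)
    fake = generator(torch.randn(64, 4))

    # The discriminator is trained to label real as 1 and fake as 0 ...
    d_opt.zero_grad()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    d_opt.step()

    # ... while the generator is trained to make the discriminator call fake "real".
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()

# The generator's samples drift toward the real distribution's mean (about 2.0).
print(generator(torch.randn(1000, 4)).mean().item())
```

Neither half of the program is the idea of adversarial training on its own; the idea lives in the opposition between them.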
Understanding involves outgoing control
For a mind to use some understanding of X (denoted «X»), the understanding of X has to cache out in terms of predictions, descriptions, propositions, explanations, designs, recommendations, plans, attentional directions, or some other content usable by other elements of the mind which don't themselves understand X. This constitutes control flowing from «X» to the rest of the mind. Also, this control is to some extent potentially "value-laden": since the elements of the mind that make use of «X» don't fully understand X on their own, they can't verify that the control exerted by «X» strictly only provides a "neutral" answer to a question, so «X»'s interaction with those elements might constitute value-laden control.
(If an element making use of «X» also itself understands X, then it includes a conceptual Doppelgänger of «X», in which case we haven't comprehensively identified the mind's understanding of X as being just «X». It might not be sensible to comprehensively identify «X», but the point stands.)
It's possible to check understanding, at least to some extent, for example by rediscovering it. E.g. a mind could rederive Kirchhoff's laws by observing circuits or by reasoning about electrons and voltages, if it wants to verify that its «Kirchhoff's laws» aren't faulty or deceptive. However, that's expensive, and it may be that there are necessarily free choices in the understanding that are not by default constrained away by this sort of spot check.
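For reference, the two laws being spot-checked are, in their usual forms,

$$\sum_k I_k = 0 \quad \text{(current law: the branch currents into any node sum to zero)}$$

$$\sum_k V_k = 0 \quad \text{(voltage law: the voltage drops around any closed loop sum to zero)}$$

and rederiving them, whether from measurements on circuits or from charge conservation and reasoning about potentials, is exactly the kind of expensive verification meant here.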
Understanding requires agency to...
Understanding requires mind to give synchronic meaning to the understanding and to diachronically articulate the understanding. Mind comes from agency. Hence understanding is concomitant with agency. So novel understanding comes with agency, which might be novel agency.
§ ...give synchronic meaning to the understanding
Having a computer program running on your computer that computes a 3D simulation of a coffee cup is some kind of knowledge, but it's different from a human's idea of a coffee cup. Without more work, the computer simulation doesn't interface with abilities to pick up, modify, discuss, categorize, or learn about the coffee cup. That's the synchronic meaning: how the idea fits into a mental context.
Also, imagine trying to understand topological compactness without understanding sequences or open covers or metric boundedness or closedness, and without being familiar with space more pre-theoretically. Understanding something requires understanding other related things, and the source of that prerequisite understanding is mind.
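For concreteness, the standard open-cover definition, which already leans on the prerequisite notions listed above:

$$K \text{ is compact} \iff \text{for every open cover } K \subseteq \bigcup_{i \in I} U_i \text{ there is a finite } F \subseteq I \text{ with } K \subseteq \bigcup_{i \in F} U_i.$$

In a metric space this is equivalent to sequential compactness (every sequence has a convergent subsequence), and in $\mathbb{R}^n$ to being closed and bounded (Heine–Borel); each equivalent route into the concept presupposes its own prior understanding of open sets, sequences, or metrics.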
In general, understanding is something that mind incorporates. Without mind to predict, describe, manipulate, update, question, compare, etc., there's no understanding. See Gemini modeling.
As a special case, some understanding makes more or less explicit function calls to mind itself. That sort of understanding requires mind as a whole. For example, a method of thinking such as "ask an easier related question" presumes something that can ask, investigate, and answer questions, and extract useful analogies between questions. As another example, empathic modeling calls on an agent in order to model another agent as being like this agent, but tweaked; e.g. if Alice sees Bob walking around with his eyes closed, she can make sense of his shortened, less-dynamic gait, and his held-up arms, by asking how she'd behave in his shoes——what actions her agency would produce.
§ ...and diachronically articulate the understanding,
An understanding «X» of X points at the nexus of X. It may be that almost always «X» doesn't fully encompass X. That is, it may be that «X» is provisional: there are more contexts in which X is relevant in new ways not captured explicitly by «X». If «X» is provisional, then «X» is only prepared for those new contexts by virtue of the mind that has «X» and that will articulate (elaborate, explore, develop, refine, expand, grow, generate, refactor) «X» into more of the nexus of X.
For some X, maybe for most or even all X, the nexus of X is non-finite, in which case no «X» fully encompasses X. That is, it may be that «X» is essentially provisional: there are always more contexts in which X is relevant in new ways.
To the extent that «X» is provisional, it calls on agency to follow the pointer «X» into the nexus of X and articulate «X» in new contexts, e.g. by investigation, thought, interpretation, and application.
§ ...hence understanding is concomitant with agency.
Since understanding requires agency to give it synchronic meaning and diachronically articulate it, wherever there's novel understanding, there's also agency.
Sometimes the agency is not novel. E.g. when a human comes up with an idea and integrates the idea into their thinking, the agency involved is the human's, and it's roughly the same agency both before and after they have the idea. Usually though, for humans, the agency is novel: we learn most of our understanding from other people, and their agency is novel to our own agency.
In a scenario where the rate of novel understanding outstrips the rate of integrating the understanding into preexisting agency, the novel understanding must be accompanied by novel agency, if it's to be articulated and given meaning. In particular, if the novel understanding is to have large effects on the world, e.g. if it is to be wielded, it has to be integrated into some agency.
Understanding potentiates agency
Agency requires mind and mind requires understanding, so new understanding opens the way to new capabilities for an agent. When a mind seeks out understanding, the seeking selects for understanding that's useful for pursuing the agent's goals. But, since understanding is useful in more contexts than just the context it was sought for, new understanding is also useful for pursuing other goals. So new understanding contributes to other goals by making it easier for agencies with other goals to succeed.
For example, it was scientists curious about the world and hopeful about abundant energy who first sought understanding of materials that emit nuclear radiation. That understanding potentiated world-destroying weapons. The methods of thinking used by [researchers in the Manhattan Project and researchers at KB-11] were handed down from past engineers and mathematicians, having been developed for other, less destructive purposes.
If novel understanding is found by indirect search, that search finds agency
To solve problems without ourselves getting into the details and gaining our own new understanding, we can run computer programs that find solutions to problems. The harder the problem, the more novel understanding (not contained in the problem statement) is required to solve it. The more novel understanding a computer program finds, the more likely it is (by induction and by simplicity) that the program is able to go on finding understanding, maybe unboundedly. Things that find understanding near-unboundedly are minds, which have agency. Compare the possibility that the computational-universe simplicity prior is malign.
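To have one concrete instance of "indirect search" in hand (a toy: the hidden target, the space of candidate programs, and the search order are all chosen arbitrarily for illustration): the searcher below never understands the problem itself; it only enumerates candidates and keeps whichever one fits the examples. At this scale nothing agent-like can appear, but the argument is about what gets selected when the candidate space is rich enough that fitting hard problems requires finding understanding.

```python
# Toy "indirect search": we specify a problem only by examples, then enumerate
# candidate programs and keep the first one that fits. The grammar of candidates
# (affine functions of x) is a deliberately tiny placeholder.

examples = [(x, 3 * x + 1) for x in range(-5, 6)]   # hidden target: x -> 3x + 1

def candidates(max_coeff=5):
    """Enumerate tiny 'programs' (a, b) computing x -> a*x + b."""
    for a in range(-max_coeff, max_coeff + 1):
        for b in range(-max_coeff, max_coeff + 1):
            yield (a, b), (lambda x, a=a, b=b: a * x + b)

def fits(program):
    return all(program(x) == y for x, y in examples)

for description, program in candidates():
    if fits(program):
        print("found:", description)   # prints (3, 1)
        break
```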
Novel integrated understanding involves new language and values are founded on language
(For "language", you could read "ontology" or "integrated totality of concepts".)
Values (directions in the space of possible worlds along which an agent robustly pushes the world) depend on the notion of world, which depends on the language of the agent. For example, a utility function has a domain, which is a set of possible worlds, sometimes taken to be possible settings of some variables, or logical theories in some logical language consistent with some axioms.
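Schematically, and only as one common formalization:

$$U : \mathcal{W} \to \mathbb{R}, \qquad \mathcal{W} = \prod_j \mathrm{dom}(X_j) \quad \text{or} \quad \mathcal{W} = \{\, T \supseteq A \;:\; T \text{ a consistent theory in the language } \mathcal{L} \,\}.$$

Either way, the domain $\mathcal{W}$ is fixed by the agent's language (the variables $X_j$, or the logical language $\mathcal{L}$ and axioms $A$), so $U$ is only defined relative to that language.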
Novel understanding that's integrated into a mind constitutes a change to the mind's language. Values expressed in the old language might be translatable into the new language, but there may be ambiguity or free choices in how to do that translation. Compare ontological crisis. So when the mind's language changes, the domain of worlds conceivable by that agent changes, so the space of possible values changes.
It's not clear whether novel integrated understanding implies a change to the agent's values. Why can't you allow agency in the old language to wield the new understanding? Why are you forced to have an opinion about which of the newly apparent possibilities are better or worse? E.g., allow the new understanding to recommend action, but only follow that recommendation when the new understanding would describe the consequences in a way that has an unambiguous meaning in the old language, in the spirit of conservative concepts.
3. Dilemma: value drift or conflict
Understanding that is novel to an agent A can be more or less integrated into A. If the novel understanding is completely integrated into A, then A's values have shifted. If the novel understanding is almost completely not integrated into A, then it came with additional agency B. A and B are in conflict. So to get a large amount of novel understanding, A has to either accept a lot of value drift or be put in a lot of conflict. This could be called a "value drive": ongoing creativity exerts a driving force to shift the agent's values.
For example, Solomonoff induction can find any novel understanding. It uses a fixed language. Without further work, the novel understanding that SI finds is not integrated into the mind that programmed the SI, and may constitute agency in conflict with the programmer.
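To make "fixed language" concrete: in the standard formulation, Solomonoff induction's prior over observation strings is

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},$$

the sum over programs $p$ whose output begins with $x$, where $U$ is one fixed universal prefix machine. The hypotheses SI weighs are programs for that fixed machine; nothing in the formalism re-expresses them in, or integrates them into, the concepts of whoever chose $U$.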
(Probably the answer is that integrated understanding is desirable, and our values must be compatible with that somehow, and AGI alignment is mainly about integration rather than mechanism design.)