A high-level confusion that I have, and that seems to lie on the way towards understanding alignment, is the relationship between values and understanding. This essay gestures at the idea of structure in general (mainly by listing examples).

Why do we want AGI at all?

We want AGI in order to understand stuff that we haven't yet understood.

(This is not a trivial claim. It might be false. It could be that to secure the future of humane existence, something other than understanding is necessary or sufficient; e.g. it's conceivable that solving some large combinatorial problem, akin to playing Go well or designing a protein by raw search with an explicit criterion, would end the acute risk period. But I don't know how to point at such a thing; the plans I know how to point at seem to centrally involve understanding that we don't already have.)

1. Elements and structure

Understanding implies some kind of structure. (This is a trivial claim, or a definition: structure is what a mind is or participates in, when it understands.) Structure is made of elements. "Structure" is the mass noun of, or continuous substance version of, "element". The point of the word "element" is just to abbreviate "any of that pattern-y, structure-y stuff, in a mind or in the world in general".

Elements. An element (of a mind) is anything that combines to constitute the mind, at any level of organization or description.

Examples of elements. Any instance within a mind of any of the following categories is an element: features, aspects, properties, parts, components, subagents, pieces, inputs, algorithms, code, processes, concepts, ideas, skills, methods, procedures, values, goals, architecture, modules, thoughts, propositions, beliefs, probabilities, principles, rules, axioms, heuristics, plans, operations, connections, associations, metaphors, abstractions, memories, arguments, reasons, purposes, modes, emotions, tendencies, organs, ingredients, functions, dynamics, structures, data, types, languages, proofs, justifications, motives, images, searches, knowledge, computations, rewards, reinforcement, specifications, information, intuitions, ideologies, protocols, stimuli, responses, domains, gradients, objective functions, optimizers, satisficers, control systems, basins of attraction, tasks, attitudes, stances, dispositions, words, terms, definitions, nexi, drives, perceptions, grammar, criteria, possibilities, combinations, categories, inferences, actions.
How elements are. Mental elements overlap, crisscross, lie on spectra, control, use, associate with, repel, and super- or sub-vene each other; are created, grown, modified, duplicated, refactored, deleted; participate together in doing tasks; and generally do basically anything that happens in a mind.
"Element" is not really a type. Example: "doing task X" is an element, and then the above point doesn't really make much sense; does "doing task X" participate in doing a task, or what? Example: some elements are supposed to be "ideal" (a priori, objective, real by themselves), such as propositions (distinct from representations/instances/applications of propositions), and then it doesn't make sense to say that the element was created. So statements about elements are duck-typed, so to speak; when applied to specific elements, they may or may not make sense.
Cosmos. The cosmos is the totality of all elements, mental or not.

2. Novelty, creativity

Acquiring elements

A mind's internal process of creating elements is creativity. The result of creativity is novelty: elements that are new to the mind.

A mind acquires or gains an element (gains structure) when it comes to possess the element. A mind possesses an element when the element is integrated into the mind so that the mind is empowered to use the element. Integration means that the element is co-adapted to interoperate with other elements: the element can be called as a function, or applied as a skill; elements communicate in expected formats, and expect the formats they receive; elements organize each other to perform tasks; elements are indexed so that they are brought to bear on contexts where they're useful. A mind is empowered to use an element when it can perform the tasks that the element, if made fully available to the mind, would enable the mind to do. (A more permissive sense of "enables", allowing more ways the mind counterfactually could incorporate and use the element E, gives a stronger sense of "empowered to use E". A more permissive sense of "E enables performing" a task would require, for a mind to be "empowered to use E", that the mind is able to perform a larger set of tasks. Compare the spectrum of coherence.)

Ways of acquiring elements

Novelty can be encountered or acquired in ways other than creativity. Other minds are a source of novelty, encountered and potentially acquired but not created. Learning is a clear example of acquiring novelty, and most learning is only somewhat creative, being heavily alloyed with copying from another mind. Learning to do something on your own is creativity. Creativity is the "creative edge" or "creative froth" of thought: search, trying things out, program search, combinatorial thinking, tweaking ideas. Evolution and automated proof search are creative non-minds: they create novel structures, without having the context in which those structures are fully themselves. An example of encountering novelty without acquiring it is if a superintelligent AGI kills you by understanding stuff that you don't understand, or if you see a car with an internal combustion engine go fast without knowing about PV=nRT and gears (even if you've already seen cars before; novelty is perennially novel until it's acquired).

Elements can be acquired by:

thinking
invention / design
discovery
learning
abduction
induction
deduction
inference
metaphor (mapping between elements of elements)
emergence (from some dynamical system of elements, e.g. gradient descent, or e.g. the ambient pressures placed on elements by demands to be useful in their context)
copying internally (copying code, or more loosely, distillation)
copying from another mind (e.g. reading an idea or imitating a behavior, or copying code)
"direct impressions" (as in, getting a picture of an object by looking at the object)
mutation, trial and error, ratcheting towards solutions by component-wise improvements
search; Ariadne's thread, exhaustive search by synthesizing all possibilities and eliminating failures (Wiki)
combining elements
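
Two of the routes listed above, exhaustive search (synthesize every possibility, eliminate failures) and ratcheting by component-wise improvement, can be sketched in a few lines. This is only a toy under stated assumptions: the names `exhaustive_search`, `ratchet`, `TARGET`, and `matches` are hypothetical, and "success" is just agreement with a fixed 4-bit target.

```python
from itertools import product

def exhaustive_search(alphabet, length, passes):
    """Synthesize every candidate of the given length and keep those that pass."""
    return [c for c in product(alphabet, repeat=length) if passes(c)]

def ratchet(start, score, alphabet):
    """Component-wise improvement: change one position at a time, keep strict gains."""
    current = list(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for symbol in alphabet:
                candidate = current[:i] + [symbol] + current[i + 1:]
                if score(candidate) > score(current):
                    current, improved = candidate, True
    return tuple(current)

# Hypothetical toy criterion: number of positions agreeing with a fixed target.
TARGET = (1, 0, 1, 1)

def matches(candidate):
    return sum(a == b for a, b in zip(candidate, TARGET))

solutions = exhaustive_search((0, 1), 4, lambda c: matches(c) == len(TARGET))
climbed = ratchet((0, 0, 0, 0), matches, (0, 1))
```

Exhaustive search needs only a pass/fail test but visits all 16 candidates; the ratchet visits far fewer, but only works because this score rewards partial progress.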

Peirce's abduction

Charles Sanders Peirce described three kinds of inference:

Abduction: the creation of hypotheses (theories, ideas, concepts).
Deduction: making explicit the consequences of hypotheses, e.g. their predictions or logical implications; making explicit the analytic content of concepts; world-building.
Induction: selection between hypotheses, e.g. by comparing deductions from those hypotheses against data and eliminating ones that don't fit.
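
The three kinds of inference above can be sketched as a toy loop over number sequences. This is an illustration under loud assumptions, not a model of real abduction: the hypothesis space is a tiny hand-picked family of linear and quadratic rules, and the names `abduce`, `deduce`, and `induce` are hypothetical.

```python
from itertools import product

def abduce():
    """Abduction: create candidate hypotheses (here, small linear and quadratic rules)."""
    for a, b in product(range(4), repeat=2):
        yield f"{a}*n + {b}", (lambda n, a=a, b=b: a * n + b)
        yield f"n**2 + {a}*n + {b}", (lambda n, a=a, b=b: n * n + a * n + b)

def deduce(rule, inputs):
    """Deduction: make a hypothesis's consequences (its predictions) explicit."""
    return [rule(n) for n in inputs]

def induce(hypotheses, observations):
    """Induction: keep only the hypotheses whose predictions fit the data."""
    inputs = list(range(len(observations)))
    return [name for name, rule in hypotheses
            if deduce(rule, inputs) == observations]

observations = [1, 3, 7, 13]  # generated by n**2 + n + 1 for n = 0..3
survivors = induce(abduce(), observations)
```

Here all the creative work is hidden in the choice of rule family inside `abduce`; the loop can only select among hypotheses that the family already contains.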

All three of these kinds of inference involve novelty. They are interwoven with each other. For example:

Abduction usually involves a pre-existing store of concepts, and those concepts are there because they've been selected to be somewhat deductively and inductively valid.
Deduction is guided by language, and that language had to be abducted previously.
A large swath of abduction can in some sense in principle be reduced to the idea of universal computation plus inference; see e.g. Solomonoff induction and universal Garrabrant induction.

But overall, abduction is the most creative form of inference: abductive reasoning always involves self-generated novelty, and if all the elements generated by abduction fail to be novel to the reasoner, then it was a failed abduction.

We could add non-linguistic elements to Peirce's scheme of inference:

hypothesis, concept ⟶ disposition to act, element
truth of proposition, descriptiveness of concept, prediction ⟶ (respectively:) success of disposition to act, usefulness of element, recommendation to act
abduction ⟶ creating a disposition to act, creating an element
deduction ⟶ exercising / applying a disposition to act
induction ⟶ selecting between elements or dispositions to act, e.g. by comparing outcomes from applying them to guide action

3. Measuring structuredness

This section lists some theories that sift out the essence of some kinds of structure and compare structure with structure. This isn't trying to be comprehensive or to demarcate anything; it's a collection intended to gesture at what structure is by describing some of the gross contours of the universe of structure. For some coordinates of structure, see this list of directions in the space of concepts.

Examples part 1: Compressibility (prediction, surprise)

Theme: structuredness correlates with locating or being a small target in a large space.

Information, optimization power. How much something cuts down a search space. Cf. Eliezer's notion of "lawful creativity" (LessWrong)
Compression, retrodiction. How much something encodes stuff compactly.
Algorithmic complexity, sophistication. How uncompressible / hard to describe something is (without just being random, in the case of sophistication).
Surprise, anti-inductivity. How much something is unpredicted; how much it becomes more unpredicted / unpredictable because it's being predicted.
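
Two of the measures above can be made concrete with a rough sketch. Assumptions: "cutting down a search space" is read as the base-2 log of the ratio between the size of the space and the size of the target, and `zlib` compressed length stands in as a crude, computable upper-bound proxy for compressibility (it is not algorithmic complexity, which is uncomputable). The function names are hypothetical.

```python
import math
import random
import zlib

def bits_of_optimization(space_size, target_size):
    """Bits needed to cut a space of space_size options down to target_size."""
    return math.log2(space_size / target_size)

def compression_ratio(data):
    """Compressed length over original length; lower means more compressible."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)  # fixed seed so the example is repeatable
patterned = b"ab" * 5000                                  # highly regular
noisy = bytes(rng.getrandbits(8) for _ in range(10000))   # pseudorandom bytes

print(bits_of_optimization(1024, 1))
print(compression_ratio(patterned), compression_ratio(noisy))
```

The patterned string compresses to a small fraction of its length, while the pseudorandom bytes barely compress at all. That gap is the difference between sophistication and mere randomness noted above: incompressibility alone does not indicate structure.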

Examples part 2: Definability (computational strength, quantifier complexity, expressive strength)

Theme: structuredness correlates with being able to describe / point at / compute / subsume many things.

Language theory, automata theory. Regular expressions == finite automata; context-free grammars == pushdown automata; context-sensitive grammars == linear bounded automata. (Wiki)
Computational complexity theory, descriptive complexity theory. If you can solve computational problem X, does that imply that you can solve problem Y? Computational reducibility (with some resource constraints), complexity classes. Some complexity classes are equivalent to the expressiveness of certain kinds of logical formulae: (Wiki). E.g. the polynomial hierarchy level $\Delta_n$ / $\Sigma_n$ / $\Pi_n$ is equivalent to structures defined by $\Delta_n$ / $\Sigma_n$ / $\Pi_n$ formulae in second-order logic.
Recursion theory, arithmetical hierarchy. If you are given a solution to (uncomputable) problem X, can you use it to solve problem Y? Turing reducibility, Turing degrees (and other reductions and degrees). Given the $(n-1)$-times-iterated Turing jump, its relatively recursive / recursively enumerable / co-recursively enumerable sets correspond to sets definable by $\Delta_n^0$ / $\Sigma_n^0$ / $\Pi_n^0$ formulae in first-order arithmetic. (Wiki) The first $\omega$ levels of the Borel hierarchy correspond, by dropping computability, to the arithmetical hierarchy. (Wiki)
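
The lowest rungs of the language hierarchy above can be illustrated with a small sketch. A finite automaton has no memory beyond its current state, which is enough to track the parity of the number of a's; recognizing { a^n b^n } needs unbounded counting, which a stack provides. The names `dfa_accepts`, `EVEN_AS`, and `pda_accepts_anbn` are hypothetical, and the pushdown recognizer is hand-written for this one language, not a general simulator.

```python
def dfa_accepts(word, transitions, start, accepting):
    """Run a deterministic finite automaton given as a (state, symbol) -> state table."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting

# DFA over {a, b} accepting words with an even number of a's.
EVEN_AS = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}

def pda_accepts_anbn(word):
    """Stack-based recognizer for { a^n b^n : n >= 0 }, a non-regular language."""
    stack = []
    seen_b = False
    for symbol in word:
        if symbol == "a":
            if seen_b:          # an 'a' after a 'b' breaks the a...ab...b shape
                return False
            stack.append("A")   # push one marker per 'a'
        elif symbol == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()         # match this 'b' against an earlier 'a'
        else:
            return False
    return not stack            # accept only if every 'a' was matched

print(dfa_accepts("abab", EVEN_AS, "even", {"even"}))
print(pda_accepts_anbn("aabb"), pda_accepts_anbn("aab"))
```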

[Entering Higher Recursion Theory Zone, which I don't understand so well]

Hyperarithmetical hierarchy, Borel sets. Extends the Turing jump to computable ordinals, extending the arithmetical hierarchy up to $\omega_1^{\mathsf {CK}}$. (Also, defines the hyperjump of a set X as the sum of all $\kappa$-Turing jumps of X for $\kappa$ an X-computable ordinal; hyperarithmetical reducibility, hyperdegrees.) Analogous to the Borel hierarchy, allowing arbitrary countable trees of unions and intersections of open sets. (Wiki) (Wiki) (This maybe corresponds to some infinitary logic; $L_{\omega_1, \omega}$ is maybe related to but not isomorphic with Borel sets (Link).)
Analytic hierarchy, projective hierarchy. Sets definable by $\Delta_n^1$ / $\Sigma_n^1$ / $\Pi_n^1$ formulae of second-order arithmetic. $\Delta_1^1$ is equal to the hyperarithmetical hierarchy. Corresponds to the projective hierarchy of sets. (Wiki) (Wiki) I don't understand this stuff but there might be computational interpretations of this, e.g. in terms of games (Link) or in terms of what sounds like a kind of "uniform relative $\Sigma_2^0$-ness" (Libgen) or maybe here (Libgen). See also infinite time Turing machines.
Much higher recursion theory. Apparently recursion theory can be generalized somehow to larger ordinals, and this is maybe related to constructible sets somehow, and maybe related to the analytic hierarchy? IDK. (Wiki) (Wiki)
Constructibility. Gödel's L. A set theoretic universe constructed in stages consisting of sets that can be defined in reference to the previous stages. Ranks structures in terms of how deep their construction is. (Wiki)
Logical interpretability. Roughly, given two theories T and S, is it the case that there's a set of formulas that, given any model of T, define a model of S as a set of tuples of elements of the model (providing interpretations for symbols of S)? (Link)

Examples part 3: Provability (logical strength)

Theme: structuredness correlates with deductively implying many things (while being consistent).

Proof theoretic ordinal analysis. Given theory T, how complex is it to perform induction on the structure of proofs in T to show that T-proofs can reduce to elementary proofs with weaker assumptions, and therefore T is consistent if the weaker assumptions are true? For what ordinal $\kappa$ is the well-foundedness of $\kappa$ necessary or sufficient to show that T is consistent, given only a weak metatheory? (Wiki)
Reverse mathematics. Continuous with set theory work, but in weaker systems and concerning more everyday mathematics: Which theorems imply which basic axioms? And from that, which theorems imply other (not obviously related) theorems? (Wiki)
Consistency and soundness strength. Which theories prove which other theories consistent or sound? For strong theorems / hypotheses, which axioms are necessary and sufficient to prove them? E.g., (Wiki) and comparisons with strong combinatorial principles.

Remarks on examples of structure

The examples above are somewhat arranged in order of complexity of the structure they describe. Complexity is correlated with "depth", but is not the same; simple things are often "deep", and things that are complex in some sense can be "shallow".

The above list is heavily biased towards things that I'm aware of, things that have some interesting developed theory, and things that fit into hierarchies and uniform comparisons. What other measurements or notions of structure-in-general are there? There are notions of simulation, e.g. "bisimulation", but I'm not aware of very interesting general results there. There's the informal notion of "deep" mathematics, or "deep" insights in general, which have the flavor of retrodiction and the flavor of being generally useful and implying many other things. See Penelope Maddy's work.

It's not necessarily interesting to try specifically to "measure structure", but speaking vaguely, I would like to know how different kinds or dimensions of structure relate. E.g., when someone learns a skill, in what senses are they accessing / using / participating in / creating propositions? (More concretely, what other skills must they be enabling themselves to also learn easily?) Algorithmic complexity theory and computability/definability theory touch on the complexity of "concepts" in some sense, but there's a lot left to ask about; when / how / in what senses does a mind come to understand something, and how can you tell, and what does that imply about what the mind can, can't, will, or won't do?

4. Synopsis

To interface with a mind, we have to understand what it understands. Understanding is some kind of structure. Minds are made of elements. Structure is elements. Structure that's new to a mind is novelty. Creativity is the process of generating novelty. Structuredness correlates with compression, expression, and ~~impression~~ implication.

Structure, creativity, and novelty