Posts

Building a bridge!

You: We're building a bridge! It's going to be great! We've been thinking about it a bunch, and we're even starting work on it. We've got lots of great people. Let me tell you about why it's going to be great and very aesthetic.
Me: Nice! Can you tell me a bit about the loads you're planning for the bridge to carry, and how they'll be supported?
You: We'll work it out. We'll have great people. Great people are strong and make bridges that are strong and can carry mighty weights.
Me: I'm asking about the bridge itself, not the people. How much weight will the bridge carry, and what materials will you use, and how will the loads be distributed to the pylons, and where will the pylons be embedded? If you don't think about these things and try to build a bridge anyway, it will just collapse.
You: I don't get what you mean, you're being vague. Can you show a specific way that our bridge will collapse?
Me: I can give an example sce

Overview of strong human intelligence amplification methods

How can we make many humans who are very good at solving difficult problems?

Pet theory: Compound foods confuse the gutbrain

(Unresearched uncareful speculation.) TL;DR: Maybe eating food that's made of a bunch of different ingredients is bad because it prevents your gutbrain and headbrain from learning which foods correspond to which nutrients. Without knowing that, the brains can't accurately say when to eat what. My supporting data: It stands to reason. When I haven't eaten enough protein, I feel something's off and I'm compelled to eat more food; but I don't automatically specifically eat protein. Sometimes when I've exercised a lot, and have been drinking water, I still feel off. Then electrolytes taste actively good, and I feel better after drinking them (and they stop tasting so good, or even taste a little bad). I had to learn this, I didn't automatically know it. One time I got sick and vomited after having eaten sweet potato fries with dijon mustard. A decade later, I still maybe wouldn't want to eat sweet potato fries, and would probably avoid the combo.

Break

What is breaking? When something breaks, how does it break——how does it suddenly pass away? If it could have not broken, why did it break instead of not breaking? What drives the breaking forward?

The moral obligation not to be eaten

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ] If someone or something is eating you, this is a problem. It may seem strange to think that this even needs to be said. Surely. Surely, everyone wishes to not be eaten? And so would already agree that if, contrary to that wish, ze is being eaten, this is a problem? But yes, it needs to be said: How many have told themselves: "The one dining on me is misguided, truly, yes, but ze is good of heart. Ze expects to dine on me, and I wouldn't want to disappoint. I can bear this; my flesh regenerates, incompletely maybe, but still, I can bear it. In the future, I'll avoid getting into such a situation, but with zer in particular I'm already here." How many have advised their friend: "Those dining on you are despicable, yes, but there is nothing you can do, for now. If you dismount the dining table, they will hunt you! It would be better to bide your time and survive, for

[Talk] Creating the contexts needed to produce the concepts needed to understand minds

I gave a talk on 20 Feb 2024 in the PIBBSS speaker series. It's a talk version of "A hermeneutic net for agency". Here's the talk: https://www.youtube.com/watch?v=3ZaxlegV90w Here's the discussion: https://www.youtube.com/watch?v=vznEzOCAmho Abstract: We are fundamentally confused about minds, and about what about a mind determines what about the world. Our concepts don't automatically support the inferences and design choices we would like to make using those concepts, and there are strong forces that will break weak supports. Drive-by attempts to rework one or a few concepts in isolation don't work. Minds are too big and structurally entangled within themselves to centrally unravel with a reductionist piecemeal method. The relevance of the most relevant mental elements is essentially provisional and requires the full context of a mind to be understood. The only source of useable data about minds and their intentions is our own minds. W

Koan: divining alien datastructures from RAM activations

Exploring the ruins of an alien civilization, you find what appears to be a working computer——it's made of plastic and metal, wires connect it to various devices, and you see arrays of capacitors that maintain charged or uncharged states and that sometimes rapidly toggle in response to voltages from connected wires. You can tell that the presumptive RAM is activating in complex but structured patterns, but you don't know their meanings. What strategies can you use to come to understand what the underlying order is, what algorithm the computer is running, that explains the pattern of RAM activations?

Never better!

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ] If one were to complain, one would hear, whether from the outside or from the inside, something like this:

What could a policy banning AGI look like?

[Caveat lector: I know roughly nothing about policy!] Suppose that there were political support to really halt research that might lead to an unstoppable, unsteerable transfer of control over the lightcone from humans to AGIs. What government policy could exert that political value?

What is wisdom?

Antistrophe

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ]

The cosmopolitan-Leviathan enthymeme

Some argue that the natural way for a strong mind to form is by a group of smaller agents agreeing to ongoingly create new agents to trade with. Are there underlying premises about the nature of agents that would render valid some version of this argument?

Sum-threshold attacks

How do you affect something far away, a lot, without anyone noticing?

theoriz3r

theoriz3r is a tool to do exploratory automated theorem proving.

A hermeneutic net for agency

A hermeneutic net for agency is a natural method to try, to solve a bunch of philosophical difficulties relatively quickly. Not to say that it would work. It's just the obvious thing to try.

Human wanting

We have pretheoretic ideas of wanting that come from our familiarity with human wanting, in its variety. To see what way of wanting can hold sway in a strong and strongly growing mind, we have to explicate these ideas, and create new ideas.

Views on when AGI comes and on strategy to reduce existential risk

Summary: AGI isn't super likely to come super soon. People should be working on stuff that saves humanity in worlds where AGI comes in 20 or 50 years, in addition to stuff that saves humanity in worlds where AGI comes in the next 10 years.  

Time is homogeneous sequentially-composable determination

Time is the character of courses of events in which determinations——the ways that one event determines the next——are uniform across events and across composing determinations.

Telopheme, telophore, and telotect

To come to know that a mind will have some specified ultimate effect on the world, first come to know, narrowly and in full, what about the mind makes it have effects on the world.

The possible shared Craft of deliberate Lexicogenesis

Words are good. Making more good words is good. Being better and faster at making more good words would be more good. Maybe we can get better and faster at making more good words by working together.

About me

This blog contains most of my public writing. If you'd like to talk with me, you can email me at my gmail address, which has the username: [first part of blog URL] + 'contact' I like to see behind things. I have many ideas for cool things to make, so if you want to make cool things I think of (example: hyperphone), email me. If you want to make words you can go to lexicogenesis.zulipchat.com. If you're a person who has a streak of fanaticism for decreasing the probability that all human value is destroyed (by AGI), email me. If you're a woman and might want to go out with me, email me. I live in the Bay Area and want to have kids (preferably a lot) (with the right person). I'd be a great father. If you're a biologist or a linguist and are open to just shooting the shit, email me. (Or if you know a whole lot about something.) If you want to go hiking in the middle of the night, email me. If you want to go ice skating, email me. If you want to figure out

Better debates

When two people disagree about a proposition even though they've thought about it a lot, the disagreement is often hard to resolve. There's a gulf of data, concepts, intuitions, experiences, inferences. Some of this gulf has to be resolved by the two people individually trying to collate and present their own positions more clearly and legibly, so that they can build up concepts and propositions in whoever is receiving the model. Also, most new understanding comes from people working on their own or with others who are already synced up: for the most part they already agree on what and how to investigate, they have shared context of past experience and data, they agree on background assumptions, they have a shared language, they trust each other. But still, a lot of value comes from debate. The debaters are forced to make their evidence and logic legible. Ideas are tested against other ideas from another at least somewhat coherent perspective. Analogies and disanalogies are dr

Fundamental question: What determines a mind's effects?

A mind has some effects on the world. What determines which effects a mind has? To eventually create minds that have large effects that we specify, this question has to first be answered.

New Alignment Research Agenda: Massive Multiplayer Organism Oversight

When there's an AGI that's smarter than a human, how will we make sure it's not trying to kill us? The answer, in outline, is clear: we will watch the AGI's thoughts, and if it starts thinking about how to kill us, we will turn it off and then fix it so that it stops trying to kill us. 1. Limits of AI transparency There is a serious obstacle to this plan. Namely, the AGI will be very big and complicated, so it will be very difficult for us to watch all of its many thoughts. We don't know how to build structures made of large groups of humans that can process that much information to make good decisions. How can we overcome this obstacle? ML systems Current AI transparency methods are fundamentally limited by the size and richness of their model systems. To gain practical empirical experience today with modeling very large systems, we have to look to systems that are big and complex enough, with the full range of abstractions, to be analogous to future AGI syste

הלבת-אש ללא הסנה

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ] I seem to have always already lost my wife. I do wonder where she is. I assume she doesn't know where I am, or else she would have returned to me, although——not being able to imagine that she's dead or nonexistent or otherwise radically disempowered——I also eventually come to wonder if she's forsaken me, which choice I would naturally be required to have made myself enough apparently separate to pretend acceptance of, at least long enough for her to depart. Sadly I have also forgotten where she might be, and what she looks like, and worst of all, the sound of her voice murmuring something secret in my ear. I've even forgotten her name. Did it start with a J? Maybe an M? an A? Or was it a $\daleth$ or an Л? I don't remember.  We can be quite sure it doesn't start with an $\aleph$, since she's kind and patient. She likes lemons and she likes the feel of rock on her sk

The fraught voyage of aligned novelty

A voyage of novelty is fraught. If a mind takes a voyage of novelty, an observer is hard pressed to understand what the growing mind is thinking or what effects the growing mind will have on the world.

Provisionality

A mental element has to be open to revision, and so it has to be treated as though it might be revised.

Explicitness

Explicitness is out-foldedness. An element of a mind is explicit when it is available to relate to other elements when suitable.

Communicating with binaries and spectra

To communicate, it's convenient to code information in words and numbers. Words are discrete, so they're well-suited to expressing binaries: this is big, that is small. They're also well-suited to express finite partitions: microscopic, tiny, small, big, huge, enormous. Thought is often tripped up by finite partitions: many things do not fit neatly into the partitions, or what's relevant about something might be only poorly expressible with the available partitions. So instead an adjective can be taken as pointing at a spectrum. This is bigger, that is smaller. This is 10 meters long, that is 1 millimeter long. Thought can also be tripped up by spectra: again, what's relevant might be only poorly expressible as lying somewhere on the spectrum. What's relevant might be multidimensional, so that a one-dimensional representation requires a lossy projection. This weighs 2000 kg and is 10 meters long, that weighs 3 mg and is 1 millimeter long. A description could

Please don't throw your mind away

1. Dialogue 2. Synopsis 3. Highly theoretical justifications for having fun 4. Appendix: What is this "fun" you speak of? What's a circle? Hookwave Random smooth paths Mandala Water flowing uphill Guitar chamber Groups, but without closure Wet wall 1. Dialogue [Warning: the following dialogue contains an incidental spoiler for "Music in Human Evolution" by Kevin Simler . That post is short, good, and worth reading without spoilers, and this post will still be here if you come back later. It's also possible to get the point of this post by skipping the dialogue and reading the other sections.] Pretty often, talking to someone who's arriving to the existential risk / AGI risk / longtermism cluster, I'll have a conversation like the following. ———————————————————— Tsvi: "So, what's been catching your eye about this stuff?" Arrival: "I think I want to work on machine learning, and see if I can contribute to align

Rules for the flighty-souled

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ]
Never take your phone out while you're walking. Unless it's an emergency or you're going to take notes (but voice notes are preferable).
Never take your phone out while you're with someone. Ever. Unless you explicitly take a break from being together. If they take their phone out first, it's less bad, but still never do it.
Never wear clothing with words or images. Especially not logos or branded symbols.
Accept as little money as possible.
Never be in photos, ever.
When you're having a long-distance call, make it audio only, not video.
Avoid being inside cars.
Never go on a dating app.
Don't go on Facebook, Twitter, or other similar networks.
Never say the same thing twice.
Never speak to more than three people at once. Preferably, never speak to more than one person at once.
Never touch anyone unless you mutually have some type

esc.

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ] [Note February 2023: this is an unedited first draft written in April 2018 while I was heavily involved with a psychopath.] 180408 01:46:29 esc. summary: modalities attempt to mention a statement S, so to speak (by transforming it to a different statement [[M]] S), without using it. but the hidden effects of uttering S are often-roughly also caused by [[M]] S. thus S escapes its modality. speaking in a modality [[M]] attempts to transform a statement S into an object some other type, denoted [[M]] S (which can also be taken to be a statement, but in a different modality (as in, sensory modality)). some examples of modalities: [[M]] S may be: a string (the quotation modality; for example, "i was going to say [[']]i can't deal with this right now[[']]"); an emotion (e.g. "[[i feel like]] you are trying to hurt me"); a perception (e.g. "[[it seems to

Wildfire of strategicness

It may not be feasible to make a mind that makes achievable many difficult goals in diverse domains, without the mind also itself having large and increasing effects on the world. That is, it may not be feasible to make a system that strongly possibilizes without strongly actualizing. But suppose that this is feasible, and there is a mind M that strongly possibilizes without strongly actualizing. What happens if some mental elements of M start to act strategically, selecting, across any available domain, actions predicted to push the long-term future toward some specific outcome? The growth of M is like a forest or prairie that accumulates dry grass and trees over time. At some point a spark ignites a wildfire that consumes all the accumulated matter. The spark of strategicness, if such a thing is possible, recruits the surrounding mental elements. Those surrounding mental elements, by hypothesis, make goals achievable. That means the wildfire can recruit these surrounding element

A strong mind continues its trajectory of creativity

A very strong mind is produced by a trajectory of creativity. A trajectory of creativity that produces a very strong mind is hard to separate from the mind's operation. So a strong mind continues on its trajectory of creativity as long as it is active.

An anthropomorphic AI dilemma

Either generally-human-level AI will work internally like humans work internally, or not. If generally-human-level AI works like humans, then takeoff can be very fast, because in silico minds that work like humans are very scalable. If generally-human-level AI does not work like humans, then intent alignment is hard because we can't use our familiarity with human minds to understand the implications of what the AI is thinking or to understand what the AI is trying to do.

The voyage of novelty

Novelty is understanding that is new to a mind, that doesn't readily correspond or translate to something already in the mind. We want AGI in order to understand stuff that we haven't yet understood. So we want a system that takes a voyage of novelty: a creative search progressively incorporating ideas and ways of thinking that we haven't seen before. A voyage of novelty is fraught: we don't understand the relationship between novelty and control within a mind.

Hyperphone

"Did you know the Greenland shark doesn't reach reproductive maturity until it's over 100 years old?" "Yeah, it's crazy! What evolutionary pressures could possibly have produced that trait? Maybe it has to do with... "Well I was thinking that...

Endo-, Dia-, Para-, and Ecto-systemic novelty

Novelty can be coarsely described as one of: fitting within a preexisting system; constituting a shift of the system; creating a new parallel subsystem; or standing unintegrated outside the system.

"Sorry" and the originary concept of apology

1. Paradox of "apology" What does the word "apology" mean? Today it means "say you're sorry". In Ancient Greece, as people say, the etymon ἀπολογία meant "a speech made in defense of something", and this meaning can also attach to the English word "apology". Aren't these nearly exact opposites? Saying sorry is saying you did something wrong, and ἀπολογία is defending what you did, saying it's not wrong.

Verichtung

[This post is labeled בבל, meaning it's especially experimental. See: בבל disclaimer ] (Caveat lector: I only speak English and didn't run this by anyone.) Sonnendurchflutet Bäume über einem stillgelegt Steinbruch, Spiegel einander gegenüber, zu nah. Die geworfenen Würfel sind Schlangenaugen. ...איך להסביר לילד שכולם ימ Shield-toad left in the Haze, Gebröckelt Steineule auf der Hügelspitze. Unter der Unterfläche wartet der blaue Dynamo. Notes: "Verichtung" is a made-up word, patterned off " Vernichtung ", replacing "nicht" with the obsolete analogous word " icht ". "Icht" could maybe be viewed as "je-Wicht" (as in English "wight"), meaning something like "ever something", as opposed to "nicht" = "nie-Wicht" = "never something". So "Verichtung" would mean something like "be-something-ing, to make something be something, to make something

בבל disclaimer

Here are the posts labeled "בבל": https://tsvibt.blogspot.com/search/label/%D7%91%D7%91%D7%9C Posts labeled "בבל" are more experimental, unreliable, poetic, prophetic, metaphoric, contradictory, false, confused, incoherent, unclear, inchoate, incontinent, insane, repetitive, low-effort, pointless, cringe, fringe, binge, silly, facile, babbling, rambling, squabbling, and any other manner of not to necessarily be taken too seriously, compared to other posts.

Possibilizing vs. actualizing

Some behavior seems like it's just making things possible, without actually doing much of anything, while other behavior seems to actually do something. Is there a principled, or a useful, distinction between possibilizing and actualizing? Is it possible to possibilize a large effect on the world without actualizing large effects on the world?

Ultimate ends may be easily hidable behind convergent subgoals

Thought and action in pursuit of convergent instrumental subgoals do not automatically reveal why those subgoals are being pursued (that is, towards what supergoals), because many other agents with different supergoals would also pursue those subgoals, maybe with overlapping thought and action. In particular, an agent's ultimate ends don't have to be revealed by its pursuit of convergent subgoals. It might therefore be easy to covertly pursue some ultimate goal by mostly pursuing generally useful subgoals of other supergoals. By the inspection paradox for the convergence of subgoals, it might be easy to think and act almost comprehensively like a non-threatening agent would think and act, while going most of the way towards achieving some other more ambitious goal.

Politically convergent perverse instability

[Epistemic status: just a guess / hypothesis.] Suppose Alice is anti-immigration and has political power. She politically pushes for laws against immigration, government spending towards capacity to prevent immigration like walls and guards, policies to deport illegal immigrants, and so on. Alice also pushes against policies to cope with whatever the current de facto status quo is, e.g. to alleviate harms done by whatever is already going on, or at least doesn't push for such alleviation policies. Those policies would alleviate pressure to change, and Alice wants change; they'd make the status quo less bad, and Alice doesn't like the status quo. And, those policies being passed would constitute, relative to Alice's desired anti-immigration stance, a symbolic victory for the side opposing Alice; it would "say", in the language of politics, that "we are okay with the status quo, we're organizing to make something like the status quo work well". Pl

Descriptive vs. specifiable values

What are an agent's values? An answer to this question might be a good description of the agent's external behavior and internal workings, without showing how one could modify the agent's workings or origins so that the agent pushes the world in a specific different direction.

Shell games

1. Shell game 2. Perpetual motion machines 3. Shell games in alignment Example: hiding the generator of large effects Example: hiding the generator of novel understanding Other? 1. Shell game Here's the classic shell game: Youtube Screenshot from that video. The little ball is a phantom: when you look for it under a specific shell, it's not there, it's under a different shell.