Posts

Showing posts from 2025

Secret: Why an AI might be controlled by dangerous hidden thoughts

[Note: This is a draft for a contest submission. I'm publishing it before it's fully edited because of the Inkhaven deadline. You may or may not want to wait some days before reading it.]
[After a few small tweaks, this is now probably as edited as it will get.]
[This is a draft script for a hypothetical video; it's written in a different style from what I normally write.]
1. [intro about AI]
Researchers are racing to make smarter-than-human AI. Some of them say that AI can probably be made safe by instilling values into the AI. But what if those plans have a fundamental obstacle? What if no one knows how to program values into an AI in a way that will stick around as the AI gets smarter? In this video we'll look at one way of understanding what might go wrong with plans like this.
2. [intro atlantis]
Imagine for a moment the recently founded island nation of New Atlantis. The Atlantean citizens have been hard at work on roads, houses, hospitals, sewers, a defens...

Inkhaven postmortem 😵

1. Introduction 2. Why I am so clever 3. Why I am so verbose 4. Why I make such excellent memes 5. Why I am so tired 6. Endnotes 7. thanks
1. Introduction
Alright! I did it! I've published 30 posts in 30 days for INKHAVEN 2025. Or, I mean, I hereby, via this post, am doing it. Am having done it? I am have-doning it. Unless I need another day to edit my contest submission post, in which case this is my penultimate post, and I will be am have-doning it. I'm not doing what works, I'm doing what's funny, in the margins. It was a nice warm blood wordbath. I'm in the inaugural cohort. I'm selectively decorrelated. Decorrelated from what? Doesn't matter. Everything. Anything. The possibilities are endless. The possibilities are wordpress (dot com). Having just spent a month pressing words (dot com), tens of thousands of em, out of my nose, what have I learned? What have I unlearned? If anything. Or everything.
2. Why I am so clever
If I have on...

Ah Motiva 3: The context of the concept of value

1. Background 2. Why even talk about values? 2.1. Correlated coverage tends to be founded on values 2.2. Corrigibility handles are founded on values 2.3. Constraints, handles, desiderata, antidesiderata 2.4. Constraints and handles related to values 3. Where do values come from? 4. The fact-value distinction 4.1. The basic fact-value distinction 4.2. Complications with the distinction 4.3. What is the type of "value"? 4.4. Preciser fact-value distinctions
1. Background
This is the third essay in a series under the title "A hermeneutic Movement of the idea of values". This essay is a mix of old notes and some new meditations. The previous essay is "Ah Motiva 2: Relating values and novelty".
2. Why even talk about values?
In short: we don't have any idea what values actually are, and we probably need to, and it would probably be very helpful. In more words: The fundamental question of rationality is Why do you believe what you be...

Ah Motiva 2: Relating values and novelty

1. Background 2. Capable minds with specifiable effects 3. The idea of values is promissory 3.1. Using the idea of values to specify effects 3.2. Human wanting 3.3. Aside on ideas vs. concepts 3.4. Promissoriness asks for holding off on demarcation 4. Are values essentially diasystemic? 4.1. Example: Self-regenerating friendship 4.2. Human values are diasystemic 4.3. Values require reference 4.4. Self-interpretive metavalues produce diasystemically novel values
1. Background
This is the second essay in a series under the title "A hermeneutic Movement of the idea of values". This essay is a mix of old notes and some new meditations. The previous essay is "Ah Motiva 1: Words about values".
2. Capable minds with specifiable effects
The starting point of AGI alignment is the question of how to make a mind that is highly capable, and whose ultimate effects are determined by the judgement of human operators. In other words, the mind should empower...