Showing posts from April, 2023

Fundamental question: What determines a mind's effects?

A mind has some effects on the world. What determines which effects a mind has? To eventually create minds that have large effects that we specify, we first have to answer this question.

New Alignment Research Agenda: Massive Multiplayer Organism Oversight

When there's an AGI that's smarter than a human, how will we make sure it's not trying to kill us? The answer, in outline, is clear: we will watch the AGI's thoughts, and if it starts thinking about how to kill us, we will turn it off and then fix it so that it stops trying to kill us.

1. Limits of AI transparency

There is a serious obstacle to this plan. Namely, the AGI will be very big and complicated, so it will be very difficult for us to watch all of its many thoughts. We don't know how to build structures made of large groups of humans that can process that much information to make good decisions. How can we overcome this obstacle?

ML systems

Current AI transparency methods are fundamentally limited by the size and richness of their model systems. To gain practical empirical experience today with modeling very large systems, we have to look to systems that are big and complex enough, with the full range of abstractions, to be analogous to future AGI syste