Category: Analysis

Ngo and Yudkowsky on alignment difficulty

  This post is the first in a series of transcribed Discord conversations between Richard Ngo and Eliezer Yudkowsky, moderated by Nate Soares. We’ve also added Richard and Nate’s running summaries of the conversation (and others’ replies) from Google Docs....

Discussion with Eliezer Yudkowsky on AGI interventions

  The following is a partially redacted and lightly edited transcript of a chat conversation about AGI between Eliezer Yudkowsky and a set of invitees in early September 2021. By default, all other participants are anonymized as “Anonymous”. I think...

Saving Time

Note: This is a preamble to Finite Factored Sets, a sequence I’ll be posting over the next few weeks. This Sunday at noon Pacific time, I’ll be giving a Zoom talk (link) introducing Finite Factored Sets, a framework which I...

Thoughts on Human Models

This is a joint post by MIRI Research Associate and DeepMind Research Scientist Ramana Kumar and MIRI Research Fellow Scott Garrabrant, cross-posted from the AI Alignment Forum and LessWrong. Human values and preferences are hard to specify, especially in complex...

Embedded Curiosities

This is the conclusion of the Embedded Agency series. Previous posts: Embedded Agents — Decision Theory — Embedded World-Models — Robust Delegation — Subsystem Alignment. A final word on curiosity, and intellectual puzzles: I described an embedded agent, Emmy,...

Subsystem Alignment

You want to figure something out, but you don’t know how to do that yet. You have to somehow break up the task into sub-computations. There is no atomic act of “thinking”; intelligence must be built up of...