MIRI Updates

The following is a basically unedited summary I wrote up on March 16 of my take on Paul Christiano’s AGI alignment approach (described in “ALBA” and “Iterated Distillation and Amplification”). Where Paul had comments and replies, I’ve included them below....

Updates
A new paper: “Categorizing Variants of Goodhart’s Law”
New research write-ups and discussions: Distributed Cooperation; Quantilal Control for Finite Markov Decision Processes
New at AI Impacts: Transmitting Fibers in the Brain: Total Length and Distribution of Lengths
Scott Garrabrant,...

Update Nov. 23: This post was edited to reflect Scott’s terminology change from “naturalized world-models” to “embedded world-models.” For a full introduction to these four research problems, see Scott Garrabrant and Abram Demski’s “Embedded Agency.” Scott Garrabrant is taking over...

Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. In Goodhart Taxonomy, I proposed that there are (at least) four different...

Updates
New research write-ups and discussions: Knowledge is Freedom; Stable Pointers to Value II: Environmental Goals; Toward a New Technical Explanation of Technical Explanation; Robustness to Scale
New at AI Impacts: Likelihood of Discontinuous Progress Around the Development of AGI...

MIRI senior researcher Eliezer Yudkowsky was recently invited to be a guest on Sam Harris’ “Waking Up” podcast. Sam is a neuroscientist and popular author who writes on topics related to philosophy, religion, and public discourse. The following is a...
