MIRI Updates
June 2018 Newsletter
Updates New research write-ups and discussions: Logical Inductors Converge to Correlated Equilibria (Kinda) MIRI researcher Tsvi Benson-Tilsen and Alex Zhu ran an AI safety retreat for MIT students and alumni. Andrew Critch discusses what kind of advice to give to...
May 2018 Newsletter
Updates New research write-ups and discussions: Resource-Limited Reflective Oracles; Computing An Exact Quantilal Policy New at AI Impacts: Promising Research Projects MIRI research fellow Scott Garrabrant and associates Stuart Armstrong and Vanessa Kosoy are among the winners in the second...
Challenges to Christiano’s capability amplification proposal
[mathjax] The following is a basically unedited summary I wrote up on March 16 of my take on Paul Christiano’s AGI alignment approach (described in “ALBA” and “Iterated Distillation and Amplification”). Where Paul had comments and replies, I’ve included them...
April 2018 Newsletter
Updates A new paper: “Categorizing Variants of Goodhart’s Law” New research write-ups and discussions: Distributed Cooperation; Quantilal Control for Finite Markov Decision Processes New at AI Impacts: Transmitting Fibers in the Brain: Total Length and Distribution of Lengths Scott Garrabrant,...
2018 research plans and predictions
Update Nov. 23: This post was edited to reflect Scott’s terminology change from “naturalized world-models” to “embedded world-models.” For a full introduction to these four research problems, see Scott Garrabrant and Abram Demski’s “Embedded Agency.” Scott Garrabrant is taking over...
New paper: “Categorizing variants of Goodhart’s Law”
Goodhart’s Law states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” However, this is not a single phenomenon. In Goodhart Taxonomy, I proposed that there are (at least) four different...