August 2021 Newsletter

MIRI updates: Scott Garrabrant and Rohin Shah debate one of the central questions in AI alignment strategy: whether we should try to avoid human-modeling capabilities in the first AGI systems. Scott gives a proof of the fundamental theorem of finite factored sets. News and links: Redwood Research, a new AI alignment research organization, is seeking an operations lead. Led…

July 2021 Newsletter

MIRI updates: MIRI researcher Evan Hubinger discusses learned optimization, interpretability, and homogeneity in takeoff speeds on the Inside View podcast. Scott Garrabrant releases part three of "Finite Factored Sets", on conditional orthogonality. UC Berkeley's Daniel Filan provides examples of conditional orthogonality in finite factored sets: 1, 2. Abram Demski proposes factoring the alignment problem into "outer alignment"…

June 2021 Newsletter

Our big news this month is Scott Garrabrant's finite factored sets, one of MIRI's largest results to date. For most people, the best introductory resource on FFS is likely Scott’s Topos talk/transcript. Scott is also in the process of posting a longer, more mathematically dense introduction in multiple parts: part 1, part 2. Scott has also discussed…

May 2021 Newsletter

MIRI senior researcher Scott Garrabrant has a major new result, “Finite Factored Sets,” that he’ll be unveiling in an online talk this Sunday at noon Pacific time. (Zoom link.) For context on the result, see Scott’s new post “Saving Time.” In other big news, MIRI has just received its two largest individual donations of all…

April 2021 Newsletter

MIRI updates: MIRI researcher Abram Demski writes regarding counterfactuals: I've felt like the problem of counterfactuals is "mostly settled" (modulo some math working out) for about a year, but I don't think I've really communicated this online. Partly, I've been waiting to write up more formal results. But other research has taken up most of my…

March 2021 Newsletter

MIRI updates: MIRI's Eliezer Yudkowsky and Evan Hubinger comment in some detail on Ajeya Cotra's "The Case for Aligning Narrowly Superhuman Models". This conversation touches on some of the more important alignment research views at MIRI, such as the view that alignment requires a thorough understanding of AGI systems' reasoning "under the hood", and the view that early AGI systems should most likely avoid human…

February 2021 Newsletter

MIRI updates: Abram Demski distinguishes different versions of the problem of “pointing at” human values in AI alignment. Evan Hubinger discusses “Risks from Learned Optimization” on the AI X-Risk Research Podcast. Eliezer Yudkowsky comments on AI safety via debate and Goodhart’s law. MIRI supporters donated ~$135k on Giving Tuesday, of which ~26% was matched by Facebook and ~28% by employers…

January 2021 Newsletter

MIRI updates: MIRI’s Evan Hubinger uses a notion of optimization power to define whether AI systems are compatible with the strategy-stealing assumption. MIRI’s Abram Demski discusses debate approaches to AI safety that don’t rely on factored cognition. Evan argues that the first AGI systems are likely to be very similar to each other, and discusses…