October 2019 Newsletter

Newsletters

September 2019 Newsletter

Newsletters

August 2019 Newsletter

Newsletters

July 2019 Newsletter

Newsletters

New paper: “Risks from learned optimization”

Papers

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant have a new paper out: “Risks from learned optimization in advanced machine learning systems.”

The paper’s abstract:

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this paper.

We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.

The critical distinction presented in the paper is between what an AI system is optimized to do (its base objective) and what it actually ends up optimizing for (its mesa-objective), if it optimizes for anything at all. The authors are interested in when ML models will end up optimizing for something, as well as how the objective an ML model ends up optimizing for compares to the objective it was selected to achieve.
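As a rough illustration of this distinction, here is a toy sketch in Python. The setup is hypothetical and is not taken from the paper's formalism: the names `base_objective`, `mesa_objective`, and `act`, and the "exit" and "green tile" framing, are assumptions made for this example. It shows a proxy objective that agrees with the base objective on a training-like case and diverges once the two come apart.

```python
# Toy illustration only: every name and the setup here are hypothetical.

def base_objective(state):
    """What the training process selects for: standing on the exit."""
    return 1.0 if state["position"] == state["exit"] else 0.0

def mesa_objective(state):
    """What a learned optimizer might pursue instead: standing on the green tile."""
    return 1.0 if state["position"] == state["green_tile"] else 0.0

def act(state, objective):
    """Choose the candidate position that scores highest under `objective`."""
    return max(
        state["candidates"],
        key=lambda pos: objective({**state, "position": pos}),
    )

# In the training-like case the exit happens to be the green tile,
# so the two objectives recommend the same position.
train = {"candidates": [0, 1, 2], "exit": 2, "green_tile": 2, "position": 0}

# In the shifted case the green tile is no longer at the exit.
shifted = {"candidates": [0, 1, 2], "exit": 2, "green_tile": 0, "position": 1}

train_choice = act(train, mesa_objective)
shifted_choice = act(shifted, mesa_objective)

train_score = base_objective({**train, "position": train_choice})
shifted_score = base_objective({**shifted, "position": shifted_choice})
```

In this sketch the agent driven by the proxy scores perfectly on the base objective in the training-like case and fails in the shifted one, even though its own objective is satisfied both times.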

The distinction between the objective a system is selected to achieve and the objective it actually optimizes for isn’t new. Eliezer Yudkowsky has previously raised similar concerns in his discussion of optimization daemons, and Paul Christiano has discussed such concerns in “What failure looks like.”

The paper’s contents have also been released this week as a sequence on the AI Alignment Forum, cross-posted to LessWrong. As the authors note there:

We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we plan to present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems.

Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as deceptive alignment, which we posit may present one of the largest, though not necessarily insurmountable, current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning.




June 2019 Newsletter

Newsletters

2018 in review

MIRI Strategy

Our primary focus at MIRI in 2018 was twofold: research—as always!—and growth.

Thanks to the incredible support we received from donors the previous year, in 2018 we were able to aggressively pursue the plans detailed in our 2017 fundraiser post. The most notable goal we set was to “grow big and grow fast,” as our new research directions benefit a lot more from a larger team, and require skills that are a lot easier to hire for. To that end, we set a target of adding 10 new research staff by the end of 2019.

2018 therefore saw us accelerate the work we started in 2017, investing more in recruitment and shoring up the foundations needed for our ongoing growth. Since our 2017 fundraiser post, we’re up 3 new research staff, including noted Haskell developer Edward Kmett. I now think that we’re most likely to hit 6–8 hires by the end of 2019, though hitting 9–10 still seems quite possible to me, as we are still engaging with many promising candidates, and continue to meet more.

Overall, 2018 was a great year for MIRI. Our research continued apace, and our recruitment efforts increasingly paid dividends.

May 2019 Newsletter

Newsletters