June 2019 Newsletter


2018 in review


Our primary focus at MIRI in 2018 was twofold: research—as always!—and growth.

Thanks to the incredible support we received from donors the previous year, in 2018 we were able to aggressively pursue the plans detailed in our 2017 fundraiser post. The most notable goal we set was to “grow big and grow fast,” as our new research directions benefit a lot more from a larger team, and require skills that are a lot easier to hire for. To that end, we set a target of adding 10 new research staff by the end of 2019.

2018 therefore saw us accelerate the work we started in 2017, investing more in recruitment and shoring up the foundations needed for our ongoing growth. Since our 2017 fundraiser post, we have added 3 new research staff, including noted Haskell developer Edward Kmett. I now think that we’re most likely to hit 6–8 hires by the end of 2019, though hitting 9–10 still seems quite possible to me, as we are still engaging with many promising candidates, and continue to meet more.

Overall, 2018 was a great year for MIRI. Our research continued apace, and our recruitment efforts increasingly paid dividends.

May 2019 Newsletter


New paper: “Delegative reinforcement learning”


MIRI Research Associate Vanessa Kosoy has written a new paper, “Delegative reinforcement learning: Learning to avoid traps with a little help.” Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads:

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning).

The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.
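To convey the qualitative shape of the delegation idea in the abstract, here is a toy sketch. It is an illustration only, with names and structure of our own invention rather than the paper’s actual algorithm: the agent holds a finite set of candidate environment models, samples one and acts greedily on it (posterior-sampling style), but delegates to the advisor whenever any surviving model flags the chosen action as a possible trap.

```python
import random

# Toy sketch of the DRL delegation idea -- an illustration only, not the
# algorithm from the paper. "Traps" are state-action pairs a hypothesis
# judges to be irreversible; the agent only acts on its own when no
# surviving hypothesis considers the chosen action a trap.

class Model:
    def __init__(self, rewards, traps):
        self.rewards = rewards  # {(state, action): expected reward}
        self.traps = traps      # set of (state, action) pairs judged to be traps

    def best_action(self, state, actions):
        return max(actions, key=lambda a: self.rewards[(state, a)])

def choose_action(state, actions, models, advisor, rng=random):
    """Pick an action, delegating to the advisor if it might be a trap."""
    sampled = rng.choice(models)                  # sample one hypothesis
    action = sampled.best_action(state, actions)  # act greedily on it
    if any((state, action) in m.traps for m in models):
        return advisor(state)                     # delegate the risky step
    return action
```

In the paper, the delegation subroutine and the posterior updates are constructed so as to yield a formal regret bound; the sketch above only shows the loop’s overall structure.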

The goal of Kosoy’s work on DRL is to put us on a path toward having a deep understanding of learning systems with human-in-the-loop and formal performance guarantees, including safety guarantees. DRL tries to move us in this direction by providing models in which such performance guarantees can be derived.

While these models still make many unrealistic simplifying assumptions, Kosoy views DRL as already capturing some of the most essential features of the problem—and she has a fairly ambitious vision of how this framework might be further developed.

Kosoy previously described DRL in the post Delegative Reinforcement Learning with a Merely Sane Advisor. One feature of DRL Kosoy described in that post but omitted from the paper (for space reasons) is DRL’s application to corruption. Given certain assumptions, DRL ensures that a formal agent will never have its reward or advice channel tampered with (corrupted). As a special case, the agent’s own advisor cannot cause the agent to enter a corrupt state. Similarly, the general protection from traps described in “Delegative reinforcement learning” also protects the agent from harmful self-modifications.

Another set of DRL results that didn’t make it into the paper is Catastrophe Mitigation Using DRL. In this variant, a DRL agent can mitigate catastrophes that the advisor would not be able to mitigate on its own—something that isn’t supported by the stricter assumptions about the advisor in standard DRL.



April 2019 Newsletter


New grants from the Open Philanthropy Project and BERI


I’m happy to announce that MIRI has received two major new grants: one from the Open Philanthropy Project, and one from the Berkeley Existential Risk Initiative (BERI).

The Open Philanthropy Project’s grant was awarded as part of the first round of grants recommended by their new committee for effective altruism support:

We are experimenting with a new approach to setting grant sizes for a number of our largest grantees in the effective altruism community, including those who work on long-termist causes. Rather than have a single Program Officer make a recommendation, we have created a small committee, comprised of Open Philanthropy staff and trusted outside advisors who are knowledgeable about the relevant organizations. […] We average the committee members’ votes to arrive at final numbers for our grants.

The Open Philanthropy Project’s grant is separate from the three-year $3.75 million grant they awarded us in 2017, the third $1.25 million disbursement of which is still scheduled for later this year. This new grant increases the Open Philanthropy Project’s total support for MIRI from $1.4 million[1] in 2018 to ~$2.31 million in 2019, but doesn’t reflect any decision about how much total funding MIRI might receive from Open Phil in 2020 (beyond the fact that it will be at least ~$1.06 million).

Going forward, the Open Philanthropy Project currently plans to determine the size of any potential future grants to MIRI using the above committee structure.

We’re very grateful for this increase in support from BERI and the Open Philanthropy Project—both organizations that already numbered among our largest funders of the past few years. We expect these grants to play an important role in our decision-making as we continue to grow our research team in the ways described in our 2018 strategy update and fundraiser posts.

  1. The $1.4 million counts the Open Philanthropy Project’s $1.25 million disbursement in 2018, as well as a $150,000 AI Safety Retraining Program grant to MIRI. 

March 2019 Newsletter


Applications are open for the MIRI Summer Fellows Program!


CFAR and MIRI are running our fifth annual MIRI Summer Fellows Program (MSFP) in the San Francisco Bay Area from August 9 to August 24, 2019.

MSFP is an extended retreat for mathematicians and programmers with a serious interest in making technical progress on the problem of AI alignment. It includes an overview of CFAR’s applied rationality content, a breadth-first grounding in the MIRI perspective on AI safety, and multiple days of actual hands-on research with participants and MIRI staff attempting to make inroads on open questions.
