# New paper: “Toward negotiable reinforcement learning”

|   |  Papers

MIRI Research Fellow Andrew Critch has developed a new result in the theory of conflict resolution, described in “Toward negotiable reinforcement learning: Shifting priorities in Pareto optimal sequential decision-making.”

Abstract:

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time.

Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
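The two qualitative observations in the abstract can be illustrated with a small Python sketch. This is not the paper's recursion, only a toy rendering of the idea under stated assumptions: each player's priority weight is rescaled by the probability that player's beliefs assigned to the machine's latest input, and each action is scored using each player's own expected utility. The function names (`update_priorities`, `choose_action`) and the example numbers are hypothetical.

```python
from typing import Dict, List


def update_priorities(weights: List[float], likelihoods: List[float]) -> List[float]:
    """Shift priorities after one observation (illustrative assumption).

    weights[i]     : current priority assigned to player i
    likelihoods[i] : probability player i's beliefs gave to the observed input
    Each weight is scaled by that likelihood, then the weights are renormalized.
    """
    scaled = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("every player assigned zero probability to the observation")
    return [s / total for s in scaled]


def choose_action(weights: List[float], expected_utils: Dict[str, List[float]]) -> str:
    """Pick the action with the highest priority-weighted score.

    expected_utils[action][i] is player i's expected utility for that action,
    computed under player i's OWN beliefs (observation 1 in the abstract).
    """
    def score(action: str) -> float:
        return sum(w * u for w, u in zip(weights, expected_utils[action]))

    return max(expected_utils, key=score)


# Hypothetical example: two players start with equal priority.
weights = [0.5, 0.5]
# Player 0's beliefs predicted the observed input better than player 1's.
weights = update_priorities(weights, likelihoods=[0.8, 0.2])
# Action "a" favors player 0, action "b" favors player 1.
action = choose_action(weights, {"a": [1.0, 0.0], "b": [0.0, 1.0]})
```

In this toy run, the player whose beliefs predicted the input better ends up with more priority, so the machine picks the action that player prefers. A fixed linear aggregation would keep the weights at their initial values regardless of what is observed.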

# Response to Cegłowski on superintelligence

|   |  Analysis

Web developer Maciej Cegłowski recently gave a talk on AI safety (video, text) arguing that we should be skeptical of the standard assumptions that go into working on this problem, and doubly skeptical of the extreme-sounding claims, attitudes, and policies these premises appear to lead to. I’ll give my reply to each of these points below.

First, a brief outline: this reply mirrors the structure of Cegłowski’s talk. I first try to put forth my understanding of the broader implications of the talk, then deal in detail with the inside-view arguments as to whether or not the core idea is right, and end by talking a bit about the structure of these discussions.

# January 2017 Newsletter

|   |  Newsletters

# November 2016 Newsletter

|   |  Newsletters

Post-fundraiser update: Donors rallied late last month to get us most of the way to our first fundraiser goal, but we ultimately fell short. This means that we’ll need to make up the remaining $160k gap over the next month if we’re going to move forward on our 2017 plans. We’re in a good position to expand our research staff and trial a number of potential hires, but only if we feel confident about our funding prospects over the next few years. Since we don’t have an official end-of-the-year fundraiser planned this time around, we’ll be relying more on word-of-mouth to reach new donors. To help us with our expansion plans, donate at https://intelligence.org/donate/ and spread the word!

Research updates

- Critch gave an introductory talk on logical induction (video) for a grad student seminar, going into more detail than our previous talk.
- New at IAFF: Logical Inductor Limits Are Dense Under Pointwise Convergence; Bias-Detecting Online Learners; Index of Some Decision Theory Posts
- We ran a second machine learning workshop.

General updates

- We ran an “Ask MIRI Anything” Q&A on the Effective Altruism forum.
- We posted the final videos from our Colloquium Series on Robust and Beneficial AI, including Armstrong on “Reduced Impact AI” (video) and Critch on “Robust Cooperation of Bounded Agents” (video).
- We attended OpenAI’s first unconference; see Viktoriya Krakovna’s recap.
- Eliezer Yudkowsky spoke on fundamental difficulties in aligning advanced AI at NYU’s “Ethics of AI” conference.
- A major development: Barack Obama and a recent White House report discuss intelligence explosion, Nick Bostrom’s Superintelligence, open problems in AI safety, and key questions for forecasting general AI. See also the submissions to the White House from MIRI, OpenAI, Google Inc., AAAI, and other parties.

News and links

- The UK Parliament cites recent AI safety work in a report on AI and robotics.
- The Open Philanthropy Project discusses methods for improving individuals’ forecasting abilities.
- Paul Christiano argues that AI safety will require that we align a variety of AI capacities (e.g., Bayesian inference and search) with our interests, not just learning. See also new posts from Christiano on reliability amplification, reflective oracles, imitation + reinforcement learning, and the case for expecting most alignment problems to arise first as security problems.
- The Leverhulme Centre for the Future of Intelligence has officially launched, and is hiring postdoctoral researchers: details.

# Post-fundraiser update

|   |  News

We concluded our 2016 fundraiser eleven days ago. Progress was slow at first, but our donors came together in a big way in the final week, nearly doubling our final total. In the end, donors raised $589,316 over six weeks, making this our second-largest fundraiser to date. I’m heartened by this show of support, and extremely grateful to the 247 distinct donors who contributed.

We made substantial progress toward our immediate funding goals, but ultimately fell short of our $750,000 target by about $160k. We have a number of hypotheses as to why, but our best guess at the moment is that we missed our target because more donors than expected are waiting until the end of the year to decide whether (and how much) to give.

We were experimenting this year with running just one fundraiser in the fall (replacing the summer and winter fundraisers we’ve run in years past) and spending less time over the year on fundraising. Our fundraiser ended up looking more like recent summer funding drives, however. This suggests that either many donors are waiting to give in November and December, or we’re seeing a significant decline in donor support:

Looking at our donor database, preliminary data weakly suggests that many traditionally-winter donors are holding off, but it’s still hard to say.

This dip in donations so far is offset by the Open Philanthropy Project’s generous $500k grant, which raises our overall 2016 revenue from $1.23M to $1.73M. However, $1.73M would still not be enough to cover our 2016 expenses, much less our expenses for the coming year:

(2016 and 2017 expenses are projected, and our 2016 revenue is as of November 11.)

To a first approximation, this level of support means that we can continue to move forward without scaling back our plans too much, but only if donors come together to fill what’s left of our $160k gap as the year draws to a close:

In practical terms, closing this gap will mean that we can likely trial more researchers over the coming year, spend less senior staff time on raising funds, and take on more ambitious outreach and researcher-pipeline projects. E.g., an additional expected $75k / year would likely cause us to trial one extra researcher over the next 18 months (maxing out at 3-5 trials).

Currently, we’re in a situation where we have a number of potential researchers that we would like to give a 3-month trial, and we lack the funding to trial all of them. If we don’t close the gap this winter, then it’s also likely that we’ll need to move significantly more slowly on hiring and trialing new researchers going forward.

Our main priority in fundraisers is generally to secure stable, long-term flows of funding to pay for researcher salaries — “stable” not necessarily at the level of individual donors, but at least at the level of the donor community at large. If we make up our shortfall in November and December, then this will suggest that we shouldn’t expect big year-to-year fluctuations in support, and therefore we can fairly quickly convert marginal donations into AI safety researchers. If we don’t make up our shortfall soon, then this will suggest that we should be generally more prepared for surprises, which will require building up a bigger runway before growing the team very much.

Although we aren’t officially running a fundraiser, we still have quite a bit of ground to cover, and we’ll need support from a lot of new and old donors alike to get the rest of the way to our $750k target. Visit intelligence.org/donate to donate toward this goal, and do spread the word to people who may be interested in supporting our work.

You have my gratitude, again, for helping us get this far. It isn’t clear yet whether we’re out of the woods, but we’re now in a position where success in our 2016 fundraising is definitely a realistic option, provided that we put some work into it over the next two months. Thank you.

Update December 22: We have now hit our $750k goal, with help from end-of-the-year donors. Many thanks to everyone who helped pitch in over the last few months! We’re still funding-constrained with respect to how many researchers we’re likely to trial, as described above, but it now seems clear that 2016 overall won’t be an unusually bad year for us funding-wise, and that we can seriously consider (though not take for granted) more optimistic growth possibilities over the next couple of years.

December/January donations will continue to have a substantial effect on our 2017–2018 hiring plans and strategy as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see a suite of recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.