CSRBAI talks on preference specification

News, Video

We’ve uploaded a third set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI), co-hosted with the Future of Humanity Institute. These talks were part of the week focused on preference specification in AI systems, including the difficulty of specifying safe and useful goals, or specifying safe and useful methods for learning human preferences. All released videos are available on the CSRBAI web page.



Tom Everitt, a PhD student at the Australian National University, spoke about his paper “Avoiding wireheading with value reinforcement learning,” written with Marcus Hutter (slides). Abstract:

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward — the so-called wireheading problem.

In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading. Our VRL agent offers the ease of control of RL agents and avoids the incentive for wireheading.
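The wireheading incentive described in the abstract can be seen in a very small toy model. The sketch below is illustrative only and is not taken from the paper: the action names, the numbers, and the `best_action` helper are all hypothetical, and it shows only the incentive problem, not the VRL constraint itself. An agent that maximizes the observed reward signal picks the sensor-tampering action, while an agent that maximizes the underlying utility does not.

```python
# Toy illustration of the wireheading incentive.
# All action names and numbers are hypothetical, chosen only for illustration.

# Each action maps to (true_utility, observed_reward).
ACTIONS = {
    "do_task": (1.0, 1.0),              # reward sensor reports the real outcome
    "idle": (0.0, 0.0),
    "tamper_with_sensor": (0.0, 10.0),  # sensor is forced to report a high reward
}

TRUE_UTILITY = 0
OBSERVED_REWARD = 1


def best_action(score_index):
    """Return the action with the highest value in the given column."""
    return max(ACTIONS, key=lambda action: ACTIONS[action][score_index])


# A pure reward maximizer is drawn to the tampering action.
reward_maximizer_choice = best_action(OBSERVED_REWARD)

# An agent that scores actions by the underlying utility is not.
utility_maximizer_choice = best_action(TRUE_UTILITY)

print(reward_maximizer_choice)
print(utility_maximizer_choice)
```

The gap between the two columns is the whole problem: whenever an action can change what the sensor reports without changing the outcome, maximizing the reported number and maximizing the intended goal come apart.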

Read more »

CSRBAI talks on robustness and error-tolerance

News, Video

We’ve uploaded a second set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Future of Humanity Institute. These talks were part of the week focused on robustness and error-tolerance in AI systems, and how to ensure that when AI systems fail, they fail gracefully and detectably. All released videos are available on the CSRBAI web page.



Bart Selman, professor of computer science at Cornell University, spoke about machine reasoning and planning (slides). Excerpt:

I’d like to look at what I call “non-human intelligence.” It does get less attention, but the advances also have been very interesting, and they’re in reasoning and planning. It’s actually partly not getting as much attention in the AI world because it’s more used in software verification, program synthesis, and automating science and mathematical discoveries – other areas related to AI but not a central part of AI that are using these reasoning technologies. Especially the software verification world – Microsoft, Intel, IBM – push these reasoning programs very hard, and that’s why there’s so much progress, and I think it will start feeding back into AI in the near future.

Read more »

MIRI strategy update: 2016

MIRI Strategy

This post is a follow-up to Malo’s 2015 review, sketching out our new 2016-2017 plans. Briefly, our top priorities (in decreasing order of importance) are to (1) make technical progress on the research problems we’ve identified, (2) expand our team, and (3) build stronger ties to the wider research community.

As discussed in a previous blog post, the biggest update to our research plans is that we’ll be splitting our time going forward between our 2014 research agenda (the “agent foundations” agenda) and a new research agenda oriented toward machine learning work led by Jessica Taylor: “Alignment for Advanced Machine Learning Systems.”

Three additional news items:

1. I’m happy to announce that MIRI has received support from a major new donor: entrepreneur and computational biologist Blake Borgeson, who has made a $300,000 donation to MIRI. This is the second-largest donation MIRI has received in its history, beaten only by Jed McCaleb’s 2013 cryptocurrency donation. As a result, we’ve been able to execute on our growth plans with more speed, confidence, and flexibility.

2. This year, instead of running separate summer and winter fundraisers, we’re merging them into one more ambitious fundraiser, which will take place in September.

3. I’m also pleased to announce that Abram Demski has accepted a position as a MIRI research fellow. Additionally, Ryan Carey has accepted a position as an assistant research fellow, and we’ve hired some new administrative staff.

I’ll provide more details on these and other new developments below.

Read more »

August 2016 Newsletter

Newsletters

Research updates

General updates

  • Our 2015 in review, with a focus on the technical problems we made progress on.
  • Another recap: how our summer colloquium series and fellows program went.
  • We’ve uploaded our first CSRBAI talks: Stuart Russell on “AI: The Story So Far” (video), Alan Fern on “Toward Recognizing and Explaining Uncertainty” (video), and Francesca Rossi on “Moral Preferences” (video).
  • We submitted our recommendations to the White House Office of Science and Technology Policy, cross-posted to our blog.
  • We attended IJCAI and the White House’s AI and economics event. Furman on technological unemployment (video) and other talks are available online.
  • Talks from June’s safety and control in AI event are also online. Speakers included Microsoft’s Eric Horvitz (video), FLI’s Richard Mallah (video), Google Brain’s Dario Amodei (video), and IARPA’s Jason Matheny (video).

News and links

2016 summer program recap

News, Video

As previously announced, we recently ran a 22-day Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Oxford Future of Humanity Institute. The colloquium was aimed at bringing together safety-conscious AI scientists from academia and industry to share their recent work. The event served that purpose well, initiating some new collaborations and a number of new conversations between researchers who hadn’t interacted before or had only talked remotely.

Over 50 people attended from 25 different institutions, with an average of 15 people present on any given talk or workshop day. In all, there were 17 talks and four weekend workshops on the topics of transparency, robustness and error-tolerance, preference specification, and agent models and multi-agent dilemmas. The full schedule and talk slides are available on the event page. Videos from the first day of the event are now available, and we’ll be posting the rest of the talks online soon:



Stuart Russell, professor of computer science at UC Berkeley and co-author of Artificial Intelligence: A Modern Approach, gave the opening keynote. Russell spoke on “AI: The Story So Far” (slides). Abstract:

I will discuss the need for a fundamental reorientation of the field of AI towards provably beneficial systems. This need has been disputed by some, and I will consider their arguments. I will also discuss the technical challenges involved and some promising initial results.

Russell discusses his recent work on cooperative inverse reinforcement learning 36 minutes in. This paper and Dylan Hadfield-Menell’s related talk on corrigibility (slides) inspired lots of interest and discussion at CSRBAI.

Read more »

2015 in review

MIRI Strategy

As Luke had done in years past (see 2013 in review and 2014 in review), I (Malo) wanted to take some time to review our activities from last year. In the coming weeks Nate will provide a big-picture strategy update. Here, I’ll take a look back at 2015, focusing on our research progress, academic and general outreach, fundraising, and other activities.

After seeing signs in 2014 that interest in AI safety issues was on the rise, we made plans to grow our research team. Fueled by the response to Bostrom’s Superintelligence and the Future of Life Institute’s “Future of AI” conference, interest continued to grow in 2015. This suggested that we could afford to accelerate our plans, but it wasn’t clear how quickly.

In 2015 we did not release a mid-year strategic plan, as Luke did in 2014. Instead, we laid out various conditional strategies dependent on how much funding we raised during our 2015 Summer Fundraiser. The response was great; we had our most successful fundraiser to date. We hit our first two funding targets (and then some), and set out on an accelerated 2015/2016 growth plan.

As a result, 2015 was a big year for MIRI. After publishing our technical agenda at the start of the year, we made progress on many of the open problems it outlined, doubled the size of our core research team, strengthened our connections with industry groups and academics, and raised enough funds to maintain our growth trajectory. We’re very grateful to all our supporters, without whom this progress wouldn’t have been possible.

Read more »

New paper: “Alignment for advanced machine learning systems”

Papers

MIRI’s research to date has focused on the problems that we laid out in our late 2014 research agenda, and in particular on formalizing optimal reasoning for bounded, reflective decision-theoretic agents embedded in their environment. Our research team has since grown considerably, and we have made substantial progress on this agenda, including a major breakthrough in logical uncertainty that we will be announcing in the coming weeks.

Today we are announcing a new research agenda, “Alignment for advanced machine learning systems.” Going forward, about half of our time will be spent on this new agenda, while the other half is spent on our previous agenda. The abstract reads:

We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers.

Open problems surveyed in this research proposal include: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? We discuss these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.

Co-authored by Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch, our new report discusses eight new lines of research (previously summarized here). Below, I’ll explain the rationale behind these problems, as well as how they tie in to our old research agenda and to the new “Concrete problems in AI safety” agenda spearheaded by Dario Amodei and Chris Olah of Google Brain.

Read more »

Submission to the OSTP on AI outcomes

News

The White House Office of Science and Technology Policy recently put out a request for information on “(1) The legal and governance implications of AI; (2) the use of AI for public good; (3) the safety and control issues for AI; (4) the social and economic implications of AI;” and a variety of related topics. I’ve reproduced MIRI’s submission to the RfI below:

I. Review of safety and control concerns

AI experts largely agree that AI research will eventually lead to the development of AI systems that surpass humans in general reasoning and decision-making ability. This is, after all, the goal of the field. However, there is widespread disagreement about how long it will take to cross that threshold, and what the relevant AI systems are likely to look like (autonomous agents, widely distributed decision support systems, human/AI teams, etc.).

Despite the uncertainty, a growing subset of the research community expects that advanced AI systems will give rise to a number of foreseeable safety and control difficulties, and that those difficulties can be preemptively addressed by technical research today. Stuart Russell, co-author of the leading undergraduate textbook in AI and professor at U.C. Berkeley, writes:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
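Russell’s point about unconstrained variables can be made concrete with a minimal sketch. Everything here is an assumption made for illustration and does not come from the submission: the two variables `speed` and `risk`, the coupling constraint between them, the penalty weight, and the brute-force `optimize` helper. The objective depends only on `speed` (k = 1 of n = 2 variables), yet the optimizer drives `risk` to its maximum because doing so loosens the constraint on `speed`.

```python
from itertools import product

# Hypothetical search grid: speed in 0..20, risk in 0.0..1.0 in steps of 0.1.
SPEEDS = range(0, 21)
RISKS = [i / 10 for i in range(11)]


def feasible(speed, risk):
    # Hypothetical coupling: accepting more risk permits more speed.
    return speed <= 10 + 5 * risk


def objective(speed, risk):
    # The stated objective depends only on speed; risk is ignored entirely.
    return speed


def penalized_objective(speed, risk):
    # Same objective, but the variable we care about is now part of it.
    # The weight 100 is an arbitrary illustrative choice.
    return speed - 100 * risk


def optimize(score):
    """Brute-force search over the grid for the feasible point with the best score."""
    candidates = [(s, r) for s, r in product(SPEEDS, RISKS) if feasible(s, r)]
    return max(candidates, key=lambda point: score(*point))


naive_solution = optimize(objective)
careful_solution = optimize(penalized_objective)

print(naive_solution)
print(careful_solution)
```

In this toy setup the naive objective lands on the highest available risk, and the penalized one stays at zero risk with a lower speed. Nothing about the optimizer changed between the two runs; only the objective did, which is the sense in which you get exactly what you ask for.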

Researchers’ worries about the impact of AI in the long term bear little relation to the doomsday scenarios most often depicted in Hollywood movies, in which “emergent consciousness” allows machines to throw off the shackles of their programmed goals and rebel. The concern is rather that such systems may pursue their programmed goals all too well, and that the programmed goals may not match the intended goals, or that the intended goals may have unintended negative consequences.

These challenges are not entirely novel. We can compare them to other principal-agent problems where incentive structures are designed with the hope that blind pursuit of those incentives promotes good outcomes. Historically, principal-agent problems have been difficult to solve even in domains where the people designing the incentive structures can rely on some amount of human goodwill and common sense. Consider the problem of designing tax codes to have reliably beneficial consequences, or the problem of designing regulations that reliably reduce corporate externalities. Advanced AI systems naively designed to optimize some objective function could result in unintended consequences that occur on digital timescales, but without goodwill and common sense to blunt the impact.

Given that researchers don’t know when breakthroughs will occur, and given that there are multiple lines of open technical research that can be pursued today to address these concerns, we believe it is prudent to begin serious work on those technical obstacles to improve the community’s preparedness.

Read more »