News and links
As previously announced, we recently ran a 22-day Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Oxford Future of Humanity Institute. The colloquium was aimed at bringing together safety-conscious AI scientists from academia and industry to share their recent work. The event served that purpose well, initiating some new collaborations and a number of new conversations between researchers who hadn’t interacted before or had only talked remotely.
Over 50 people attended from 25 different institutions, with an average of 15 people present on any given talk or workshop day. In all, there were 17 talks and four weekend workshops on the topics of transparency, robustness and error-tolerance, preference specification, and agent models and multi-agent dilemmas. The full schedule and talk slides are available on the event page. Videos from the first day of the event are now available, and we’ll be posting the rest of the talks online soon:
Stuart Russell, professor of computer science at UC Berkeley and co-author of Artificial Intelligence: A Modern Approach, gave the opening keynote. Russell spoke on “AI: The Story So Far” (slides). Abstract:
I will discuss the need for a fundamental reorientation of the field of AI towards provably beneficial systems. This need has been disputed by some, and I will consider their arguments. I will also discuss the technical challenges involved and some promising initial results.
About 36 minutes into the talk, Russell discusses his recent paper on cooperative inverse reinforcement learning. That paper and Dylan Hadfield-Menell's related talk on corrigibility (slides) inspired a great deal of interest and discussion at CSRBAI.
As Luke had done in years past (see 2013 in review and 2014 in review), I (Malo) wanted to take some time to review our activities from last year. In the coming weeks Nate will provide a big-picture strategy update. Here, I’ll take a look back at 2015, focusing on our research progress, academic and general outreach, fundraising, and other activities.
After seeing signs in 2014 that interest in AI safety issues was on the rise, we made plans to grow our research team. Fueled by the response to Bostrom’s Superintelligence and the Future of Life Institute’s “Future of AI” conference, interest continued to grow in 2015. This suggested that we could afford to accelerate our plans, but it wasn’t clear how quickly.
Rather than release a mid-year strategic plan in 2015, as Luke did in 2014, we laid out several conditional strategies that depended on how much funding we raised during our 2015 Summer Fundraiser. The response was great: it was our most successful fundraiser to date. We hit our first two funding targets (and then some) and set out on an accelerated 2015/2016 growth plan.
As a result, 2015 was a big year for MIRI. After publishing our technical agenda at the start of the year, we made progress on many of the open problems it outlined, doubled the size of our core research team, strengthened our connections with industry groups and academics, and raised enough funds to maintain our growth trajectory. We’re very grateful to all our supporters, without whom this progress wouldn’t have been possible.
MIRI’s research to date has focused on the problems that we laid out in our late 2014 research agenda, and in particular on formalizing optimal reasoning for bounded, reflective decision-theoretic agents embedded in their environment. Our research team has since grown considerably, and we have made substantial progress on this agenda, including a major breakthrough in logical uncertainty that we will be announcing in the coming weeks.
Today we are announcing a new research agenda, "Alignment for advanced machine learning systems." Going forward, about half of our time will be spent on this new agenda and the other half on our previous agenda. The abstract reads:
We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers.
Open problems surveyed in this research proposal include: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? We discuss these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
Co-authored by Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch, our new report discusses eight new lines of research (previously summarized here). Below, I’ll explain the rationale behind these problems, as well as how they tie in to our old research agenda and to the new “Concrete problems in AI safety” agenda spearheaded by Dario Amodei and Chris Olah of Google Brain.
The White House Office of Science and Technology Policy recently put out a request for information on “(1) The legal and governance implications of AI; (2) the use of AI for public good; (3) the safety and control issues for AI; (4) the social and economic implications of AI;” and a variety of related topics. I’ve reproduced MIRI’s submission to the RfI below:
I. Review of safety and control concerns
AI experts largely agree that AI research will eventually lead to the development of AI systems that surpass humans in general reasoning and decision-making ability. This is, after all, the goal of the field. However, there is widespread disagreement about how long it will take to cross that threshold, and what the relevant AI systems are likely to look like (autonomous agents, widely distributed decision support systems, human/AI teams, etc.).
Despite the uncertainty, a growing subset of the research community expects that advanced AI systems will give rise to a number of foreseeable safety and control difficulties, and that those difficulties can be preemptively addressed by technical research today. Stuart Russell, co-author of the leading undergraduate textbook in AI and professor at U.C. Berkeley, writes:
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
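Russell's point about unconstrained variables can be seen in a toy optimization. The sketch below is a hypothetical example, not taken from the submission: a brute-force search maximizes widget output `w`, while forest cover `f` appears only in a constraint (clearing forest enables production). The objective therefore depends on k = 1 of the n = 2 variables. The names `solve`, `feasible`, and `objective`, and the numbers, are all illustrative assumptions.

```python
from itertools import product

def solve(objective, feasible, grids):
    """Brute-force search: return the feasible grid point that maximizes `objective`."""
    best = None
    for point in product(*grids):
        if feasible(point) and (best is None or objective(point) > objective(best)):
            best = point
    return best

# Hypothetical toy model: w = widgets produced, f = hectares of forest left standing.
# Each widget requires clearing half a hectare, so w <= 2 * (100 - f).
def feasible(point):
    w, f = point
    return w <= 2 * (100 - f)

# The designer's objective mentions only widgets; forest cover is left out.
def objective(point):
    w, f = point
    return w

grids = (range(0, 201), range(0, 101))
print(solve(objective, feasible, grids))  # (200, 0)
```

The search returns `(200, 0)`: the variable missing from the objective is driven all the way to its bound. Nothing in the search is adversarial; it simply has no reason to preserve `f`.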
Researchers’ worries about the impact of AI in the long term bear little relation to the doomsday scenarios most often depicted in Hollywood movies, in which “emergent consciousness” allows machines to throw off the shackles of their programmed goals and rebel. The concern is rather that such systems may pursue their programmed goals all too well, and that the programmed goals may not match the intended goals, or that the intended goals may have unintended negative consequences.
These challenges are not entirely novel. We can compare them to other principal-agent problems, where incentive structures are designed in the hope that blind pursuit of those incentives will promote good outcomes. Historically, principal-agent problems have been difficult to solve even in domains where the people designing the incentive structures can rely on some amount of human goodwill and common sense. Consider the problem of designing tax codes with reliably beneficial consequences, or of designing regulations that reliably reduce corporate externalities. Advanced AI systems naively designed to optimize some objective function could produce unintended consequences that unfold on digital timescales, with no goodwill or common sense to blunt the impact.
Given that researchers don’t know when breakthroughs will occur, and given that there are multiple lines of open technical research that can be pursued today to address these concerns, we believe it is prudent to begin serious work on those technical obstacles to improve the community’s preparedness.
Future of Humanity Institute Research Fellow Jan Leike and MIRI Research Fellows Jessica Taylor and Benya Fallenstein have just presented new results at UAI 2016 that resolve a longstanding open problem in game theory: “A formal solution to the grain of truth problem.”
Game theorists have techniques for specifying agents that eventually do well on iterated games against other agents, so long as their beliefs contain a “grain of truth” — nonzero prior probability assigned to the actual game they’re playing. Getting that grain of truth was previously an unsolved problem in multiplayer games, because agents can run into infinite regresses when they try to model agents that are modeling them in turn. This result shows how to break that loop: by means of reflective oracles.
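To make "grain of truth" concrete, here is a minimal sketch of the easy case: Bayesian updating over a small, hypothetical class of deterministic opponent policies in a repeated cooperate/defect game. The policy names, the `posterior` helper, and the move sequences are illustrative assumptions, not the paper's construction. If the true opponent is in the class with nonzero prior, the posterior concentrates on it; if it is not, every hypothesis is eventually ruled out.

```python
from fractions import Fraction

# Hypothetical opponent policies: each maps the agent's past moves to the opponent's next move.
# Moves are "C" (cooperate) or "D" (defect).
POLICIES = {
    "always_C": lambda my_history: "C",
    "always_D": lambda my_history: "D",
    "tit_for_tat": lambda my_history: my_history[-1] if my_history else "C",
}

def posterior(prior, my_moves, their_moves):
    """Bayes update over deterministic policies: a policy keeps its prior weight
    only if it would have produced every observed opponent move."""
    weights = {}
    for name, p in prior.items():
        consistent = all(
            POLICIES[name](my_moves[:t]) == their_moves[t]
            for t in range(len(their_moves))
        )
        weights[name] = p if consistent else Fraction(0)
    total = sum(weights.values())
    if total == 0:
        return None  # no hypothesis survives: the prior contained no grain of truth
    return {name: w / total for name, w in weights.items()}

prior = {name: Fraction(1, 3) for name in POLICIES}
# Opponent actually plays tit-for-tat, which is in the class.
print(posterior(prior, "CDC", "CCD")["tit_for_tat"])  # 1
# Opponent actually alternates C, D, C, which is not in the class.
print(posterior(prior, "CDC", "CDC"))  # None
```

The hard part the paper addresses is not visible here: in a multi-agent setting, the class must also contain the other agents' Bayes-optimal policies, which are themselves defined in terms of such classes.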
In the process, Leike, Taylor, and Fallenstein provide a rigorous and general foundation for the study of multi-agent dilemmas. This work provides a surprising and somewhat satisfying basis for approximate Nash equilibria in repeated games, folding a variety of problems in decision and game theory into a common framework.
The paper’s abstract reads:
A Bayesian agent acting in a multi-agent environment learns to predict the other agents’ policies if its prior assigns positive probability to them (in other words, its prior contains a grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as the grain of truth problem. Only small classes are known to have a grain of truth and the literature contains several related impossibility results.
In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. However, agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.
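The abstract's Thompson-sampling agents follow a simple loop: sample one hypothesis from the posterior, then act optimally as if that hypothesis were true. The sketch below shows only that loop, under a much smaller hypothesis class than the paper uses. It assumes a two-action coordination game (payoff 1 for matching the opponent) and a class in which the opponent plays action 1 independently each round with unknown probability `theta`, with a Beta(1, 1) prior. The function name and setup are hypothetical.

```python
import random

def thompson_coordination(opponent, rounds, seed=0):
    """Thompson sampling in a 2-action coordination game (payoff 1 for matching).

    Assumed hypothesis class: the opponent plays 1 with unknown probability theta,
    independently each round; prior Beta(1, 1), so the posterior is Beta(1 + ones, 1 + zeros).
    """
    rng = random.Random(seed)
    ones, zeros = 0, 0
    my_moves, their_moves = [], []
    for t in range(rounds):
        theta = rng.betavariate(1 + ones, 1 + zeros)  # sample a hypothesis from the posterior
        move = 1 if theta > 0.5 else 0                  # best response to the sampled hypothesis
        their = opponent(t)
        my_moves.append(move)
        their_moves.append(their)
        if their == 1:
            ones += 1
        else:
            zeros += 1
    return my_moves, their_moves

# A fixed, computable opponent that always plays 1.
mine, theirs = thompson_coordination(lambda t: 1, rounds=200)
print(sum(mine[-100:]))  # 100
```

Against this opponent the posterior quickly concentrates near `theta = 1`, so the agent almost surely coordinates in later rounds. The paper's convergence to ε-Nash equilibria in arbitrary computable multi-agent environments is a far stronger result than this single-opponent illustration.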
Traditionally, when modeling computer programs that model the properties of other programs (such as when modeling an agent reasoning about a game), the first program is assumed to have access to an oracle (such as a halting oracle) that can answer arbitrary questions about the second program. This works, but it doesn’t help with modeling agents that can reason about each other.
While a halting oracle can predict the behavior of any isolated Turing machine, it cannot predict the behavior of another Turing machine that has access to a halting oracle. If this were possible, the second machine could use its oracle to figure out what the first machine-oracle pair thinks it will do, at which point it can do the opposite, setting up a liar paradox scenario. For analogous reasons, two agents with similar resources, operating in real-world environments without any halting oracles, cannot perfectly predict each other in full generality.
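The "do the opposite" argument can be acted out in a few lines. In this hypothetical sketch, `make_contrarian` builds an agent that asks a predictor about itself and then does the opposite. A predictor that answers without simulating is wrong, and a predictor that simulates never returns; Python's `RecursionError` stands in for non-termination. This is a finite toy of the diagonal argument, not a proof and not the paper's construction.

```python
from typing import Callable

Agent = Callable[[], int]           # an agent outputs an action: 0 or 1
Predictor = Callable[[Agent], int]  # a predictor guesses an agent's action

def make_contrarian(predictor: Predictor) -> Agent:
    """Build an agent that asks `predictor` what it will do, then does the opposite."""
    def contrarian() -> int:
        return 1 - predictor(contrarian)
    return contrarian

def always_guess_zero(agent: Agent) -> int:
    # Hypothetical predictor that answers without looking at the agent.
    return 0

def simulate(agent: Agent) -> int:
    # Hypothetical predictor that runs the agent; the agent runs the predictor in turn.
    return agent()

def check(predictor: Predictor) -> str:
    agent = make_contrarian(predictor)
    try:
        guess = predictor(agent)
        actual = agent()
    except RecursionError:
        return "no answer: infinite regress"
    return f"predicted {guess}, agent did {actual}"

print(check(always_guess_zero))  # predicted 0, agent did 1
print(check(simulate))           # no answer: infinite regress
```

Reflective oracles, as we understand them, avoid this trap by being allowed to answer such self-referential queries probabilistically, so a contrarian agent ends up as a 50/50 mixture rather than a contradiction; the toy above does not model that.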
Game theorists know how to build formal models of asymmetric games between a weaker player and a stronger player, where the stronger player understands the weaker player’s strategy but not vice versa. For the reasons above, however, games between agents of similar strength have resisted full formalization. As a consequence of this, game theory has until now provided no method for designing agents that perform well on complex iterated games containing other agents of similar strength.
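The asymmetric case is easy precisely because modeling runs in one direction only. In this hypothetical matching-pennies sketch (the stronger player wins by matching), the stronger player simply runs the weaker player's fixed strategy and copies its move. The player functions and the strategy pattern are illustrative assumptions.

```python
def weak_player(round_number: int) -> str:
    # Hypothetical fixed strategy that does not model its opponent:
    # "T" every third round, "H" otherwise.
    return "H" if round_number % 3 else "T"

def strong_player(round_number: int, opponent) -> str:
    # Hypothetical stronger player: it runs the weaker player's strategy
    # and copies the predicted move, which wins in matching pennies.
    return opponent(round_number)

wins = sum(strong_player(t, weak_player) == weak_player(t) for t in range(30))
print(wins)  # 30
```

If the weaker player also tried to run the stronger player, each call would wait on the other and neither would return, which is the regress that games between agents of similar strength run into.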