The Machine Intelligence Research Institute is a research nonprofit studying the mathematical underpinnings of intelligent behavior. Our mission is to develop formal tools for the clean design and analysis of general-purpose AI systems, with the intent of making such systems safer and more reliable when they are developed.
The field of AI has a reputation for overselling its progress. In the “AI winters” of the late 1970s and 1980s, researchers’ failures to make good on ambitious promises led to a collapse of funding and interest in AI. Although the field is now undergoing a renaissance, overconfidence is still a major fear; discussion of the possibility of human-equivalent general intelligence is still largely relegated to the science fiction shelf.
At the same time, researchers largely agree that AI is likely to begin outperforming humans on most cognitive tasks in this century. Given how disruptive domain-general AI could be, we think it is prudent to begin a conversation about this now, and to investigate whether there are limited areas in which we can predict and shape this technology’s societal impact.
Researchers at MIRI tend to be relatively agnostic about how the state of the art in AI will change over the coming decades, and about how many years off smarter-than-human AI systems are. However, we think some qualitative predictions are possible:
— As perception, inference, and planning algorithms improve, AI systems will be trusted with increasingly complex and long-term decision-making. Small errors will then have larger consequences.
— Realistic goals and environments for general reasoning systems will be too complex for programmers to directly specify. AI systems will instead need to inductively learn correct goals and environmental models.
— Systems that end up with poor models of their environment can do significant harm. However, poor models limit how well a planning system can control its environment, which limits the expected harm.
— There are fewer obvious constraints on the harm a system with poorly specified goals might do. In particular, an autonomous system that learns about human goals, but is not correctly designed to align its own goals to its best model of human goals, could cause catastrophic harm in the absence of adequate checks.
— AI systems’ goals or world-models may be brittle, exhibiting exceptionally good behavior until some seemingly irrelevant environmental variable changes. This is again a larger concern for incorrect goals than for incorrect beliefs and inferences, because incorrect goals don’t limit the capability of an otherwise high-intelligence system.
Stuart Russell, a MIRI research advisor and co-author of the leading textbook on artificial intelligence, argues in “The Long-Term Future of Artificial Intelligence” that we should integrate questions of robustness and safety into mainstream capabilities research:
Our goal as a field is to make better decision-making systems. And that is the problem. […If] you’re going to build a superintelligent machine, you have to give it something that you want it to do. The danger is that you give it something that isn’t actually what you really want — because you’re not very good at expressing what you really want, or even knowing what you really want — until it’s too late and you see that you don’t like it.
If you think about it just in terms of an optimization problem: The machine is solving an optimization problem for you, and you leave out some of the variables that you actually care about. Well, it’s in the nature of optimization problems that if the system gets to manipulate some variables that don’t form part of the objective function — so it’s free to play with those as much as it wants — often, in order to optimize the ones that it is supposed to optimize, it will set the other ones to extreme values.
My proposal is that we should stop doing AI in its simple definition of just improving the decision-making capabilities of systems. […] With civil engineering, we don’t call it “building bridges that don’t fall down” — we just call it “building bridges.” Of course we don’t want them to fall down. And we should think the same way about AI: of course AI systems should be designed so that their actions are well-aligned with what human beings want. But it’s a difficult unsolved problem that hasn’t been part of the research agenda up to now.
We want to change the field so that it feels like civil engineering or like nuclear fusion. [… We] created a hydrogen bomb explosion — unlimited amounts of energy, more than we could possibly use. But it wasn’t in a socially beneficial form. And now it’s just what fusion researchers do — containment is what fusion research is. That’s the problem that they work on.
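Russell’s point about omitted variables can be illustrated with a toy sketch. This is a hypothetical example, not drawn from the source: all names (`effort`, `side_effect`, the utility functions) are invented for illustration. The designer’s proxy objective leaves out a variable the designer actually cares about, and because the environment couples that variable to the rewarded one, a brute-force optimizer drives it to an extreme value:

```python
# Hypothetical toy example: an optimizer maximizing a proxy objective
# that omits a variable the designer actually cares about.

def achievable_output(effort, side_effect):
    # In this toy environment, pushing the unmodeled variable to
    # extremes buys extra output, so the two variables are coupled.
    return effort + 2 * abs(side_effect)

def proxy_objective(effort, side_effect):
    # What the designer asked for: side_effect is left out entirely.
    return achievable_output(effort, side_effect)

def true_utility(effort, side_effect):
    # What the designer actually wants: side_effect should stay near zero.
    return achievable_output(effort, side_effect) - 10 * side_effect ** 2

# Brute-force search over a small grid of candidate plans.
grid = [(e, s) for e in range(0, 11) for s in range(-10, 11)]
best_proxy = max(grid, key=lambda p: proxy_objective(*p))
best_true = max(grid, key=lambda p: true_utility(*p))

print(best_proxy)  # proxy optimum drives side_effect to an extreme (-10)
print(best_true)   # intended optimum keeps side_effect at zero: (10, 0)
```

The optimizer is not malfunctioning; it is faithfully maximizing the objective it was given, and the extreme setting of the free variable is simply the cheapest route to a higher score.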
In line with Russell’s talk, MIRI’s work is aimed at helping jump-start a paradigm of AI research that is conscious of the field’s long-term impact. Our methodology is to break down the alignment problem into simpler and more precisely stated subproblems, develop basic mathematical theory for understanding these problems, and then make use of our newfound understanding in engineering applications.
Resources for Learning More
Two of the best nontechnical introductions to the problem of AI goal alignment are Stuart Armstrong’s short and lively Smarter Than Us and Nick Bostrom’s more in-depth monograph Superintelligence:
“Intelligent search for instrumentally optimal plans and policies can be performed in the service of any goal. Intelligence and motivation are in a sense orthogonal: we can think of them as two axes spanning a graph in which each point represents a logically possible artificial agent.”
“[T]he orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans[.]”
For more technical details on MIRI’s research focus, see:
Lastly, for general information: