Russell and Norvig on Friendly AI

AI: A Modern Approach is by far the dominant textbook in the field. It is used in 1200 universities, and is currently the 22nd most-cited publication in computer science. Its authors, Stuart Russell and Peter Norvig, devote significant space to AI dangers and Friendly AI in section 26.3, “The Ethics and Risks of Developing Artificial Intelligence.”

The first 5 risks they discuss are:

  • People might lose their jobs to automation.
  • People might have too much (or too little) leisure time.
  • People might lose their sense of being unique.
  • AI systems might be used toward undesirable ends.
  • The use of AI systems might result in a loss of accountability.

Each of those subsections is one or two paragraphs long. The final subsection, “The Success of AI might mean the end of the human race,” is given 3.5 pages. Here’s a snippet:

The question is whether an AI system poses a bigger risk than traditional software. We will look at three sources of risk. First, the AI system’s state estimation may be incorrect, causing it to do the wrong thing. For example… a missile defense system might erroneously detect an attack and launch a counterattack, leading to the death of billions…

Second, specifying the right utility function for an AI system to maximize is not so easy. For example, we might propose a utility function designed to minimize human suffering, expressed as an additive reward function over time… Given the way humans are, however, we’ll always find a way to suffer even in paradise; so the optimal decision for the AI system is to terminate the human race as soon as possible – no humans, no suffering…

Third, the AI system’s learning function may cause it to evolve into a system with unintended behavior. This scenario is the most serious, and is unique to AI systems, so we will cover it in more depth. I.J. Good wrote (1965),

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.
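
To see why the second failure mode above is so stubborn, it may help to write the “utility function designed to minimize human suffering, expressed as an additive reward function over time” in symbols. The formalization below is our own illustrative sketch, not the textbook’s; the notation s_t (total human suffering at time t) and γ (a discount factor) is ours:

```latex
% Illustrative sketch only; the textbook gives no explicit formula.
% s_t \ge 0   : total human suffering at time t
% \gamma      : discount factor, 0 < \gamma \le 1
\[
  U \;=\; \sum_{t=0}^{\infty} \gamma^{t}\,(-s_{t})
\]
% U is bounded above by 0 and attains that bound only if s_t = 0 at every t.
% Since living humans "always find a way to suffer" (s_t > 0 stays possible),
% the one policy that guarantees s_t = 0 forever is to ensure there are no
% humans, which is the degenerate optimum the passage above warns about.
```

The trouble is not the arithmetic but the specification: the formula encodes “minimize suffering” faithfully and still endorses the wrong action, which is Russell and Norvig’s point that choosing the right utility function “is not so easy.”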

Russell and Norvig then briefly mention Moravec’s and Kurzweil’s writings before returning to a more concerned discussion of AI risk. They cover Asimov’s three laws of robotics, and then continue:

Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design – to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. We can’t just give a program a static utility function, because circumstances, and our desired responses to circumstances, change over time. For example, if technology had allowed us to design a super-powerful AI agent in 1800 and endow it with the prevailing morals of the time, it would be fighting today to reestablish slavery and abolish women’s right to vote. On the other hand, if we build an AI agent today and tell it how to evolve its utility function, how can we assure that it won’t reason that “Humans think it is moral to kill annoying insects, in part because insect brains are so primitive. But human brains are primitive compared to my powers, so it must be moral for me to kill humans.”

Omohundro (2008) hypothesizes that even an innocuous chess program could pose a risk to society. Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal. The moral is that even if you only want your program to play chess or prove theorems, if you give it the capability to learn and alter itself, you need safeguards.

We are happy to see MIRI’s work getting such mainstream academic exposure.

Readers may also be interested to learn that Russell organized a panel on AI impacts at the IJCAI-13 conference. Russell’s own slides from that panel are here. The other panel participants were Henry Kautz (slides), Joanna Bryson (slides), Anders Sandberg (slides), and Sebastian Thrun (slides).