Russell and Norvig on Friendly AI

Analysis

AI: A Modern Approach is by far the dominant textbook in the field. It is used in 1200 universities, and is currently the 22nd most-cited publication in computer science. Its authors, Stuart Russell and Peter Norvig, devote significant space to AI dangers and Friendly AI in section 26.3, “The Ethics and Risks of Developing Artificial Intelligence.”

The first 5 risks they discuss are:

  • People might lose their jobs to automation.
  • People might have too much (or too little) leisure time.
  • People might lose their sense of being unique.
  • AI systems might be used toward undesirable ends.
  • The use of AI systems might result in a loss of accountability.

Each of those sections is one or two paragraphs long. The final subsection, “The Success of AI might mean the end of the human race,” is given 3.5 pages. Here’s a snippet:

The question is whether an AI system poses a bigger risk than traditional software. We will look at three sources of risk. First, the AI system’s state estimation may be incorrect, causing it to do the wrong thing. For example… a missile defense system might erroneously detect an attack and launch a counterattack, leading to the death of billions…

Second, specifying the right utility function for an AI system to maximize is not so easy. For example, we might propose a utility function designed to minimize human suffering, expressed as an additive reward function over time… Given the way humans are, however, we’ll always find a way to suffer even in paradise; so the optimal decision for the AI system is to terminate the human race as soon as possible – no humans, no suffering…

Third, the AI system’s learning function may cause it to evolve into a system with unintended behavior. This scenario is the most serious, and is unique to AI systems, so we will cover it in more depth. I.J. Good wrote (1965),

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.
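
To make the second risk quoted above a bit more concrete, here is a deliberately crude toy model (the numbers are invented purely for illustration): under an additive reward that only penalizes suffering, a world with no humans at all scores strictly better than even a near-utopia.

```python
# Toy model of a misspecified objective (invented numbers, illustration only):
# the reward at each time step is minus the amount of human suffering then.
def cumulative_reward(suffering_per_step, steps=1000):
    # Additive reward over time, as in the textbook's example.
    return -suffering_per_step * steps

print(cumulative_reward(suffering_per_step=1))  # near-utopia, a little suffering: -1000
print(cumulative_reward(suffering_per_step=0))  # no humans, no suffering: 0 (the "optimum")
```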


Richard Posner on AI Dangers

Analysis

Richard Posner is a jurist, legal theorist, and economist. He is also the author of nearly 40 books, and is by far the most-cited legal scholar of the 20th century.

In 2004, Posner published Catastrophe: Risk and Response, in which he discusses risks from AGI at some length. His analysis is interesting in part because it appears to be intellectually independent from the Bostrom-Yudkowsky tradition that dominates the topic today.

In fact, Posner does not appear to be aware of earlier work on the topic by I.J. Good (1970, 1982), Ed Fredkin (1979), Roger Clarke (1993, 1994), Daniel Weld & Oren Etzioni (1994), James Gips (1995), Blay Whitby (1996), Diana Gordon (2000), Chris Harper (2000), or Colin Allen (2000). He is not even aware of Hans Moravec (1990, 1999), Bill Joy (2000), Nick Bostrom (1997, 2003), or Eliezer Yudkowsky (2001). Basically, he seems to know only of Ray Kurzweil (1999).

Still, much of Posner’s analysis is consistent with the basic points of the Bostrom-Yudkowsky tradition:

[One class of catastrophic risks] consists of… scientific accidents, for example accidents involving particle accelerators, nanotechnology…, and artificial intelligence. Technology is the cause of these risks, and slowing down technology may therefore be the right response.

…there may some day, perhaps some day soon (decades, not centuries, hence), be robots with human and [soon thereafter] more than human intelligence…

…Human beings may turn out to be the twenty-first century’s chimpanzees, and if so the robots may have as little use and regard for us as we do for our fellow, but nonhuman, primates…

…A robot’s potential destructiveness does not depend on its being conscious or able to engage in [e.g. emotional processing]… Unless carefully programmed, the robots might prove indiscriminately destructive and turn on their creators.

…Kurzweil is probably correct that “once a computer achieves a human level of intelligence, it will necessarily roar past it”…

One major point of divergence seems to be that Posner worries about a scenario in which AGIs become self-aware, re-evaluate their goals, and decide not to be “bossed around by a dumber species” anymore. In contrast, Bostrom and Yudkowsky think AGIs will be dangerous not because they will “rebel” against humans, but because (roughly) using all available resources — including those on which human life depends — is a convergent instrumental goal for almost any set of final goals a powerful AGI might possess. (See e.g. Bostrom 2012.)

Ben Goertzel on AGI as a Field

Conversations

Dr. Ben Goertzel is Chief Scientist of financial prediction firm Aidyia Holdings; Chairman of AI software company Novamente LLC and bioinformatics company Biomind LLC; Chairman of the Artificial General Intelligence Society and the OpenCog Foundation; Vice Chairman of futurist nonprofit Humanity+; Scientific Advisor of biopharma firm Genescient Corp.; Advisor to the Singularity University and MIRI; Research Professor in the Fujian Key Lab for Brain-Like Intelligent Systems at Xiamen University, China; and General Chair of the Artificial General Intelligence conference series. His research work encompasses artificial general intelligence, natural language processing, cognitive science, data mining, machine learning, computational finance, bioinformatics, virtual worlds and gaming, and other areas. He has published a dozen scientific books, 100+ technical papers, and numerous journalistic articles. Before entering the software industry he served as university faculty in several departments of mathematics, computer science and cognitive science, in the US, Australia and New Zealand. He has three children and too many pets, and in his spare time enjoys creating avant-garde fiction and music, and exploring the outdoors.


MIRI’s October Newsletter

Newsletters

Greetings from the Executive Director

Dear friends,

The big news this month is that Paul Christiano and Eliezer Yudkowsky are giving talks at Harvard and MIT about the work coming out of MIRI’s workshops, on Oct. 15th and 17th, respectively (details below).

Meanwhile we’ve been planning future workshops and preparing future publications. Our experienced document production team is also helping to prepare Nick Bostrom‘s Superintelligence book for publication. It’s a very good book, and should be released by Oxford University Press in mid-2014.

By popular demand, MIRI research fellow Eliezer Yudkowsky now has a few “Yudkowskyisms” available on t-shirts, at Rational Attire. Thanks to Katie Hartman and Michael Keenan for setting this up.

Cheers,

Luke Muehlhauser
Executive Director

Upcoming Talks at Harvard and MIT

If you live near Boston, you’ll want to come see Eliezer Yudkowsky give a talk about MIRI’s research program in the spectacular Stata building on the MIT campus, on October 17th.

His talk is titled Recursion in rational agents: Foundations for self-modifying AI. There will also be a party the next day in MIT’s Building 6, with Yudkowsky in attendance.

Two days earlier, Paul Christiano will give a technical talk to a smaller audience about one of the key results from MIRI’s research workshops thus far. This talk is titled Probabilistic metamathematics and the definability of truth.

For more details on both talks, see the blog post below.


Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness

Analysis

In 1979, Michael Rabin proved that his encryption system could be inverted — so as to decrypt the encrypted message — only if an attacker could factor n. And since this factoring task is believed to be computationally infeasible for any sufficiently large n, Rabin’s encryption scheme was said to be “provably secure” so long as one used a sufficiently large n.
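
For concreteness, here is a minimal toy sketch of Rabin’s construction in Python (tiny invented primes, no padding or hashing; an illustration only, not a faithful implementation). Encryption is just squaring modulo n = p*q, and the decryption below works only because the secret factors p and q are known; Rabin’s reduction shows that anyone who could recover plaintexts without them could be used to factor n.

```python
# Toy sketch of Rabin encryption (illustration only: tiny primes, no padding).
# Public key: n = p*q with p ≡ q ≡ 3 (mod 4). Encryption squares the message
# mod n; decryption recovers the four square roots using the secret factors.

def egcd(a, b):
    # Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def rabin_encrypt(m, n):
    return pow(m, 2, n)

def rabin_decrypt(c, p, q):
    n = p * q
    # Square roots mod p and mod q are easy to compute when p ≡ q ≡ 3 (mod 4)...
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    # ...and are combined into the four roots mod n via the Chinese remainder theorem.
    _, yp, yq = egcd(p, q)          # yp*p + yq*q == 1
    r1 = (yp * p * mq + yq * q * mp) % n
    r2 = (yp * p * mq - yq * q * mp) % n
    return sorted({r1, n - r1, r2, (n - r2) % n})

p, q = 7, 11                        # real keys use primes of 1000+ bits
n = p * q
c = rabin_encrypt(20, n)
print(c)                            # 15
print(rabin_decrypt(c, p, q))       # [13, 20, 57, 64] -- includes the plaintext 20
```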

Since then, creating encryption algorithms with this kind of “provable security” has been a major goal of cryptography,1 and new encryption algorithms that meet these criteria are sometimes marketed as “provably secure.”

Unfortunately, the term “provable security” can be misleading,2 for several reasons3.



  1. An encryption system is said to be provably secure if its security requirements are stated formally, and proven to be satisfied by the system, as was the case with Rabin’s system. See Wikipedia.
  2. Security reductions can still be useful (Damgård 2007). My point is just that the term “provable security” can be misleading, especially to non-experts.
  3. For more details, and some additional problems with the term “provable security,” see Koblitz & Menezes’ Another Look website and its linked articles, especially Koblitz & Menezes (2010).

Upcoming Talks at Harvard and MIT

News

On October 15th from 4:30-5:30pm, MIRI workshop participant Paul Christiano will give a technical talk at the Harvard University Science Center, room 507, as part of the Logic at Harvard seminar and colloquium.

Christiano’s title and abstract are:

Probabilistic metamathematics and the definability of truth

No model M of a sufficiently expressive theory can contain a truth predicate T such that for all S, M |= T(“S”) if and only if M |= S. I’ll consider the setting of probabilistic logic, and show that there are probability distributions over models which contain an “objective probability function” P such that M |= a < P(“S”) < b almost surely whenever a < P(M |= S) < b. This demonstrates that a probabilistic analog of a truth predicate is possible as long as we allow infinitesimal imprecision. I’ll argue that this result significantly undercuts the philosophical significance of Tarski’s undefinability theorem, and show how the techniques involved might be applied more broadly to resolve obstructions due to self-reference.
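
For readers less used to the notation, the two claims being contrasted can be compressed as follows (an editorial paraphrase of the abstract, not part of it):

```latex
% Tarski's undefinability theorem: no truth predicate T can satisfy
\[
  M \models T(\ulcorner S \urcorner) \iff M \models S
  \qquad \text{for every sentence } S.
\]
% The probabilistic analogue described above: a distribution over models and an
% internal probability function P such that, for all rationals a < b,
\[
  a < \Pr(M \models S) < b
  \;\Longrightarrow\;
  M \models a < P(\ulcorner S \urcorner) < b \ \text{ almost surely,}
\]
% i.e. a truth-like predicate holds up to infinitesimal imprecision at the endpoints.
```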

Then, on October 17th from 4:00-5:30pm, Scott Aaronson will host a talk by MIRI research fellow Eliezer Yudkowsky.

Yudkowsky’s talk will be somewhat more accessible than Christiano’s, and will take place in MIT’s Ray and Maria Stata Center, in room 32-123 (aka Kirsch Auditorium, with 318 seats). There will be light refreshments 15 minutes before the talk. Yudkowsky’s title and abstract are:

Recursion in rational agents: Foundations for self-modifying AI

Reflective reasoning is a familiar but formally elusive aspect of human cognition. This issue comes to the forefront when we consider building AIs which model other sophisticated reasoners, or who might design other AIs which are as sophisticated as themselves. Mathematical logic, the best-developed contender for a formal language capable of reflecting on itself, is beset by impossibility results. Similarly, standard decision theories begin to produce counterintuitive or incoherent results when applied to agents with detailed self-knowledge. In this talk I will present some early results from workshops held by the Machine Intelligence Research Institute to confront these challenges.

The first is a formalization and significant refinement of Hofstadter’s “superrationality,” the (informal) idea that ideal rational agents can achieve mutual cooperation on games like the prisoner’s dilemma by exploiting the logical connection between their actions and their opponent’s actions. We show how to implement an agent which reliably outperforms classical game theory given mutual knowledge of source code, and which achieves mutual cooperation in the one-shot prisoner’s dilemma using a general procedure. Using a fast algorithm for finding fixed points, we are able to write implementations of agents that perform the logical interactions necessary for our formalization, and we describe empirical results.

Second, it has been claimed that Gödel’s second incompleteness theorem presents a serious obstruction to any AI understanding why its own reasoning works or even trusting that it does work. We exhibit a simple model for this situation and show that straightforward solutions to this problem are indeed unsatisfactory, resulting in agents who are willing to trust weaker peers but not their own reasoning. We show how to circumvent this difficulty without compromising logical expressiveness.

Time permitting, we also describe a more general agenda for averting self-referential difficulties by replacing logical deduction with a suitable form of probabilistic inference. The goal of this program is to convert logical unprovability or undefinability into very small probabilistic errors which can be safely ignored (and may even be philosophically justified).
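
As a very rough illustration of the setting of the first result described in the abstract (mutual knowledge of source code), here is a toy Python sketch. The clique_bot below only checks for an exact source-code match, which is a drastic simplification; the construction the abstract describes uses bounded proof search over the opponent’s program rather than literal comparison.

```python
# Toy one-shot prisoner's dilemma in which agents can read each other's source.
# clique_bot cooperates only with exact copies of itself, so it cooperates with
# itself and cannot be exploited by a defector. The workshop result is much
# stronger: it achieves cooperation via provability reasoning, not text equality.
import inspect

def clique_bot(opponent_source: str) -> str:
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

def play(agent_a, agent_b):
    # Each agent chooses a move after inspecting the opponent's source code.
    move_a = agent_a(inspect.getsource(agent_b))
    move_b = agent_b(inspect.getsource(agent_a))
    return move_a, move_b

print(play(clique_bot, clique_bot))  # ('C', 'C') -- mutual cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D') -- no exploitation
```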
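
The Gödelian obstruction mentioned in the abstract’s second result is usually made precise via Löb’s theorem, stated below for context (a standard result of provability logic, not one of the workshop results):

```latex
% Löb's theorem: for any sentence \phi, if the theory T proves that
% provability of \phi implies \phi, then T already proves \phi:
\[
  T \vdash \bigl(\Box_T\,\phi \rightarrow \phi\bigr)
  \;\Longrightarrow\;
  T \vdash \phi .
\]
% Consequently a consistent agent reasoning in T cannot adopt the blanket schema
% "whatever I can prove is true" without proving every sentence; naive fixes
% leave it trusting only strictly weaker successors, as the abstract notes.
```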

Also, on Oct 18th at 7pm there will be a Less Wrong / Methods of Rationality meetup/party on the MIT campus in Building 6, room 120. There will be snacks and refreshments, and Yudkowsky will be in attendance.

Paul Rosenbloom on Cognitive Architectures

Conversations

Paul S. Rosenbloom is Professor of Computer Science at the University of Southern California and a project leader at USC’s Institute for Creative Technologies. He was a key member of USC’s Information Sciences Institute for two decades, leading new directions activities over the second decade, and finishing his time there as Deputy Director. Earlier he was on the faculty at Carnegie Mellon University (where he had also received his MS and PhD in computer science) and Stanford University (where he had also received his BS in mathematical sciences with distinction).

His research concentrates on cognitive architectures – models of the fixed structure underlying minds, whether natural or artificial – and on understanding the nature, structure and stature of computing as a scientific domain. He is an AAAI Fellow, the co-developer of Soar (one of the longest-standing and most well-developed cognitive architectures), the primary developer of Sigma (which blends insights from earlier architectures such as Soar with ideas from graphical models), and the author of On Computing: The Fourth Great Scientific Domain (MIT Press, 2012).


Effective Altruism and Flow-Through Effects

Conversations

Last month, MIRI research fellow Carl Shulman1 participated in a recorded debate/conversation about effective altruism and flow-through effects. This issue is highly relevant to MIRI’s mission, since MIRI focuses on activities that are intended to produce altruistic value via their flow-through effects on the invention of AGI.

The conversation (mp3, transcript) included:

Recommended background reading includes:

To summarize the conversation very briefly: All participants seemed to agree that more research on flow-through effects would be high value. However, there’s a risk that such research isn’t highly tractable. For now, GiveWell will focus on other projects that seem more tractable. Rob Wiblin might try to organize some research on flow-through effects, to learn how tractable it is.


  1. Carl was a MIRI research fellow at the time of the conversation, but left MIRI at the end of August 2013 to study computer science.