Responses to Catastrophic AGI Risk: A Survey

 |   |  Papers

MIRI is self-publishing another technical report that was too lengthy (60 pages) for publication in a journal: Responses to Catastrophic AGI Risk: A Survey.

The report, co-authored by past MIRI researcher Kaj Sotala and University of Louisville’s Roman Yampolskiy, is a summary of the extant literature (250+ references) on AGI risk, and can serve either as a guide for researchers or as an introduction for the uninitiated.

Here is the abstract:

Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may pose a catastrophic risk to humanity. After summarizing the arguments for why AGI may pose such a risk, we survey the field’s proposed responses to AGI risk. We consider societal proposals, proposals for external constraints on AGI behaviors, and proposals for creating AGIs that are safe due to their internal design.

The preferred discussion page for the paper is here.

Update: This report has now been published in Physica Scripta, available here.

What is Intelligence?

 |   |  Analysis

When asked their opinions about “human-level artificial intelligence” — aka “artificial general intelligence” (AGI)[1] — many experts understandably reply that these terms haven’t yet been precisely defined, and it’s hard to talk about something that hasn’t been defined.[2] In this post, I want to briefly outline an imprecise but useful “working definition” for intelligence we tend to use at MIRI. In a future post I will write about some useful working definitions for artificial general intelligence.


Imprecise definitions can be useful

Precise definitions are important, but I concur with Bertrand Russell that

[You cannot] start with anything precise. You have to achieve such precision… as you go along.

Physicist Milan Ćirković agrees, and gives an example:

The formalization of knowledge — which includes giving precise definitions — usually comes at the end of the original research in a given field, not at the very beginning. A particularly illuminating example is the concept of number, which was properly defined in the modern sense only after the development of axiomatic set theory in the… twentieth century.[3]

For a more AI-relevant example, consider the concept of a “self-driving car,” which has been given a variety of vague definitions since the 1930s. Would a car guided by a buried cable qualify? What about a modified 1955 Studebaker that could use sound waves to detect obstacles and automatically engage the brakes if necessary, but could only steer “on its own” if each turn was preprogrammed? Does that count as a “self-driving car”?

What about the “VaMoRs” of the 1980s that could avoid obstacles and steer around turns using computer vision, but weren’t advanced enough to be ready for public roads? How about the 1995 Navlab car that drove across the USA and was fully autonomous for 98.2% of the trip, or the robotic cars which finished the 132-mile off-road course of the 2005 DARPA Grand Challenge, supplied only with the GPS coordinates of the route? What about the winning cars of the 2007 DARPA Urban Challenge, which finished an urban race while obeying all traffic laws and avoiding collisions with other cars? Does Google’s driverless car qualify, given that it has logged more than 500,000 autonomous miles without a single accident under computer control, but still struggles with difficult merges and snow-covered roads?[4]

Our lack of a precise definition for “self-driving car” doesn’t seem to have hindered progress on self-driving cars very much.[5] And I’m glad we didn’t wait to seriously discuss self-driving cars until we had a precise definition for the term.

Similarly, I don’t think we should wait for a precise definition of AGI before discussing the topic seriously. On the other hand, the term is useless if it carries no information. So let’s work our way toward a stipulative, operational definition for AGI. We’ll start by developing an operational definition for intelligence.


  1. I use HLAI and AGI interchangeably, but lately I’ve been using AGI almost exclusively, because I’ve learned that many people in the AI community react negatively to any mention of “human-level” AI but have no objection to the concept of narrow vs. general intelligence. See also Ben Goertzel’s comments here.
  2. Asked when he thought HLAI would be created, Pat Hayes (a past president of AAAI) replied: “I do not consider this question to be answerable, as I do not accept this (common) notion of ‘human-level intelligence’ as meaningful.” Asked the same question, AI scientist William Uther replied: “You ask a lot about ‘human level AGI’. I do not think this term is well defined,” while AI scientist Alan Bundy replied: “I don’t think the concept of ‘human-level machine intelligence’ is well formed.” 
  3. Sawyer (1943) gives another example: “Mathematicians first used the sign √-1, without in the least knowing what it could mean, because it shortened work and led to correct results. People naturally tried to find out why this happened and what √-1 really meant. After two hundred years they succeeded.” Dennett (2013) makes a related comment: “Define your terms, sir! No, I won’t. That would be premature… My [approach] is an instance of nibbling on a tough problem instead of trying to eat (and digest) the whole thing from the outset… In Elbow Room, I compared my method to the sculptor’s method of roughing out the form in a block of marble, approaching the final surfaces cautiously, modestly, working by successive approximation.” 
  4. With self-driving cars, researchers did use many precise external performance measures (e.g. accident rates, speed, portion of the time they could run unassisted, frequency of getting stuck) to evaluate progress, as well as internal performance metrics (speed of search, bounded loss guarantees, etc.). Researchers could see that these bits of progress were in the right direction, even if their relative contribution long-term was unclear. And so it is with AI in general. AI researchers use many precise external and internal performance measures to evaluate progress, but it is difficult to know the relative contribution of these bits of progress toward the final goal of AGI. 
  5. Heck, we’ve had pornography for millennia and still haven’t been able to define it precisely. Encyclopedia entries for “pornography” often simply quote Justice Potter Stewart: “I shall not today attempt further to define the kinds of material I understand to be [pornography]… but I know it when I see it.” 

MIRI’s July 2013 Workshop

 |   |  News


From July 8-14, MIRI will host its 3rd Workshop on Logic, Probability, and Reflection. The focus of this workshop will be the Löbian obstacle to self-modifying systems.

Participants confirmed so far include:

If you have a strong mathematics background and might like to attend this workshop, it’s not too late to apply! And even if this workshop doesn’t fit your schedule, please do apply, so that we can notify you of other workshops (long before they are announced publicly).

Information on past workshops:

New Research Page and Two New Articles

 |   |  Papers


Our new Research page has launched!

Our previous research page was a simple list of articles, but the new page describes the purpose of our research, explains four categories of research to which we contribute, and highlights the papers we think are most important to read.

We’ve also released drafts of two new research articles.

Tiling Agents for Self-Modifying AI, and the Löbian Obstacle (discuss it here), by Yudkowsky and Herreshoff, explains one of the key open problems in MIRI’s research agenda:

We model self-modification in AI by introducing “tiling” agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring’s goals). Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the “Löbian obstacle.” By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed. We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.

Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic (discuss it here), by LaVictoire et al., explains some progress in program equilibrium made by MIRI research associate Patrick LaVictoire and several others during MIRI’s April 2013 workshop:

Rational agents defect on the one-shot prisoner’s dilemma even though mutual cooperation would yield higher utility for both agents. Moshe Tennenholtz showed that if each program is allowed to pass its playing strategy to all other players, some programs can then cooperate on the one-shot prisoner’s dilemma. Program equilibria is Tennenholtz’s term for Nash equilibria in a context where programs can pass their playing strategies to the other players.

One weakness of this approach so far has been that any two programs which make different choices cannot “recognize” each other for mutual cooperation, even if they are functionally identical. In this paper, provability logic is used to enable a more flexible and secure form of mutual cooperation.
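The weakness described in the abstract can be made concrete with a toy sketch. This is not the construction from Tennenholtz's work or from the paper; it is an illustration under simple assumptions: each "program" is a Python source string defining a hypothetical function `decide(my_source, their_source)`, and the names `CLIQUE_BOT`, `CLIQUE_BOT_VARIANT`, `DEFECT_BOT`, `run`, and `play` are invented for this example. A program that cooperates only with exact textual copies of itself achieves mutual cooperation with its twin, but fails to cooperate with a program that behaves the same way and is merely written differently.

```python
# Toy model: programs exchange source code before a one-shot prisoner's dilemma.
# Each program defines decide(my_source, their_source) -> "C" or "D".
# All names here are hypothetical and chosen only for illustration.

CLIQUE_BOT = '''
def decide(my_source, their_source):
    # Cooperate only with an exact textual copy of myself.
    return "C" if their_source == my_source else "D"
'''

# Same behavior as CLIQUE_BOT, but different source text.
CLIQUE_BOT_VARIANT = '''
def decide(my_source, their_source):
    if their_source == my_source:
        return "C"
    return "D"
'''

DEFECT_BOT = '''
def decide(my_source, their_source):
    return "D"
'''


def run(program, opponent):
    """Execute `program`, giving it its own source and the opponent's source."""
    namespace = {}
    exec(program, namespace)
    return namespace["decide"](program, opponent)


def play(program_a, program_b):
    """Return the pair of moves for one game with source code exchanged."""
    return run(program_a, program_b), run(program_b, program_a)


if __name__ == "__main__":
    print(play(CLIQUE_BOT, CLIQUE_BOT))          # ('C', 'C')
    print(play(CLIQUE_BOT, DEFECT_BOT))          # ('D', 'D')
    print(play(CLIQUE_BOT, CLIQUE_BOT_VARIANT))  # ('D', 'D')
```

The last line shows the brittleness: the two programs implement the same rule, yet the textual comparison makes them defect against each other. The paper's provability-logic approach is aimed at this gap; it is not reproduced here.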

Participants of MIRI’s April workshop also made progress on Christiano’s probabilistic logic (an attack on the Löbian obstacle), but that work is not yet ready to be released.

We’ve also revamped the Get Involved page, which now includes an application form for forthcoming workshops. If you might like to work with MIRI on some of its open research problems sometime in the next 18 months, please apply! Likewise, if you know someone who might enjoy attending such a workshop, please encourage them to apply.

Friendly AI Research as Effective Altruism

 |   |  Analysis

MIRI was founded in 2000 on the premise that creating[1] Friendly AI might be a particularly efficient way to do as much good as possible.

Some developments since then include:

  • The field of “effective altruism” — trying not just to do good but to do as much good as possible[2] — has seen more publicity and better research than ever before, in particular through the work of GiveWell, the Center for Effective Altruism, the philosopher Peter Singer, and the community at Less Wrong.[3]
  • In his recent PhD dissertation, Nick Beckstead has clarified the assumptions behind the claim that shaping the far future (e.g. via Friendly AI) is overwhelmingly important.
  • Due to research performed by MIRI, the Future of Humanity Institute (FHI), and others, our strategic situation with regard to machine superintelligence is more clearly understood, and FHI’s Nick Bostrom has organized much of this work in a forthcoming book.[4]
  • MIRI’s Eliezer Yudkowsky has begun to describe in more detail which open research problems constitute “Friendly AI research,” in his view.

Given these developments, we are in a better position than ever before to assess the value of Friendly AI research as effective altruism.

Still, this is a difficult question. It is challenging enough to evaluate the cost-effectiveness of anti-malaria nets or direct cash transfers. Evaluating the cost-effectiveness of attempts to shape the far future (e.g. via Friendly AI) is even more difficult than that. Hence, this short post sketches an argument that can be given in favor of Friendly AI research as effective altruism, to enable future discussion, and is not intended as a thorough analysis.


  1. In this post, I talk about the value of humanity in general creating Friendly AI, though MIRI co-founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI. This is because I am not as confident as Yudkowsky that it is best for MIRI to attempt to build Friendly AI. When updating MIRI’s bylaws in early 2013, Yudkowsky and I came to a compromise on the language of MIRI’s mission statement, which now reads: “[MIRI] exists to ensure that the creation of smarter-than-human intelligence has a positive impact. Thus, the charitable purpose of [MIRI] is to: (a) perform research relevant to ensuring that smarter-than-human intelligence has a positive impact; (b) raise awareness of this important issue; (c) advise researchers, leaders and laypeople around the world; and (d) as necessary, implement a smarter-than-human intelligence with humane, stable goals” (emphasis added). My own hope is that it will not be necessary for MIRI (or a functional equivalent) to attempt to build Friendly AI itself. But of course I must remain open to the possibility that this will be the wisest course of action as the first creation of AI draws nearer. There is also the question of capability: few people think that a non-profit research organization has much chance of being the first to build AI. I worry, however, that the world’s elites will not find it fashionable to take this problem seriously until the creation of AI is only a few decades away, at which time it will be especially difficult to develop the mathematics of Friendly AI in time, and humanity will be forced to take a gamble on its very survival with powerful AIs we have little reason to trust. 
  2. One might think of effective altruism as a straightforward application of decision theory to the subject of philanthropy. Philanthropic agents of all kinds (individuals, groups, foundations, etc.) ask themselves: “How can we choose philanthropic acts (e.g. donations) which (in expectation) will do as much good as possible, given what we care about?” The consensus recommendation for all kinds of choices under uncertainty, including philanthropic choices, is to maximize expected utility (Chater & Oaksford 2012; Peterson 2004; Stein 1996; Schmidt 1998:19). Different philanthropic agents value different things, but decision theory suggests that each of them can get the most of what they want if they each maximize their expected utility. Choices which maximize expected utility are in this sense “optimal,” and thus another term for effective altruism is “optimal philanthropy.” Note that effective altruism in this sense is not too dissimilar from earlier approaches to philanthropy, including high-impact philanthropy (making “the biggest difference possible, given the amount of capital invested”), strategic philanthropy, effective philanthropy, and wise philanthropy. Note also that effective altruism does not say that a philanthropic agent should specify complete utility and probability functions over outcomes and then compute the philanthropic act with the highest expected utility — that is impractical for bounded agents. We must keep in mind the distinction between normative, descriptive, and prescriptive models of decision-making (Baron 2007): “normative models tell us how to evaluate… decisions in terms of their departure from an ideal standard. Descriptive models specify what people in a particular culture actually do and how they deviate from the normative models. Prescriptive models are designs or inventions, whose purpose is to bring the results of actual thinking into closer conformity to the normative model.” The prescriptive question — about what bounded philanthropic agents should do to maximize expected utility with their philanthropic choices — tends to be extremely complicated, and is the subject of most of the research performed by the effective altruism community. 
  3. See, for example: Efficient Charity, Efficient Charity: Do Unto Others, Politics as Charity, Heuristics and Biases in Charity, Public Choice and the Altruist’s Burden, On Charities and Linear Utility, Optimal Philanthropy for Human Beings, Purchase Fuzzies and Utilons Separately, Money: The Unit of Caring, Optimizing Fuzzies and Utilons: The Altruism Chip Jar, Efficient Philanthropy: Local vs. Global Approaches, The Effectiveness of Developing World Aid, Against Cryonics & For Cost-Effective Charity, Bayesian Adjustment Does Not Defeat Existential Risk Charity, How to Save the World, and What is Optimal Philanthropy? 
  4. I believe Beckstead and Bostrom have done the research community an enormous service in creating a framework, a shared language, for discussing trajectory changes, existential risks, and machine superintelligence. When discussing these topics with my colleagues, it has often been the case that the first hour of conversation is spent merely trying to understand what the other person is saying — how they are using the terms and concepts they employ. Beckstead’s and Bostrom’s recent work should enable clearer and more efficient communication between researchers, and therefore greater research productivity. Though I am not aware of any controlled, experimental studies on the effect of shared language on research productivity, a shared language is widely considered to be of great benefit for any field of research, and I shall provide a few examples of this claim which appear in print. Fuzzi et al. (2006): “The use of inconsistent terms can easily lead to misunderstandings and confusion in the communication between specialists from different [disciplines] of atmospheric and climate research, and may thus potentially inhibit scientific progress.” Hinkel (2008): “Technical languages enable their users, e.g. members of a scientific discipline, to communicate efficiently about a domain of interest.” Madin et al. (2007): “terminological ambiguity slows scientific progress, leads to redundant research efforts, and ultimately impedes advances towards a unified foundation for ecological science.” 
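The normative model mentioned in note 2, maximizing expected utility, can be illustrated with a small calculation. The sketch below is only the idealized arithmetic, not a prescriptive procedure for real donors, which note 2 stresses is far more complicated. The option names, probabilities, and utilities are invented for illustration and are not estimates for any actual charity; `Outcome`, `expected_utility`, and `best_option` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    probability: float  # chance this outcome occurs
    utility: float      # how much good it does, in arbitrary units


def expected_utility(outcomes):
    """Probability-weighted sum of utilities for one philanthropic option."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.utility for o in outcomes)


def best_option(options):
    """Return the name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


# Made-up numbers: a reliable option with a modest payoff versus a
# long shot with a large payoff.
OPTIONS = {
    "reliable": [Outcome(0.9, 10.0), Outcome(0.1, 0.0)],
    "long_shot": [Outcome(0.01, 2000.0), Outcome(0.99, 0.0)],
}

if __name__ == "__main__":
    for name, outcomes in OPTIONS.items():
        print(name, expected_utility(outcomes))
    print(best_option(OPTIONS))  # long_shot
```

With these invented figures the long shot has the higher expected utility (about 20 units versus 9), even though it fails 99% of the time. Real estimates of such probabilities and payoffs are highly uncertain, which is part of why the prescriptive question is hard.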

MIRI May Newsletter: Intelligence Explosion Microeconomics and Other Publications

 |   |  Newsletters

Greetings From the Executive Director

Dear friends,

It’s been a busy month!

Mostly, we’ve been busy publishing things. As you’ll see below, Singularity Hypotheses has now been published, and it includes four chapters by MIRI researchers or research associates. We’ve also published two new technical reports — one on decision theory and another on intelligence explosion microeconomics — and several new blog posts analyzing various issues relating to the future of AI. Finally, we added four older articles to the research page, including Ideal Advisor Theories and Personal CEV (2012).

In our April newsletter we spoke about our April 11th party in San Francisco, celebrating our relaunch as the Machine Intelligence Research Institute and our transition to mathematical research. Additional photos from that event are now available as a Facebook photo album. We’ve also uploaded a video from the event, in which I spend 2 minutes explaining MIRI’s relaunch and some tentative results from the April workshop. After that, visiting researcher Qiaochu Yuan spends 4 minutes explaining one of MIRI’s core research questions: the Löbian obstacle to self-modifying systems.

Some of the research from our April workshop will be published in June, so if you’d like to read about those results right away, you might like to subscribe to our blog.


Luke Muehlhauser

Executive Director


New Transcript: Yudkowsky and Aaronson

 |   |  News


In When Will AI Be Created?, I referred to a conversation between Eliezer Yudkowsky and Scott Aaronson. A transcript of that dialogue is now available, thanks to MIRI volunteers Ethan Dickinson, Daniel Kokotajlo, and Rick Schwall.

See also the transcript for a conversation between Eliezer Yudkowsky and Massimo Pigliucci.

To join these volunteers in assisting our cause, visit!

Sign up for DAGGRE to improve science & technology forecasting

 |   |  News

In When Will AI Be Created?, I named four methods that might improve our forecasts of AI and other important technologies. Two of these methods were explicit quantification and leveraging aggregation, as exemplified by IARPA’s ACE program, which aims to “dramatically enhance the accuracy, precision, and timeliness of… forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many… analysts.”
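As a rough illustration of what "explicit quantification" and "leveraging aggregation" can mean in practice, the sketch below scores probability forecasts with the Brier score (squared error against the 0-or-1 outcome) and pools several forecasters' probabilities with a weighted average. This is a generic textbook-style example, not a description of the methods used by ACE or DAGGRE; the function names and the sample numbers are assumptions made for illustration.

```python
def brier_score(probability, outcome):
    """Squared error of a forecast; outcome is 1 if the event happened, else 0."""
    return (probability - outcome) ** 2


def aggregate(probabilities, weights=None):
    """Weighted average of several forecasters' probabilities for one event."""
    if not probabilities:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight per forecast is required")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)


if __name__ == "__main__":
    forecasts = [0.9, 0.6, 0.3]  # three hypothetical forecasters
    outcome = 1                  # suppose the event did occur

    pooled = aggregate(forecasts)
    mean_individual = sum(brier_score(p, outcome) for p in forecasts) / len(forecasts)

    # The pooled forecast scores at least as well as the average forecaster.
    print(brier_score(pooled, outcome) <= mean_individual)  # True
```

Because squared error is convex, the averaged forecast never scores worse than the forecasters' average score, whatever the outcome. Weighting forecasters by track record, as in `aggregate(forecasts, weights=[3, 1, 0])`, is one simple way to go beyond a plain average.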

GMU’s DAGGRE program, one of five teams participating in ACE, recently announced a transition from geopolitical forecasting to science & technology forecasting:

DAGGRE will continue, but it will transition from geo-political forecasting to science and technology (S&T) forecasting to better use its combinatorial capabilities. We will have a brand new shiny, friendly and informative interface co-designed by Inkling Markets, opportunities for you to provide your own forecasting questions and more!

Another exciting development is that our S&T forecasting prediction market will be open to everyone in the world who is at least eighteen years of age. We’re going global!

If you want to help improve humanity’s ability to forecast important technological developments like AI, please register for DAGGRE’s new S&T prediction website here.

I did.