New Research Page and Two New Articles

 |   |  Papers


Our new Research page has launched!

Our previous research page was a simple list of articles, but the new page describes the purpose of our research, explains four categories of research to which we contribute, and highlights the papers we think are most important to read.

We’ve also released drafts of two new research articles.

Tiling Agents for Self-Modifying AI, and the Löbian Obstacle (discuss it here), by Yudkowsky and Herreshoff, explains one of the key open problems in MIRI’s research agenda:

We model self-modification in AI by introducing “tiling” agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring’s goals). Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the “Löbian obstacle.” By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed. We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.

Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic (discuss it here), by LaVictoire et al., explains some progress in program equilibrium made by MIRI research associate Patrick LaVictoire and several others during MIRI’s April 2013 workshop:

Rational agents defect on the one-shot prisoner’s dilemma even though mutual cooperation would yield higher utility for both agents. Moshe Tennenholtz showed that if each program is allowed to pass its playing strategy to all other players, some programs can then cooperate on the one-shot prisoner’s dilemma. “Program equilibrium” is Tennenholtz’s term for a Nash equilibrium in a context where programs can pass their playing strategies to the other players.

One weakness of this approach so far has been that any two programs which make different choices cannot “recognize” each other for mutual cooperation, even if they are functionally identical. In this paper, provability logic is used to enable a more flexible and secure form of mutual cooperation.
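The weakness described above can be illustrated with a small sketch. It is not the paper's provability-logic construction; it only shows the simpler source-equality approach that the paper improves on. All names here (`CLIQUE_BOT`, `CLIQUE_BOT_VARIANT`, `DEFECT_BOT`, `run`, `match`) are hypothetical, and programs are assumed to be Python source strings that define a `play` function receiving their own source and the opponent's source.

```python
# Hypothetical illustration: each "program" is source text defining
# play(my_source, opponent_source) -> "C" (cooperate) or "D" (defect).

CLIQUE_BOT = '''
def play(my_source, opponent_source):
    # Cooperate only with an exact textual copy of myself.
    return "C" if opponent_source == my_source else "D"
'''

# Same rule as CLIQUE_BOT, but written with different parameter names,
# so the source text differs.
CLIQUE_BOT_VARIANT = '''
def play(me, other):
    return "C" if other == me else "D"
'''

DEFECT_BOT = '''
def play(my_source, opponent_source):
    return "D"
'''


def run(source, opponent_source):
    """Execute one program's source and ask it for a move."""
    namespace = {}
    exec(source, namespace)
    return namespace["play"](source, opponent_source)


def match(a, b):
    """Play a one-shot prisoner's dilemma in which each side sees the other's source."""
    return run(a, b), run(b, a)


if __name__ == "__main__":
    print(match(CLIQUE_BOT, CLIQUE_BOT))          # exact copies cooperate
    print(match(CLIQUE_BOT, CLIQUE_BOT_VARIANT))  # same rule, different text: both defect
    print(match(CLIQUE_BOT, DEFECT_BOT))          # no exploitation: both defect
```

Because cooperation here depends on textual equality, two programs that follow the same rule but are written differently fail to cooperate with each other. That is the kind of brittleness the paper aims to remove.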

Participants in MIRI’s April workshop also made progress on Christiano’s probabilistic logic (an attack on the Löbian obstacle), but that work is not yet ready to be released.

We’ve also revamped the Get Involved page, which now includes an application form for forthcoming workshops. If you might like to work with MIRI on some of its open research problems sometime in the next 18 months, please apply! Likewise, if you know someone who might enjoy attending such a workshop, please encourage them to apply.

Friendly AI Research as Effective Altruism

 |   |  Analysis

MIRI was founded in 2000 on the premise that creating[1] Friendly AI might be a particularly efficient way to do as much good as possible.

Some developments since then include:

  • The field of “effective altruism” (trying not just to do good but to do as much good as possible)[2] has seen more publicity and better research than ever before, in particular through the work of GiveWell, the Center for Effective Altruism, the philosopher Peter Singer, and the community at Less Wrong.[3]
  • In his recent PhD dissertation, Nick Beckstead has clarified the assumptions behind the claim that shaping the far future (e.g. via Friendly AI) is overwhelmingly important.
  • Due to research performed by MIRI, the Future of Humanity Institute (FHI), and others, our strategic situation with regard to machine superintelligence is more clearly understood, and FHI’s Nick Bostrom has organized much of this work in a forthcoming book.[4]
  • MIRI’s Eliezer Yudkowsky has begun to describe in more detail which open research problems constitute “Friendly AI research,” in his view.

Given these developments, we are in a better position than ever before to assess the value of Friendly AI research as effective altruism.

Still, this is a difficult question. It is challenging enough to evaluate the cost-effectiveness of anti-malaria nets or direct cash transfers. Evaluating the cost-effectiveness of attempts to shape the far future (e.g. via Friendly AI) is even more difficult than that. Hence, this short post sketches an argument that can be given in favor of Friendly AI research as effective altruism, to enable future discussion, and is not intended as a thorough analysis.

Read more »

  1. In this post, I talk about the value of humanity in general creating Friendly AI, though MIRI co-founder Eliezer Yudkowsky usually talks about MIRI in particular — or at least, a functional equivalent — creating Friendly AI. This is because I am not as confident as Yudkowsky that it is best for MIRI to attempt to build Friendly AI. When updating MIRI’s bylaws in early 2013, Yudkowsky and I came to a compromise on the language of MIRI’s mission statement, which now reads: “[MIRI] exists to ensure that the creation of smarter-than-human intelligence has a positive impact. Thus, the charitable purpose of [MIRI] is to: (a) perform research relevant to ensuring that smarter-than-human intelligence has a positive impact; (b) raise awareness of this important issue; (c) advise researchers, leaders and laypeople around the world; and (d) as necessary, implement a smarter-than-human intelligence with humane, stable goals” (emphasis added). My own hope is that it will not be necessary for MIRI (or a functional equivalent) to attempt to build Friendly AI itself. But of course I must remain open to the possibility that this will be the wisest course of action as the first creation of AI draws nearer. There is also the question of capability: few people think that a non-profit research organization has much chance of being the first to build AI. I worry, however, that the world’s elites will not find it fashionable to take this problem seriously until the creation of AI is only a few decades away, at which time it will be especially difficult to develop the mathematics of Friendly AI in time, and humanity will be forced to take a gamble on its very survival with powerful AIs we have little reason to trust. 
  2. One might think of effective altruism as a straightforward application of decision theory to the subject of philanthropy. Philanthropic agents of all kinds (individuals, groups, foundations, etc.) ask themselves: “How can we choose philanthropic acts (e.g. donations) which (in expectation) will do as much good as possible, given what we care about?” The consensus recommendation for all kinds of choices under uncertainty, including philanthropic choices, is to maximize expected utility (Chater & Oaksford 2012; Peterson 2004; Stein 1996; Schmidt 1998:19). Different philanthropic agents value different things, but decision theory suggests that each of them can get the most of what they want if they each maximize their expected utility. Choices which maximize expected utility are in this sense “optimal,” and thus another term for effective altruism is “optimal philanthropy.” Note that effective altruism in this sense is not too dissimilar from earlier approaches to philanthropy, including high-impact philanthropy (making “the biggest difference possible, given the amount of capital invested”), strategic philanthropy, effective philanthropy, and wise philanthropy. Note also that effective altruism does not say that a philanthropic agent should specify complete utility and probability functions over outcomes and then compute the philanthropic act with the highest expected utility; that is impractical for bounded agents. We must keep in mind the distinction between normative, descriptive, and prescriptive models of decision-making (Baron 2007): “normative models tell us how to evaluate… decisions in terms of their departure from an ideal standard. Descriptive models specify what people in a particular culture actually do and how they deviate from the normative models. Prescriptive models are designs or inventions, whose purpose is to bring the results of actual thinking into closer conformity to the normative model.” The prescriptive question (about what bounded philanthropic agents should do to maximize expected utility with their philanthropic choices) tends to be extremely complicated, and is the subject of most of the research performed by the effective altruism community.
  3. See, for example: Efficient Charity, Efficient Charity: Do Unto Others, Politics as Charity, Heuristics and Biases in Charity, Public Choice and the Altruist’s Burden, On Charities and Linear Utility, Optimal Philanthropy for Human Beings, Purchase Fuzzies and Utilons Separately, Money: The Unit of Caring, Optimizing Fuzzies and Utilons: The Altruism Chip Jar, Efficient Philanthropy: Local vs. Global Approaches, The Effectiveness of Developing World Aid, Against Cryonics & For Cost-Effective Charity, Bayesian Adjustment Does Not Defeat Existential Risk Charity, How to Save the World, and What is Optimal Philanthropy? 
  4. I believe Beckstead and Bostrom have done the research community an enormous service in creating a framework, a shared language, for discussing trajectory changes, existential risks, and machine superintelligence. When discussing these topics with my colleagues, it has often been the case that the first hour of conversation is spent merely trying to understand what the other person is saying — how they are using the terms and concepts they employ. Beckstead’s and Bostrom’s recent work should enable clearer and more efficient communication between researchers, and therefore greater research productivity. Though I am not aware of any controlled, experimental studies on the effect of shared language on research productivity, a shared language is widely considered to be of great benefit for any field of research, and I shall provide a few examples of this claim which appear in print. Fuzzi et al. (2006): “The use of inconsistent terms can easily lead to misunderstandings and confusion in the communication between specialists from different [disciplines] of atmospheric and climate research, and may thus potentially inhibit scientific progress.” Hinkel (2008): “Technical languages enable their users, e.g. members of a scientific discipline, to communicate efficiently about a domain of interest.” Madin et al. (2007): “terminological ambiguity slows scientific progress, leads to redundant research efforts, and ultimately impedes advances towards a unified foundation for ecological science.” 

MIRI May Newsletter: Intelligence Explosion Microeconomics and Other Publications

 |   |  Newsletters

Greetings From the Executive Director

Dear friends,

It’s been a busy month!

Mostly, we’ve been busy publishing things. As you’ll see below, Singularity Hypotheses has now been published, and it includes four chapters by MIRI researchers or research associates. We’ve also published two new technical reports — one on decision theory and another on intelligence explosion microeconomics — and several new blog posts analyzing various issues relating to the future of AI. Finally, we added four older articles to the research page, including Ideal Advisor Theories and Personal CEV (2012).

In our April newsletter we spoke about our April 11th party in San Francisco, celebrating our relaunch as the Machine Intelligence Research Institute and our transition to mathematical research. Additional photos from that event are now available as a Facebook photo album. We’ve also uploaded a video from the event, in which I spend 2 minutes explaining MIRI’s relaunch and some tentative results from the April workshop. After that, visiting researcher Qiaochu Yuan spends 4 minutes explaining one of MIRI’s core research questions: the Löbian obstacle to self-modifying systems.

Some of the research from our April workshop will be published in June, so if you’d like to read about those results right away, you might like to subscribe to our blog.


Luke Muehlhauser

Executive Director

Read more »

New Transcript: Yudkowsky and Aaronson

 |   |  News


In When Will AI Be Created?, I referred to a conversation between Eliezer Yudkowsky and Scott Aaronson. A transcript of that dialogue is now available, thanks to MIRI volunteers Ethan Dickinson, Daniel Kokotajlo, and Rick Schwall.

See also the transcript for a conversation between Eliezer Yudkowsky and Massimo Pigliucci.

To join these volunteers in assisting our cause, visit!

Sign up for DAGGRE to improve science & technology forecasting

 |   |  News

In When Will AI Be Created?, I named four methods that might improve our forecasts of AI and other important technologies. Two of these methods were explicit quantification and leveraging aggregation, as exemplified by IARPA’s ACE program, which aims to “dramatically enhance the accuracy, precision, and timeliness of… forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many… analysts.”
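To make “elicit, weight, and combine” concrete, here is a minimal sketch of one generic aggregation technique, a weighted linear opinion pool, together with the Brier score often used to grade probability forecasts. This is not a description of the method used by ACE or any of its teams; the function names, forecasts, and weights below are illustrative assumptions.

```python
def weighted_pool(forecasts, weights):
    """Combine probability forecasts with a weighted average (linear opinion pool).

    forecasts: probabilities in [0, 1], one per forecaster (illustrative values).
    weights:   non-negative weights, e.g. reflecting each forecaster's track record.
    """
    if len(forecasts) != len(weights):
        raise ValueError("need exactly one weight per forecast")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(forecasts, weights)) / total


def brier_score(probability, outcome):
    """Squared error of a probability forecast; outcome is 1 if the event happened, else 0.

    Lower is better: 0.0 is a perfect forecast, 1.0 is the worst possible.
    """
    return (probability - outcome) ** 2


if __name__ == "__main__":
    # Three hypothetical forecasters; the third gets double weight.
    combined = weighted_pool([0.2, 0.6, 0.9], [1, 1, 2])
    print(round(combined, 3))                   # roughly 0.65
    print(round(brier_score(combined, 1), 4))   # score if the event occurs
```

In practice the weights would have to come from somewhere, for example each forecaster's past Brier scores; how best to choose them is the kind of question such forecasting research investigates.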

GMU’s DAGGRE program, one of five teams participating in ACE, recently announced a transition from geopolitical forecasting to science & technology forecasting:

DAGGRE will continue, but it will transition from geo-political forecasting to science and technology (S&T) forecasting to better use its combinatorial capabilities. We will have a brand new shiny, friendly and informative interface co-designed by Inkling Markets, opportunities for you to provide your own forecasting questions and more!

Another exciting development is that our S&T forecasting prediction market will be open to everyone in the world who is at least eighteen years of age. We’re going global!

If you want to help improve humanity’s ability to forecast important technological developments like AI, please register for DAGGRE’s new S&T prediction website here.

I did.

Four Articles Added to Research Page

 |   |  Papers

Four older articles have been added to our research page.

The first is the early draft of Christiano et al.’s “Definability of ‘Truth’ in Probabilistic Logic” previously discussed here and here. The draft was last updated on April 2, 2013.

The second paper is a cleaned-up version of an article originally published in December 2012 by Luke Muehlhauser and Chris Williamson on Less Wrong: “Ideal Advisor Theories and Personal CEV.”

The third and fourth papers were originally published by Bill Hibbard in the AGI 2012 Conference Proceedings: “Avoiding Unintended AI Behaviors” and “Decision Support for Safe AI Design.” Hibbard wrote these articles before he became a MIRI research associate, but he gave us permission to include them on our research page because (1) he became a MIRI research associate during the AGI-12 conference at which the articles were published, (2) the articles were partly inspired by a public dialogue with Luke Muehlhauser, and (3) the articles build on MIRI’s paper “Intelligence Explosion and Machine Ethics.”

As mentioned in our December 2012 newsletter, “Avoiding Unintended AI Behaviors” was awarded MIRI’s $1000 Turing Prize for Best AGI Safety Paper. The prize was awarded in honor of Alan Turing, who not only discovered some of the key ideas of machine intelligence, but also grasped its importance, writing that “…it seems probable that once [human-level machine thinking] has started, it would not take long to outstrip our feeble powers… At some stage therefore we should have to expect the machines to take control…”

When Will AI Be Created?

 |   |  Analysis

Strong AI appears to be the topic of the week. Kevin Drum at Mother Jones thinks AIs will be as smart as humans by 2040. Karl Smith at Forbes and “M.S.” at The Economist seem to roughly concur with Drum on this timeline. Moshe Vardi, the editor-in-chief of the world’s most-read computer science magazine, predicts that “by 2045 machines will be able to do if not any work that humans can do, then a very significant fraction of the work that humans can do.”

But predicting AI is more difficult than many people think.

To explore these difficulties, let’s start with a 2009 conversation between MIRI researcher Eliezer Yudkowsky and MIT computer scientist Scott Aaronson, author of the excellent Quantum Computing Since Democritus. Early in that dialogue, Yudkowsky asked:

It seems pretty obvious to me that at some point in [one to ten decades] we’re going to build an AI smart enough to improve itself, and [it will] “foom” upward in intelligence, and by the time it exhausts available avenues for improvement it will be a “superintelligence” [relative] to us. Do you feel this is obvious?

Aaronson replied:

The idea that we could build computers that are smarter than us… and that those computers could build still smarter computers… until we reach the physical limits of what kind of intelligence is possible… that we could build things that are to us as we are to ants — all of this is compatible with the laws of physics… and I can’t find a reason of principle that it couldn’t eventually come to pass…

The main thing we disagree about is the time scale… a few thousand years [before AI] seems more reasonable to me.

Those two estimates — several decades vs. “a few thousand years” — have wildly different policy implications.

If there’s a good chance that AI will replace humans at the steering wheel of history in the next several decades, then we’d better put our gloves on and get to work making sure that this event has a positive rather than negative impact. But if we can be pretty confident that AI is thousands of years away, then we needn’t worry about AI for now, and we should focus on other global priorities. Thus it appears that “When will AI be created?” is a question with high value of information for our species.

Let’s take a moment to review the forecasting work that has been done, and see what conclusions we might draw about when AI will likely be created.

Read more »

Advise MIRI with Your Domain-Specific Expertise

 |   |  News

MIRI currently has a few dozen volunteer advisors on a wide range of subjects, but we need more! If you’d like to help MIRI pursue its mission more efficiently, please sign up to be a MIRI advisor.

If you sign up, we will occasionally ask you questions, or send you early drafts of upcoming writings for feedback.

We don’t always want technical advice (“Well, you can do that with a relativized arithmetical hierarchy…”); often, we just want to understand how different groups of experts respond to our writing (“The tone of this paragraph rubs me the wrong way because…”).

At the moment, we are most in need of advisors on the following subjects:

Even if you don’t have much time to help, please sign up! We will of course respect your own limits on availability.