Assessing our past and potential impact

Analysis

We’ve received several thoughtful questions in response to our fundraising post to the Effective Altruism Forum and our new FAQ. From quant trader Maxwell Fritz:

My snap reaction to MIRI’s pitches has typically been, “yeah, AI is a real concern. But I have no idea whether MIRI are the right people to work on it, or if their approach to the problem is the right one”.

Most of the FAQ and pitch tends to focus on the “does this matter” piece. It might be worth selling harder on the second component – if you agree AI matters, why MIRI?

At that point, there’s two different audiences – one that has the expertise in the field to make a reasoned assessment based on the quality of your existing work, and a second that doesn’t have a clue (me) and needs to see a lot of corroboration from unaffiliated, impressive sources (people in that first group).

The pitches tend to play up famous people who know their shit and corroborate AI as a concern – but should especially make it clear when those people believe in MIRI. That’s what matters for the “ok, why you?” question. And the natural follow up is if all of these megarich people are super on board with the concern of AI, and experts believe MIRI should lead the charge, why aren’t you just overflowing with money already?

And from mathematics grad student Tristan Tager:

I would guess that “why MIRI”, rather than “who’s MIRI” or “why AI”, is the biggest marketing hurdle you guys should address.

For me, “why MIRI” breaks down into two questions. The first and lesser question is, what can MIRI do? Why should I expect that the MIRI vision and the MIRI team are going to get things done? What exactly can I expect them to get done? Most importantly in addressing this question, what have they done already and why is it useful? The Technical Agenda is vague and mostly just refers to the list of papers. And the papers don’t help much — those who don’t know much about academia need something more accessible, and those who do know more about academia will be skeptical about MIRI’s self-publishing and lack of peer review.

But the second and much bigger question is, what would MIRI do that Google wouldn’t? Google has tons of money, a creative and visionary staff, the world’s best programmers, and a swath of successful products that incorporate some degree of AI — and moreover they recently acquired several AI businesses and formed an AI ethics board. It seems like they’re approaching the same big problem directly rather than theoretically, and have deep pockets, keen minds, and a wealth of hands-on experience.

There are a number of good questions here. Later this week, Nate plans to post a response to Tristan’s last question: Why is MIRI currently better-positioned to work on this problem than AI groups in industry or academia? (Update February 17: Link here.)

Here, I want to reply to several other questions Tristan and Maxwell raised:

  • How can non-specialists assess MIRI’s research agenda and general competence?
  • What kinds of accomplishments can we use as measures of MIRI’s past and future success?
  • And lastly: If a lot of people take this cause seriously now, why is there still a funding gap?

General notes

When we make our case for MIRI, we usually focus on the reasons to consider AI an important point of leverage on existential risk (Nate’s four background claims, Eliezer’s five theses) and for thinking that early theoretical progress is possible in this area (MIRI’s Approach).

We focus on these big-picture arguments because the number of people working on this topic is still quite small. The risk scenario MIRI works on has only risen to national attention in the last 6-12 months; at the moment, MIRI is the only research organization I know of that is even claiming to specialize in early technical research on the alignment problem.

There are multiple opportunities to support technical research into long-term AI safety at the scale of “funding individual researcher X to work on discrete project Y.” Some recipients of the 2015 Future of Life Institute (FLI) grants fall into this category, e.g., Stuart Russell and Paul Christiano.

However, there aren’t multiple opportunities in this area at the scale of “funding an AI group to pursue a large-scale or long-term program,” and there aren’t many direct opportunities to bring in entirely new people and grow and diversify the field. MIRI would love to have organizational competitors (and collaborators) in this space, but they don’t yet exist. We expect this to change eventually, and one of our expansion goals is to make this happen faster, by influencing other math and computer science research groups to take on more AI alignment work and by recruiting highly qualified specialists from outside the existential risk community to become career AI alignment researchers.

The upside of getting started early is that we have a chance to have a larger impact in a less crowded space. The downside is that there are fewer authoritative sources to appeal to when outsiders want to verify that our research agenda is on the right track.

For the most part, we will probably need to wait a year or two for those authoritative sources. Academia is slow, and at this stage many computer scientists have only been aware of this area for a few months. Our mission is seen as important by a growing number of leaders in science and industry, and our technical agenda is seen by a number of AI specialists as promising and deserving of more attention, hence its inclusion in the FLI research priorities document. But we don’t expect the current state of the evidence to be universally convincing to our most skeptical potential donors.

For those who want detailed independent assessments of MIRI’s output, our advice is to wait a bit for the wider academic community to respond. (We’re also looking into options for directly soliciting public feedback from independent researchers regarding our research agenda and early results.)

In the interim, however, “Why Now Matters” notes reasons donations to MIRI are likely to have a much larger impact now than they would several years down the line. For donors who are skeptical (or curious), but are not so skeptical that they require a fine-grained evaluation of our work by the scholarly community, I’ll summarize some of the big-picture reasons to think MIRI’s work is likely to be high-value.¹

Influence on the AI conversation

In the absence of in-depth third-party evaluations of our work, interested non-specialists can look at our publication history and the prominence of our analyses in scholarly discussions of AI risk.

MIRI’s most important accomplishments fall into three categories: writing up top-priority AI alignment problems; beginning early work on these problems; and getting people interested in our research and our mission. Our review of the last year discusses our recent progress in formalizing alignment problems, as well as our progress in getting the larger AI community interested in long-term AI safety. Our publications list gives a more detailed and long-term picture of our output.

MIRI co-founder and senior researcher Eliezer Yudkowsky and Future of Humanity Institute (FHI) founding director Nick Bostrom are responsible for much of the early development and popularization of ideas surrounding smarter-than-human AI risk. Eliezer’s ideas are cited prominently in the 2009 edition of Artificial Intelligence: A Modern Approach (AI:MA), the leading textbook in the field of AI, and also in Bostrom’s 2014 book Superintelligence.

Credit for more recent success in popularizing long-term AI safety research is shared between MIRI and a number of actors: Nick Bostrom and FHI, Max Tegmark and FLI, Stuart Russell (co-author of AI:MA), Jaan Tallinn, and others. Many people in this existential-risk-conscious cluster broadly support MIRI’s efforts and are in regular contact with us about our decisions. Bostrom, Tegmark, Russell, and Tallinn are all MIRI advisors, and Tallinn, a co-founder of FLI and of the Cambridge Centre for the Study of Existential Risk (CSER), cites MIRI as a key source for his views on AI risk.

Writing in early 2014, Russell and Tegmark, together with Stephen Hawking and Frank Wilczek, noted in The Huffington Post that “little serious research is devoted to these issues [of long-term AI risk] outside small non-profit institutes such as the Cambridge Center for Existential Risk, the Future of Humanity Institute, the Machine Intelligence Research Institute, and the Future of Life Institute.”² Of these organizations, MIRI is the one that currently specializes in the technical obstacles to designing safe, beneficial smarter-than-human AI systems. FHI, CSER, and FLI do important work on a broader set of issues, including forecasting and strategy work, outreach, and investigations into other global risks.


Tristan raised the concern that MIRI’s technical agenda was self-published. Our publication history does show that MIRI has self-published many of its results. More than once, we’ve had the experience of seeing papers rejected with comments that the results are interesting but the AI motivation is just too strange. We’ve begun submitting stripped-down papers and putting the full versions on arXiv, but figuring out the best way to get these results published took some trial and error.

Part of the underlying problem is that the AI field has been repeatedly burned by “winters” when past generations over-promised and under-delivered. Members of the field are often uncomfortable looking too far ahead, and have historically been loath to talk about general intelligence.

Our approach is exactly the opposite: we focus directly on trying to identify basic aspects of general reasoning that are still not well-understood, while explicitly avoiding safety research focused on present-day AI systems (which is more crowded and more likely to occur regardless of our efforts). This means that our work often lacks direct practical application today, while also broaching the unpopular subject of general intelligence.

We’re getting more work published these days in part because the topic of smarter-than-human AI is no longer seen as academically illegitimate to the extent it was in the past, and in part because we’ve pivoted in recent years from being an organization that primarily worked on movement growth (via Singularity Summits, writings on rationality, etc.) and some forecasting research, to an organization that focuses solely on novel technical research.

Our seven-paper technical agenda was initially self-published, but this was primarily in order to make it available early enough to be read and cited by attendees of the “Future of AI” conference in January. Since then, a shorter version of the technical agenda paper “Toward Idealized Decision Theory” has been accepted to AGI-15 (the full version is on arXiv), and we’ve presented the full technical agenda paper “Corrigibility” at AAAI-15, a leading academic conference in AI. The overview paper, “Aligning Superintelligence with Human Interests,” is forthcoming in a Springer anthology on the technological singularity.

The other four technical agenda papers aren’t going through peer review because they’re high-level overview papers that are long on explanation and background, but short on new results. We’ve been putting associated results through peer review instead. We published Vingean reflection results in the AGI-14 proceedings (“Problems of Self-Reference in Self-Improving Space-Time Embedded Intelligence”), and other results have been accepted to ITP 2015 (“Proof-Producing Reflection for HOL”). We have two peer-reviewed papers related to both logical uncertainty and realistic world-models: one we presented two weeks ago at AGI-15 (“Reflective Variants of Solomonoff Induction and AIXI”) and another we’re presenting at LORI-V later this year (“Reflective Oracles: A Foundation for Classical Game Theory”). We also presented relevant decision theory results at AAAI-14.

The “Value Learning” paper is the only paper in the research agenda suite that hasn’t had associated work go through peer review yet. It’s the least technical part of the agenda, so it may be a little while before we have technical results to put through peer review on this topic.

(Update July 15: “The Value Learning Problem” has now been peer reviewed and presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop.)

By publishing in prominent journals and conference proceedings, we hope to get many more researchers interested in our work. A useful consequence of this is that there will be more non-MIRI evaluations of (and contributions to) the basic research questions we’re working on. In the nearer future, we also have a few blog posts in the works that are intended to explain some of the more technical parts of our research agenda, such as Vingean reflection.

In all, we’ve published seven peer-reviewed papers since Nate and Benja came on in early 2014. In response to recent industry progress and new work by MIRI and the existential risk community, it’s become much easier to publish papers that wrestle directly with open AI alignment problems, and we expect it to become even easier over the next few years.

What does success look like?

Success for MIRI partly means delivering on our Summer Fundraiser targets: growing our research team and taking on additional projects, depending on which funding targets we’re able to hit. Our plan for this year is to focus on producing a few high-quality publications in elite venues. How well our fundraiser goes will affect how effectively we can execute on that plan and how quickly we can generate and publish new results.

A more direct measure of success is our ability to make progress on the specific technical problems we’ve chosen to focus on, as assessed by MIRI researchers and the larger AI community. In “MIRI’s Approach,” Nate distinguishes two types of problems: ones for which we know the answer in principle, but lack practical algorithms; and ones that are not yet well-specified enough for us to know how to construct an answer even in principle. At this point, large-scale progress for MIRI looks like moving important and neglected AI problems from the second category to the first category.

Maxwell raised one more question: if MIRI is receiving more mainstream attention and approval, why does it still have a funding gap?

Part of the answer, sketched out in “Why Now Matters,” is that we do think there’s a good chance that large grants or donations could close our funding gap in the coming years. However, large donors and grantmakers can be slow to act, and whether or not our funding gap is closed five or ten years down the line, it’s very valuable for us to be able to expand and diversify our activities now.

Through FLI’s grants program, Elon Musk and the Open Philanthropy Project recently awarded $7M in grants to AI safety research, and MIRI’s core research program received a large project grant. This is a wonderful infusion of funding into the field, and means that many more academics will be able to start focusing on AI alignment research. However, given the large number of high-quality grant recipients, the FLI grants aren’t enough to make the most promising research opportunities funding-saturated. MIRI received the fourth-largest project grant, which amounts to $83,000 per year for three years.³ This is a very generous grant, and it will significantly bolster our efforts to support researchers and run workshops, but it’s nowhere near enough to close our funding gap.

Since this is the first fundraiser we’ve run in 2015, it’s a bit early to ask why the newfound attention and approval our work has received this year hasn’t yet closed the gap. The FLI grants and our ongoing fundraiser are part of the mechanism by which the funding gap shrinks. It is shrinking, but the process isn’t instantaneous — and part of the process is making our case to new potential supporters. Our hope is that if we make our case for MIRI clearer to donors, we can close our funding gap faster and thereby have a bigger impact on the early scholarly conversation about AI safety.

We’re very grateful for all the support we’ve gotten so far — and in particular for the support we received before we had mainstream computer science publications, a fleshed-out research agenda, or a track record of impacting the discourse around the future of AI. The support we received early on was critical in getting us to where we are today, and as our potential as an organization becomes clearer through our accomplishments, we hope to continue to attract a wider pool of supporters and collaborators.


  1. As always, you can also shoot us more specific questions that aren’t addressed here. 
  2. Quoting Russell, Tegmark, Hawking, and Wilczek:

    Whereas the short-term impact of AI depends on who controls it, the long-term impact depends on whether it can be controlled at all.

    So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilization sent us a text message saying, “We’ll arrive in a few decades,” would we just reply, “OK, call us when you get here — we’ll leave the lights on”? Probably not — but this is more or less what is happening with AI. Although we are facing potentially the best or worst thing ever to happen to humanity, little serious research is devoted to these issues outside small non-profit institutes such as the Cambridge Center for Existential Risk, the Future of Humanity Institute, the Machine Intelligence Research Institute, and the Future of Life Institute. 

  3. We additionally received about $50,000 for the AI Impacts project, and will receive some fraction of the funding from two other grants where our researchers are secondary investigators, “Inferring Human Values” and “Applying Formal Verification to Reflective Reasoning.” 
  • sanxiyn

    “Proof-Producing Reflection for HOL” is available at I am not sure why it was not linked.

  • Evan Gaensbauer

    Reading this a second time several months later, after initially not finding that it satisfactorily answered the questions originally posed, I now find a closer reading has made me conclude this response was indeed satisfactory.