2013 in Review: Friendly AI Research

MIRI Strategy

This is the 4th part of my personal and qualitative self-review of MIRI in 2013, in which I review MIRI’s 2013 Friendly AI (FAI) research activities.1

Friendly AI research in 2013

  1. In early 2013, we decided to shift our priorities from research plus public outreach to a more exclusive focus on technical FAI research. This resulted in roughly as much public-facing FAI research in 2013 as in all past years combined.
  2. Also, our workshops succeeded in identifying candidates for hire. We expect to hire two 2013 workshop participants in the first half of 2014.
  3. During 2013, I learned many things about how to create an FAI research institute and FAI research field. In particular…
     • MIRI needs to attract more experienced workshop participants.
     • Much FAI research can be done by a broad community, and need not be labeled as FAI research. But, more FAI progress is made when the researchers themselves conceive of the research as FAI research.
     • Communication style matters a lot.

The shift to Friendly AI research

From MIRI’s founding in 2000 until our strategic shift in early 2013,2 we did some research and much public outreach (e.g. the Singularity Summit and The Sequences).3 In early 2013, we decided that enough outreach and movement-building had been done that we could productively shift to a primary focus on research, and Friendly AI research specifically.

The task before us was, essentially, to create a new FAI research institute (out of what had previously been primarily an outreach organization), and to create a new field of FAI research. We still have much to learn about how to achieve these goals (see below).

Our initial steps were (1) to hold a series of research workshops, and (2) to describe open problems in Friendly AI theory to potential research collaborators. Our workshops and open problem descriptions were aimed at three goals in particular. We wanted them to:

  1. help us identify researchers MIRI should hire to work full-time on Friendly AI theory,
  2. expose additional researchers to the Friendly AI research agenda, and
  3. spur concrete progress on open problems in Friendly AI.

First, I’ll describe our 2013 Friendly AI research activities. After that, I’ll review “how good” I think these results are, and what lessons I’ve learned.

The workshops

The workshop strategy had been suggested by the success of our one-week November 2012 workshop, which had been an experiment involving only four researchers, and had produced the core result of “Definability of Truth in Probabilistic Logic.”

Our first workshop of 2013, held in April, was an attempt to tackle as many open problems as we could, with as many people as we could gather, to quickly learn which problems were most tractable and which researchers were most likely to contribute in the future. It involved 12 participants and lasted 3 weeks, though (due to scheduling constraints) only 5 researchers participated for the entire duration of the workshop. We learned a great deal about the workshop’s participants, and three problems in particular showed the most progress: Christiano’s “Definability of Truth” framework, LaVictoire’s “Robust Cooperation” framework, and Fallenstein’s “parametric polymorphism” approach to the Löbian obstacle for self-modifying systems. The success of this workshop encouraged us to hold more such workshops, albeit at a smaller scale and with tighter research foci.

Our next workshop, in July 2013, had 8 participants and lasted one week. It focused on issues related to logical omniscience and the Löbian obstacle / self-reflective agents, and produced less progress-per-day than the April workshop. Its chief result was described in a blog post by participant Abram Demski.

Our September workshop focused instead on decision theory. It had 11 participants and lasted one week. Participants brainstormed “well-posed problems” in the area, built on LaVictoire’s robust cooperation framework, made some progress on formalizing updateless decision theory, and formulated additional toy problems such as the Ultimate Newcomb’s Problem.

Our November workshop was our first workshop held outside of Berkeley. FHI graciously hosted us at Oxford University. As with the July workshop, this workshop focused on logical omniscience and self-reflective agents. There were 11 participants, and it lasted one week. November’s theoretical progress flowed into the progress made at our December workshop (same topics, 13 participants, one week), which was captured in 7 new technical reports.

Next, some basic statistics:

  • We held 5 research workshops in 2013, with all but one of them being one week long.
  • These workshops were attended by 35 unique researchers, plus 7 first-day-only visitors (e.g. Hannes Leitgeb and Nik Weaver).4
  • For first-time attendees, the median reply to the question “How happy are you that you came to the workshop, 0-10?” was 8.5.
  • From the time it went live in March 2013 to the end of 2013, about a dozen people contacted us about our Recommended Courses for MIRI Math Researchers page. However, we have reason to believe it has influenced the study patterns of a much larger number of people. Some MIRI supporters have told us they routinely point smart young acquaintances to that page. Moreover, the page received more unique pageviews in 2013 than (e.g.) our Donate or About pages, despite not being linked from every page of the site like the Donate and About pages are. The Recommended Courses page made it possible for at least one person (Nate Soares) to quickly upgrade his math skills and attend a workshop in 2013, which he couldn’t have done before studying several of the textbooks on the Courses page.
  • From the time our workshop application form went live in June 2013 to the end of 2013, we received 227 non-junk applications5 to attend future MIRI workshops, 47 of which are still being processed. So far, we’ve deemed 60 applicants “promising,” 23 of whom attended a workshop in 2013. Of those 23, about half were researchers with whom we had little to no prior contact.


Describing open problems in Friendly AI

In 2013, MIRI described open problems in Friendly AI (OPFAIs) to researchers via three standard methods: articles, talks, and tutorials at workshops.

On OPFAI articles: Yudkowsky’s article on “OPFAI #1” discussed intelligence explosion microeconomics (aka AI takeoff dynamics), which I consider to be an open problem in “strategy research” rather than in Friendly AI theory, so I discussed it in a previous post. From my perspective, the first written OPFAI description of 2013 was on logical decision theory. Alex Altair (then a MIRI researcher) described the problem in an April 2013 paper called “A Comparison of Decision Algorithms on Newcomblike Problems.” This open problem had been described before, in Less Wrong posts and in a 117-page technical report, but Altair’s presentation of the issue was more succinct and formal than previous presentations had been.

The second written OPFAI description of 2013 was on the tiling agents problem, and specifically the Löbian obstacle to tiling agents. Yudkowsky brought a draft of this paper to the April workshop, and heavily modified the draft as a result of the progress at that workshop, finally publishing the paper in June 2013. The third written OPFAI description of 2013, by Patrick LaVictoire and co-authors, was on the robust cooperation problem. The fourth written OPFAI description of 2013 was on naturalized induction.

Because the tiling agents paper took ~2 months of FAI researcher time to produce, we decided to experiment with a process that would minimize the amount of FAI-researcher-time required to produce new OPFAI descriptions. First, Yudkowsky brain-dumped the OPFAI to a Facebook group. Then, Robby Bensinger worked with several others to produce Less Wrong posts that described the OPFAI more clearly. The first post produced via this process was published in December 2013: Building Phenomenological Bridges. The rest of the posts explaining this OPFAI will be published in Q1 2014. Because we want to maximize the number of FAI researcher hours that go into FAI research rather than exposition, we hope to hire additional expository writing talent in 2014 (see our Careers page).

On OPFAI talks: MIRI scheduled two OPFAI talks in 2013. Yudkowsky’s Oct. 15th talk, “Recursion in rational agents: Foundations for self-modifying AI,” described both the robust cooperation and tiling agents problems to an audience at MIT. Two days earlier, (MIRI research associate) Paul Christiano gave a talk about probabilistic metamathematics at Harvard, following up on the earlier results from the “Definability of Truth” paper.6 Unfortunately, Yudkowsky’s talk was not recorded, but Christiano’s was.

On OPFAI tutorials at workshops: Each MIRI workshop in 2013 opened with a day or two of tutorials on the open problems being addressed by that workshop. These tutorials exposed ~35 researchers (participants and first-day visitors) to OPFAIs they weren’t previously very familiar with. (The others — e.g. Yudkowsky, Christiano, and Fallenstein — were already pretty familiar with the OPFAIs described in the tutorials.)

How good are these results?

For comparison’s sake, MIRI’s 2000-2012 FAI research efforts consisted in:

  • Yudkowsky’s early research into the general “shape” of the Friendly AI challenge, resulting in publications such as “Creating Friendly AI” (2001), “Coherent Extrapolated Volition” (2004), and “Artificial Intelligence as a Positive and Negative Factor in Global Risk” (2008). These publications did not yet describe any OPFAIs as well-defined as the open problems described in Altair (2013), Yudkowsky & Herreshoff (2013), or LaVictoire et al (2013).7
  • Yudkowsky’s early decision theory research, which resulted in TDT circa 2005, though this work wasn’t written up in much detail until 2009 (1, 2, 3) and 2010.
  • Yudkowsky’s early work on Friendly consequentialist AI, in 2003-2009, some of it with Marcello Herreshoff, and one summer (2006) with Peter de Blanc and Nick Hay as well. This work resulted in early versions of many of the OPFAIs described by MIRI in 2013, currently being written up, or currently in Yudkowsky’s queue to write up. It also resulted in the “infinite waterfall” method later described in Yudkowsky & Herreshoff (2013).
  • Yudkowsky worked again with Herreshoff in the summer of 2009, in part on the Löbian obstacle.
  • MIRI held a decision theory workshop in March 2010, attended by Eliezer Yudkowsky, Wei Dai, Stuart Armstrong, Gary Drescher, Anna Salamon, and about a dozen others who were present for some but not all discussions.8 This workshop spawned a decision theory mailing list that has, from 2010 through the present day, produced much of the recent progress on TDT/UDT-style decision theories, though mostly via non-MIRI researchers like Wei Dai, Vladimir Slepnev, Stuart Armstrong, and Vladimir Nesov.
  • (Former MIRI researcher) Peter de Blanc’s work on “convergence of expected utility for universal AI” and ontological crises, resulting in de Blanc (2009) and de Blanc (2011).
  • (MIRI research associate) Daniel Dewey’s work on value learning, resulting in Dewey (2011).

Thus, MIRI’s public-facing Friendly AI research from 2000-2012 consisted in a few non-technical works like “Creating Friendly AI” and “Coherent Extrapolated Volition,” some philosophical writings on TDT, and three somewhat technical papers by Peter de Blanc and Daniel Dewey. Compare this to MIRI’s 2013 public-facing FAI research: Muehlhauser & Williamson (2013),9 Altair (2013), Christiano et al. (2013), Yudkowsky & Herreshoff (2013), LaVictoire et al (2013), and these 7 technical reports.10

Subjectively, it feels to me like MIRI produced about as much public-facing Friendly AI research progress in 2013 as in all past years combined (2000-2012), and possibly more. This is good but not particularly surprising, since 2013 was also the first year in which MIRI tried to focus on producing public-facing FAI research progress. (But to be clear: without the “public-facing” qualifier, Yudkowsky alone produced far more FAI research progress in 2000-2012 than MIRI and its workshops produced in 2013.)

So, did our workshops and open problem descriptions achieve our stated goals? Let’s check:

  1. Yes, they helped us identify candidates for hire. We expect to hire two 2013 workshop participants in the first half of 2014. (One of these hires is pending a visa application approval.)
  2. Yes, they exposed many new researchers to the Friendly AI research program. But, this exposure didn’t lead to as much independent Friendly AI work as I had hoped, and I have some theories as to why this was (see below).
  3. Yes, they spurred concrete research progress on Friendly AI (see above).

While this represents a promising start toward growing an FAI research institute and a new field of FAI research, there are many dimensions on which our output needs to improve for MIRI to have the impact we hope for (see below).

What have I learned about how to create an FAI research institute, and a new field of FAI research?

Some of my “lessons learned” from 2013’s FAI research activities were things I genuinely didn’t know at the start of the year. Most of them are things I suspected already, and I think they were confirmed by our experiences in 2013. Here are a few of them, in no particular order.

1. Keep operations work away from researchers.

In other words, “Don’t be afraid of a high operations-staff-to-researcher ratio.” Operations talent (including executive talent) is easier to find than FAI research talent, so it’s important to hire sufficient operations talent to make sure the FAI researchers we do find can spend approximately all their time on FAI research, and almost none of their time on tasks that can mostly be handled by operations staff (writing grant proposals, organizing events, fundraising, paper bibliographies, etc.). MIRI should hire enough operations talent to do this even if it makes our operations-staff-to-researcher ratio look high for a research institute.11

Universities often struggle with this (from a research productivity perspective), loading up some of the best research talent in the world with teaching duties, grant writing duties, and university service.12 As an independent research institute, MIRI can set its own policies and minimize these problems.

2. We need to attract more experienced workshop participants.

Our workshops attracted some very bright participants, but they were almost exclusively younger than 30, with relatively few publications to their names. More experienced researchers would probably have advantages in (1) knowing related results and formal tools, (2) knowing productive research tactics, and (3) writing up results for peer review, among other advantages.

3. Much FAI research can be done by a broad community, and need not be labeled as FAI research.

Presently, the Yudkowskian paradigm for “Friendly AI research” describes a very large research program that breaks down into dozens of sub-problems (OPFAIs), e.g. the tiling agents toy problem. Locating and formulating open problems plausibly relevant for Friendly AI is a challenge in itself, one that especially benefits from specializing in Friendly AI for several years.

Many of the OPFAIs themselves, however, can be framed as “ordinary” open problems in AI safety engineering, philosophy, mathematical logic, theoretical computer science, economics, and other fields. These open problems can often be stated without any mention of Friendly AI, and sometimes without any mention of AI in general.

For every OPFAI Yudkowsky has described,13 I’ve been able to locate earlier related work.14 Although this earlier work has not produced what we would regard as good solutions to open problems in FAI, it does suggest that FAI can be framed in ways palatable to academia. FAI need not be an “alien” research program, operating strictly outside mainstream academia, and conducted only by those explicitly motivated by FAI. Instead, FAI researchers should be able to frame their work in the context of mainstream research paradigms if they choose to do so. Moreover, much FAI research can be done even by those who aren’t explicitly motivated by FAI, so long as they find (e.g.) the Löbian obstacle interesting as mathematics — or as computer science, or as philosophy, etc.


4. But, more FAI progress is made when the researchers themselves conceive of the research as FAI research.

Still, researchers do seem more likely to produce useful work on Friendly AI if they are thinking about the problems from the perspective of Friendly AI, rather than merely thinking about them as interesting open problems in philosophy, computer science, economics, etc. As I said in my conversation with Jacob Steinhardt:

People work on different pieces of the problem depending on whether they’re trying to solve the problem for Friendly AI or just for a math journal. If they aren’t thinking about it from the FAI perspective, people can work all day on stuff that’s very close to what we care about in concept-space and yet has no discernable value to FAI theory. Thus, the people who have contributed by far the most novel FAI progress are people explicitly thinking about the problems from the perspective of FAI…

5. Communication style matters a lot.

When I talk to the kinds of top-notch researchers MIRI would like to collaborate with on open problems in Friendly AI, perhaps the most common complaint I hear is that our work is not formal enough, or not described clearly enough for them to understand it without more effort on their part than they are willing to expend. For an example of such a conversation that was recorded and transcribed, see again my conversation with Jacob Steinhardt.

I’ve thought this for a long time, and my experiences in 2013 have only reinforced the point. I’ll be writing more about this in the future.


  1. What counts as “Friendly AI research” is, naturally, a matter of debate. For most of this post I’ll assume “Friendly AI research” means “what Yudkowsky thinks of as Friendly AI research,” with the exception of intelligence explosion microeconomics, for reasons given in this post. 
  2. Until early 2013, the organization currently named “Machine Intelligence Research Institute” was known as the “Singularity Institute for Artificial Intelligence.” 
  3. From 2000-2004, “MIRI” was just Eliezer Yudkowsky, doing early FAI research. The organization began to grow in 2004, and by 2006 most efforts were outreach-related rather than research-related. This remained true until early 2013. 
  4. Some statistics about 2013’s 35 workshop participants: 15 have a PhD, 3 are women, and 3 hold a university faculty position of assistant professor or higher rank. In short, our workshop participants have thus far largely been graduate students, post-docs, and independent researchers. Among the 15 participants who have a PhD, 9 have a PhD in mathematics, 4 have a PhD in computer science, one has a PhD in cognitive science, and one has a joint PhD in philosophy and computer science. 
  5. By “junk applications” I mean to include both spam applications and applications from people who are clearly incapable of math research, e.g. “Hello, I would love to come to America to learn algebra.” 
  6. Probabilistic metamathematics is an OPFAI in itself, and also one possible path toward a solution to the tiling agents problem. 
  7. The open problems in these publications, too, need additional formalization. Such is the current state of research. 
  8. For example, Steve Rayhawk and Henrik Jonsson. 
  9. This short paper lies deep in the “philosophy” end of the philosophy -> math -> engineering spectrum. 
  10. For both the 2000-2012 and 2013 calendar periods, when I write of “MIRI’s public-facing FAI work” I’m not including work that was “enabled” but not really “produced” by MIRI or its workshops, for example most work on UDT/ADT (which were nevertheless largely developed on MIRI’s LessWrong.com website and its decision theory mailing list). 
  11. At the end of 2013, we had five full-time staff members: Luke Muehlhauser (executive director), Louie Helm (deputy director), Eliezer Yudkowsky (research fellow), Malo Bourgon (program manager), and Alex Vermeer (program management analyst), totaling 4 operations staff and one researcher. This 4:1 ratio will shrink as we are able to hire more FAI researchers, but I think it would have been a mistake to try to get by with fewer operations staff in 2013. 
  12. Link et al. (2008); Marsh & Hattie (2002); NSOPF (2004). 
  13. Sometimes with much help from Robby Bensinger and/or others. 
  14. I’ll list some examples of earlier related work. (1) Superrationality: getting agents to rationally cooperate with agents like themselves. Before “Robust Cooperation” there was: Rapoport (1966), McAfee (1984), Hofstadter (1985), Binmore (1987), Howard (1988), Tennenholtz (2004), Fortnow (2009), Kalai et al. (2010), Peters & Szentes (2012). (For Rapoport 1966, see especially pages 141-144 and 209-210.) (2) Coherent extrapolated volition: figuring out what we would wish if we knew more, thought better, were more the people we wished we were, etc. Before Yudkowsky (2004) there was: Rawls (1971), Harsanyi (1982), Railton (1986), Rosati (1995). (For an overview of this background, see Muehlhauser & Williamson 2013.) (3) Parliamentary methods for values aggregation: using voting mechanisms to resolve challenges in normative uncertainty and values aggregation. Before Bostrom (2009) there was a vast literature on this topic in social choice theory. For recent overviews, see List (2013), Brandt et al. (2012), Rossi et al. (2011), Gaertner (2009). (4) Reasoning under fragility: figuring out how to get an agent not to operate with full autonomy before it has been made fully trustworthy. Before Yudkowsky began to discuss the issue, there was much work on “adjustable autonomy”: Schreckenghost et al. (2010), Mouaddib et al. (2010), Zieba et al. (2010), Pynadath & Tambe (2002), Tambe et al. (2002). (5) Logical decision theory: finding a decision algorithm which can represent the agent’s deterministic decision process. Before Yudkowsky (2010) there was: Spohn (2003), Spohn (2005). (6) Stable self-improvement: getting a self-modifying agent to avoid rewriting its own code unless it has very high confidence that these rewrites will maintain desirable agent properties. Before Yudkowsky & Herreshoff (2013) there was: Schmidhuber (2003), Schmidhuber (2009), Steunebrink & Schmidhuber (2012). (7) Naturalized induction: getting an induction algorithm to treat itself, its data inputs, and its hypothesis outputs as reducible to its physical posits. Before “Building Phenomenological Bridges” there was: Orseau & Ring (2011), Orseau & Ring (2012). 