# 2015 in review

|   |  MIRI Strategy

As Luke had done in years past (see 2013 in review and 2014 in review), I (Malo) wanted to take some time to review our activities from last year. In the coming weeks Nate will provide a big-picture strategy update. Here, I’ll take a look back at 2015, focusing on our research progress, academic and general outreach, fundraising, and other activities.

After seeing signs in 2014 that interest in AI safety issues was on the rise, we made plans to grow our research team. Fueled by the response to Bostrom’s Superintelligence and the Future of Life Institute’s “Future of AI” conference, interest continued to grow in 2015. This suggested that we could afford to accelerate our plans, but it wasn’t clear how quickly.

In 2015 we did not release a mid-year strategic plan, as Luke did in 2014. Instead, we laid out various conditional strategies dependent on how much funding we raised during our 2015 Summer Fundraiser. The response was great; we had our most successful fundraiser to date. We hit our first two funding targets (and then some), and set out on an accelerated 2015/2016 growth plan.

As a result, 2015 was a big year for MIRI. After publishing our technical agenda at the start of the year, we made progress on many of the open problems it outlined, doubled the size of our core research team, strengthened our connections with industry groups and academics, and raised enough funds to maintain our growth trajectory. We’re very grateful to all our supporters, without whom this progress wouldn’t have been possible.

### 2015 Research Progress

Our “Agent Foundations for Aligning Machine Intelligence with Human Interests” research agenda divides open problems into three categories: high reliability (which includes logical uncertainty, naturalized induction, decision theory, and Vingean reflection), error tolerance, and value specification.1 MIRI’s top goal in 2015 was to make progress on these problems.

We met our expectations for research progress in each category, with the exception of logical uncertainty and naturalized induction (where we made more progress than expected) and error tolerance (where we made less progress than expected).

Below I’ve provided a brief summary of our progress in each area, with additional details and a full publication list in collapsed “Read More” sections. Some of the papers we published in 2015 were based on research from 2014 or earlier, and some of our 2015 results weren’t published until 2016 (or remain unpublished). In this review I’ll focus on 2015’s new technical developments, rather than on pre-2015 material that happened to be published in that year.

##### Logical Uncertainty and Naturalized Induction

We expected to make modest progress on these two problems in 2015. I’m pleased to report we made sizable progress.

2015 saw the tail end of our development of reflective oracles, and early work on “optimal estimators.” Our most important research advance of the year, however, was likely our success dividing logical uncertainty into two subproblems, which happened in late 2015 and the very beginning of 2016.

One intuitive constraint on correct logically uncertain reasoning is that one’s probabilities reflect known logical relationships between claims. For example, if you know that two claims are mutually exclusive (such as “this computation outputs a 3” and “this computation outputs a 7”), then even if you can’t evaluate the claims, you should assign probabilities to the two claims that sum to at most 1.

A second intuitive constraint is that one’s probabilities reflect empirical regularities. Once you observe enough digits of π, you should eventually guess that the numbers 8 and 3 occur equally often in π’s decimal expansion, even if you have not yet proven that π is normal.

In 2015, we developed two different algorithms to solve these two subproblems in isolation.

In collaboration with Benya Fallenstein and other MIRI researchers, Scott Garrabrant solved the problem of respecting logical relationships in a series of Intelligent Agent Foundations Forum (IAFF) posts, resulting in the “Inductive Coherence” paper. The problem of respecting observational patterns in logical sentences was solved by Scott and the MIRIxLosAngeles group in “Asymptotic Logical Uncertainty and the Benford Test,” which was further developed into the “Asymptotic Convergence in Online Learning with Unbounded Delays” paper in 2016.

These two approaches to logical uncertainty were not only nonequivalent, but seemed to preclude each other. The obvious next step is to investigate whether there is a way to solve both subproblems at once with a single procedure—a task we have since made some (soon-to-be-announced) progress on in 2016.

MIRI research associate Vadim Kosoy’s work on his “optimal estimators” framework represents a large separate corpus of work on logical uncertainty, which may also have applications for decision theory. Vadim’s work has not yet been officially published, but much of it is available on IAFF.

Our other significant result in logical uncertainty was Benya Fallenstein, Jessica Taylor, and Paul Christiano’s reflective oracles, building on work that began before 2015 (IAFF digest). Reflective oracles avoid a number of paradoxes that normally arise when agents attempt to answer questions about equivalently powerful agents, allowing us to study multi-agent dilemmas and reflective reasoning with greater precision.

Reflective oracles are interesting in their own right, and have proven applicable to a number of distinct open problems. The fact that reflective oracles require no privileged agent/environment distinction suggests that they’re a step in the right direction for naturalized induction. Jan Leike has recently demonstrated that reflective oracles also solve a longstanding open problem in game theory, the grain of truth problem. Reflective oracles provide the first complete decision-theoretic foundation for game theory, showing that general-purpose methods for maximizing expected utility can achieve approximate Nash equilibria in repeated games.

In summary, our 2015 logical uncertainty and naturalized induction papers based on pre-2015 work were:

2015 research published the same year:

2015 research published in 2016 or forthcoming:

For other logical uncertainty work on IAFF, see The Two-Update Problem, Subsequence Induction, and Strict Dominance for the Modified Demski Prior.

##### Decision Theory

In 2015 we produced a number of new incremental advances in decision theory, constituting modest progress, in line with our expectations.

Of these advances, we have published Andrew Critch’s proof of a version of Löb’s theorem and Gödel’s second incompleteness that holds for bounded reasoners.

Critch applies this parametric bounded version of Löb’s theorem to prove that a wide range of resource-limited software agents, given access to each other’s source code, can achieve unexploitable mutual cooperation in the one-shot prisoner’s dilemma. Although we considered our past robust cooperation results strong reason to believe that bounded cooperation was possible, the confirmation is useful and gives us new formal tools for studying bounded reasoners.

Over this period, Eliezer Yudkowsky, Benya Fallenstein, and Nate Soares also improved our technical (and philosophical) understanding of the decision theory we currently favor, “functional decision theory”—a slightly modified version of updateless decision theory.

The biggest obstacle to formalizing decision theory currently seems to be that we lack a suitable formal account of logical counterfactuals. Logical counterfactuals are questions of the form “If X (which I know to be false) were true, what (if anything) would that imply about Y?” These are important in decision theory, one special case being off-policy predictions. (Even if I can predict that I’m definitely not taking action X, I want to be able to ask what would ensue if I did; a wrong answer to this can lead to me accepting substandard self-fulfilling prophecies like two-boxing in the transparent Newcomb problem.)

In 2015, we examined a decision theory related to functional decision theory, proof-based decision theory, that has proven easier to formalize. We found that proof-based decision theory’s lack of logical counterfactuals is a serious weakness for the theory.

We explored some proof-length-based approaches to logical counterfactuals, and ultimately rejected them, though we have continued to devote some thought to this approach. During our first 2015 workshop, Scott Garrabrant proposed an informal conjecture on proof length and counterfactuals, which was subsequently revised; but both versions of the conjecture were shown to be false by Sam Eisenstat (1, 2). (See also Scott’s Optimal and Causal Counterfactual Worlds.)

In a separate line of research, Patrick LaVictoire and others applied the proof-based decision theory framework to questions of bargaining and division of trade gains. For other decision theory work on IAFF, see Vadim and Scott’s Superrationality in Arbitrary Games and Armstrong’s Reflective Oracles and Superrationality: Prisoner’s Dilemma.

Our github repository contains lots of new code from our work on modal agents, representing our most novel work on decision theory in the past year. We have one or two papers in progress that will explain the advances we’ve made in decision theory via this work. See “Evil” Decision Problems in Provability Logic and other posts in the decision theory IAFF digest for background on modal universes.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

##### Vingean Reflection

We were expecting modest progress on these problems in 2015, and we made modest progress.

Benya Fallenstein and Ramana Kumar’s “Proof-Producing Reflection for HOL” demonstrates a practical form of self-reference (and a partial solution to both the Löbian obstacle and the procrastination paradox) in the HOL theorem prover. This result provides some evidence that it is possible for a reasoning system to trust another reasoning system that reasons the same way, so long as the systems have different internal states.

More specifically, this paper establishes that it is possible to formally specify an infinite chain of reasoning systems such that each system trusts the next system in the chain, as long as the reasoners are unable to delegate any individual task indefinitely.

There is some internal debate within MIRI about what more is required for real-world Vingean reflection, aside from satisfactory accounts of logical uncertainty and logical counterfactuals. There’s also debate about whether any better results than this are likely to be possible in the absence of a full theory of logical uncertainty. Regardless, “Proof-Producing Reflection for HOL” demonstrates, via machine-checked proof, that it is possible to implement a form of reflective reasoning that is remarkably strong.

Benya and Ramana’s work also provides us with an environment in which to build better toy models of reflective reasoners. Jack Gallagher, a MIRI research intern, is currently implementing a cellular automaton in HOL that will let us implement reflective agents.

By applying results from the reflective oracles framework mentioned above, we also improved our theoretical understanding of Vingean reflection. In the IAFF post A Limit-Computable, Self-Reflective Distribution, research associate Tsvi Benson-Tilsen helped solidify our understanding of what kinds of reflection are and aren’t possible. Jessica, working with Benya and Paul, further showed that reflective oracles can’t readily be used to define reflective probabilistic logics.

Pre-2015 work published in 2015:

2015 research published the same year:

Other relevant IAFF posts include A Simple Model of the Löbstacle, Waterfall Truth Predicates, and Existence of Distributions that are Expectation-Reflective and Know It.

##### Error Tolerance

We were expecting modest progress on these problems in 2015, but we made only limited progress.

Corrigibility was a mid-level priority for us in 2015, and we spent some effort trying to build better models of corrigible agents. In spite of this, we didn’t achieve any big breakthroughs. We made some progress on fixing minor defects in our understanding of corrigibility, reflected, e.g., in our error-tolerance IAFF digest, Stuart Armstrong’s AI control ideas, and Jessica Taylor’s overview post; but these results are relatively small.

In 2015 our main novelties were Google DeepMind researcher Laurent Orseau and FHI researcher / MIRI research associate Stuart Armstrong’s work on corrigibility (“Safely Interruptible Agents“), along with work on two other error tolerance subproblems: mild optimization (Jessica’s Quantilizers and Abram Demski’s Structural Risk Minimization) and conservative concepts (Jessica’s Learning a Concept Using Only Positive Examples).

Pre-2015 work published in 2015:

• N Soares, B Fallenstein, E Yudkowsky, S Armstrong. “Corrigibility.” 2014 tech report presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.

2015 research published in 2016 or forthcoming:

Our failure to make much progress on corrigibility may be a sign that corrigibility is not as tractable a problem as we thought, or that more progress is needed in areas like logical uncertainty (so that we can build better models of AI systems that model their operators as uncertain about the implications of their preferences) before we can properly formalize corrigibility.

We are more optimistic about corrigibility research, however, in light of recent advances in logical uncertainty and some promising discussions of related topics at our recent colloquium series: “Cooperative Inverse Reinforcement Learning” (via Stuart Russell’s group), “Avoiding Wireheading with Value Reinforcement Learning” (via Tom Everitt), and some items in Stuart Armstrong’s bag of tricks.

##### Value Specification

We were expecting limited progress on these problems in 2015, and we made limited progress.

Value learning and related problems were low-priority for us last year, so we didn’t see any big advances.

MIRI research associate Kaj Sotala made value specification his focus, examining several interesting questions outside our core research agenda. Jessica Taylor also began investigating the problem on the research forum.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

Error-tolerant agent designs and value specification will be larger focus areas for us going forward, under the alignment for advanced machine learning systems research program.

##### Miscellaneous

We released our technical agenda in late 2014 and early 2015. The overview paper, “Agent Foundations for Aligning Machine Intelligence with Human Interests,” is slated for external publication in The Technological Singularity in 2017.

In 2015 we also produced some research unrelated to our agent foundations agenda. This research generally focused on forecasting and strategy questions.

Pre-2015 work published in 2015:

2015 research published in 2016 or forthcoming:

Beginning in 2015, new AI strategy/forecasting research supported by MIRI has been hosted on Katja Grace’s independent AI Impacts project. AI Impacts featured 31 new articles and 27 new blog posts in 2015, on topics from the range of human intelligence to computing cost trends.

On the whole, we’re happy about our 2015 research output and expect our team growth to further accelerate technical progress.

### 2015 Research Support Activities

Focusing on activities that directly grew the technical research community or facilitated technical research and collaborations, in 2015 we:

• Launched the Intelligent Agent Foundations Forum, a public discussion forum for AI alignment researchers. MIRI researchers and collaborators made 139 top-level posts to IAFF in 2015.
• Hired four new full-time research fellows. Patrick LaVictoire joined in March, Jessica Taylor in August, Andrew Critch in September, and Scott Garrabrant in December. With Nate transitioning to a non-research role, overall we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.
• Overhauled our research associates program. Before 2015, our research associates were mostly unpaid collaborators with varying levels of involvement in our active research. Following our successful summer fundraiser, we made “research associate” a paid position in which researchers based at other institutions spend significant amounts of time on research projects for us. Under this program, Stuart Armstrong, Tsvi Benson-Tilsen, Abram Demski, Vadim Kosoy, Ramana Kumar, Kaj Sotala, and (prior to joining MIRI full-time) Scott Garrabrant all made significant contributions in associate roles.
• Hired three research interns. Kaya Stechly and Rafael Cosman worked on polishing and consolidating old MIRI results (example on IAFF), while Jack Gallagher worked on our type theory in type theory project (github repo).
• Acquired two new research advisors, Stuart Russell and Bart Selman.
• Hosted six summer workshops and sponsored the three-week MIRI Summer Fellows program. These events helped forge a number of new academic connections and directly resulted in us making job offers to two extremely promising attendees: Mihály Bárász (who has plans to join at a future date) and Scott Garrabrant.
• Helped organize two other academic events, a Cambridge decision theory conference and a ten-week AI alignment seminar series at UC Berkeley. We also ran 6 research retreats, sponsored 36 MIRIx events, and spoke at an Oxford Big Picture Thinking seminar series.
• Spoke at five other academic events. We participated in the Future of Life Institute’s “Future of AI” conference, AAAI-15, AGI-15, LORI 2015, and APS 2015. We also attended NIPS.

I’m excited about our 2015 progress in growing our team and collaborating with the larger academic community. Over the course of the year, we built closer relationships with people at Google DeepMind, Google Brain, OpenAI, Vicarious, Good AI, the Future of Humanity Institute, and other research groups. All of this has put us in a better position to share our research results, methodology, and goals with other researchers, and to attract new talent to AI alignment work.

### 2015 General Activities

Beyond direct research support, in 2015 we:

Although we have deemphasized outreach efforts, we continue to expect these activities to be useful for spreading general awareness about MIRI, our research program, and AI safety research more generally. Ultimately, we expect this to help build our donor base, as well as attract potential future researchers (to MIRI and the field more generally), as with our past outreach and capacity-building efforts.

### 2015 Fundraising

I am very pleased with our fundraising performance. In 2015 we:

• Continued our strong fundraising growth, with a total of $1,584,109 in contributions.3 • Received$166,943 in grants from the Future of Life Institute (FLI), with another ~$80,000 annually for the next two years.4 • Experimented with a new kind of fundraiser (non-matching, with multiple targets). I consider these experiments to have been successful. Our summer fundraiser was our biggest fundraiser of to date, raising$632,011, and our winter fundraiser also went well, raising $328,148. Total contributions grew 28% in 2015. This was driven by an increase in contributions from new funders, including a one-time$219,000 contribution from an anonymous funder, $166,943 in FLI grants, and at least$137,023 from Raising for Effective Giving (REG) and regranting from the Effective Altruism Foundation.5 The decrease in contributions from returning funders is due to Peter Thiel’s discontinuation of support in 2015, plus a large one-time outlier donation from Jed McCaleb in the years prior ($526,316 arriving in 2013,$104,822 in 2014).

Drawing conclusions from these year-by-year comparisons is a little tricky. MIRI underwent significant organizational changes over this time span, particularly in 2013. We switched to accrual-based accounting in 2014, which also complicates comparisons with previous years.6 In general, though, we’re continuing to see solid fundraising growth.

The number of new funders decreased from 2014 to 2015. In our 2014 review, Luke explains the large increase in funders in 2014:

New donor growth was strong in 2014, though this mostly came from small donations made during the SV Gives fundraiser. A significant portion of growth in returning donors can also be attributed to lapsed donors making small contributions during the SV Gives fundraiser.

Comparing our numbers in 2015 and 2013, we see healthy growth in the number of returning funders and total number of funders.

The above chart shows contributions in past years from small, mid-sized, large, and very large funder segments. Contributions from the three largest segments increased (approximately) proportionally from last year, with the notable exception of contributions from large funders, which increased from 26% to 31% of total contributions. We had a small year-over-year decrease in contributions in the small funder segment, which is again due having received an unusually large amount of small contributions during SV Gives in 2014.

As in past years, a full report on our finances (in the form of an independent accountant’s review report) will be made available on our transparency and financials page. The report will most likely be up in late August or early September.

### 2016 and Beyond

What’s next? Beyond our research goal of making significant progress in five of our six focus areas, we set the following operational goals for ourselves in July/August 2015:

1. Accelerated growth: “expand to a roughly ten-person core research team.” (source)
2. Type theory in type theory project: “hire one or two type theorists to work on developing relevant tools full-time.” (source)
3. Visiting scholar program: “have interested professors drop by for the summer, while we pay their summer salaries and work with them on projects where our interests overlap.” (source)
4. Independent review: “We’re also looking into options for directly soliciting public feedback from independent researchers regarding our research agenda and early results.” (source)
5. Higher-visibility publications: “Our current plan this year is to focus on producing a few high-quality publications in elite venues.” (source)

In 2015 we doubled the size of our research team from three to six. With the restructuring of our research associates program and the addition of two research interns, I’m pleased with the growth we achieved in 2015. We deemphasized growth in the first half of 2016 in order to focus on onboarding, but plan to expand again by the end of the year.

We have a job ad out for our type theorist position, which will likely be filled after we make our next few core researcher hires. In the interim, we’ve been having our research intern Jack Gallagher work on the type theory in type theory project, and we also ran an April 2016 type theory workshop.

With help from our research advisors, our visiting scholars program morphed into a three-week-long colloquium series. Rather than hosting a handful of researchers for longer periods of time, we hosted over fifty researchers for shorter stretches of time, comparing notes on a wide variety of active AI safety research projects. Speakers at the event included Stuart Russell, Francesca Rossi, Tom Dietterich, and Bart Selman. We’re also collaborating with Stuart Russell on a corrigibility grant.

Work is underway on conducting an external review of our research program; the results should be available in the next few months.

With regards to our fifth goal, in addition to “Proof-Producing Reflection for HOL” (which was presented at ITP 2015 in late August), we’ve since published papers at LORI-V (“Reflective Oracles”), at UAI 2016 (“Safely Interruptible Agents” and “A Formal Solution to the Grain of Truth Problem”), and at an IJCAI 2016 workshop (“The Value Learning Problem”). Of those venues, UAI is generally considered more prestigious than most venues that we have published in in the past. I’d count this as moderate (but not great) progress towards the goal of publishing in more elite venues. Nate will have more to say about our future publication plans.

Elaborating further on our plans would take me beyond the scope of this review. In the coming weeks, Nate will be providing more details on our 2016 activities and our goals going forward in a big-picture MIRI strategy post.7

1. This paper was originally titled “Aligning Superintelligence with Human Interests.” We’ve renamed it in order to emphasize that this research agenda takes a specific approach to the alignment problem, and other approaches are possible too—including, relevantly, Jessica Taylor’s new “Alignment for Advanced Machine Learning Systems” agenda.
2. I (Malo Bourgon) more recently took on a leadership role as MIRI’s new COO and second-in-command.
3. $80,480 of this was earmarked funding for the AI Impacts project. 4. MIRI is administering three FLI grants (and participated in a fourth). We are to receive$250,000 over three years to fund work on our agent foundations technical agenda, $49,310 towards AI Impacts, and we are administering Ramana’s$36,750 to study self-reference in the HOL theorem prover in collaboration with Benya.
5. This only counts direct contributions through REG to MIRI. REG’s support for MIRI is likely closer to \$200,000 when accounting for contributions made directly to MIRI as a result of REG’s advice to funders.
6. Also note that numbers in this section might not exactly match previously published estimates, since small corrections are often made to contributions data. Finally, note that these number do not include in-kind donations.
7. My thanks to Rob Bensinger for his substantial contributions to this review.

Thanks for posting updates on MIRIs progress. However, there is alot of jargon in here that might not be immediately understandable to those not directly in the field. Since its a review post, in the future can you give very short expansions of what some of these concepts are? It will help situate your advancements and make it clearer to those in the broader field.

• http://www.nothingismere.com/ Rob Bensinger

Our hope was that the hyperlinks would make it easier for people to learn more about most of these ideas, while keeping the post concise for people who already have the background. But I’ll keep your feedback in mind that we may want to do more exposition next time; thanks!