# 2016 in review

It’s time again for my annual review of MIRI’s activities.[1] In this post I’ll provide a summary of what we did in 2016, see how our activities compare to our previously stated goals and predictions, and reflect on how our strategy this past year fits into our mission as an organization. We’ll be following this post up in April with a strategic update for 2017.

After doubling the size of the research team in 2015,[2] we slowed our growth in 2016 and focused on integrating the new additions into our team, making research progress, and writing up a backlog of existing results.

2016 was a big year for us on the research front, with our new researchers making some of the most notable contributions. Our biggest news was Scott Garrabrant’s logical inductors framework, which represents by a significant margin our largest progress to date on the problem of logical uncertainty. We additionally released “Alignment for Advanced Machine Learning Systems” (AAMLS), a new technical agenda spearheaded by Jessica Taylor.

We also spent this last year engaging more heavily with the wider AI community, e.g., through the month-long Colloquium Series on Robust and Beneficial Artificial Intelligence we co-ran with the Future of Humanity Institute, and through talks and participation in panels at many events through the year.

### 2016 Research Progress

We saw significant progress this year in our agent foundations agenda, including Scott Garrabrant’s logical inductor formalism (which represents possibly our most significant technical result to date) and related developments in Vingean reflection. At the same time, we saw relatively little progress in error tolerance and value specification, which we had planned to put more focus on in 2016. Below, I’ll note the highlights from each of our research areas:

##### Logical Uncertainty and Naturalized Induction
• 2015 progress: sizable. (Predicted: modest.)
• 2016 progress: sizable. (Predicted: sizable.)

We saw a large body of results related to logical induction. Logical induction developed out of earlier work led by Scott Garrabrant in late 2015 (written up in April 2016) that served to divide the problem of logical uncertainty into two subproblems. Scott demonstrated that both problems could be solved at once using an algorithm that satisfies a highly general “logical induction criterion.”

This criterion provides a simple way of understanding idealized reasoning under resource limitations. In Andrew Critch’s words, logical induction is “a financial solution to the computer science problem of metamathematics”: a procedure that assigns reasonable probabilities to arbitrary (empirical, logical, mathematical, self-referential, etc.) sentences in a way that outpaces deduction, explained by analogy to inexploitable stock markets.

Our other main 2016 work in this domain is an independent line of research spearheaded by MIRI research associate Vadim Kosoy, “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” Vadim approaches the problem of logical uncertainty from a more complexity-theoretic angle of attack than logical induction does, providing a formalism for defining optimal feasible approximations of computationally infeasible objects that retain a number of relevant properties of those objects.

##### Decision Theory
• 2015 progress: modest. (Predicted: modest.)
• 2016 progress: modest. (Predicted: modest.)

We continue to see a steady stream of interesting results related to the problem of defining logical counterfactuals. In 2016, we began applying the logical inductor framework to decision-theoretic problems, working with the idea of universal inductors. Andrew Critch also developed a game-theoretic method for resolving policy disagreements that outperforms standard compromise approaches and also allows for negotiators to disagree on factual questions.

We have a backlog of many results to write up in this space. Our newest, “Cheating Death in Damascus,” summarizes the case for functional decision theory, a theory that systematically outperforms the conventional academic views (causal and evidential decision theory) in decision theory and game theory. This is the basic framework we use for studying logical counterfactuals and related open problems, and is a good introductory paper for understanding our other work in this space.

For an overview of our more recent work on this topic, see Tsvi Benson-Tilsen’s decision theory index on the research forum.

##### Vingean Reflection
• 2015 progress: modest. (Predicted: modest.)
• 2016 progress: modest-to-strong. (Predicted: limited.)

Our main results in reflective reasoning last year concerned self-trust in logical inductors. After seeing no major advances in Vingean reflection for many years—the last big step forward was perhaps Benya Fallenstein’s model polymorphism proposal in late 2012—we had planned to de-prioritize work on this problem in 2016, on the assumption that other tools were needed before we could make much more headway. However, in 2016 logical induction turned out to be surprisingly useful for solving a number of outstanding tiling problems.

As described in “Logical Induction,” logical inductors provide a simple demonstration of self-referential reasoning that is highly general and accurate, is free of paradox, and assigns reasonable credence to the reasoner’s own beliefs. This provides some evidence that the problem of logical uncertainty itself is relatively central to a number of puzzles concerning the theoretical foundations of intelligence.

##### Error Tolerance
• 2015 progress: limited. (Predicted: modest.)
• 2016 progress: limited. (Predicted: modest.)

2016 saw the release of our “Alignment for Advanced Machine Learning Systems” research agenda, with a focus on error tolerance and value specification. Less progress occurred in these areas than expected, partly because investigations here are still very preliminary. We also spent less time on research in mid-to-late 2016 overall than we had planned, in part because we spent a lot of time writing up our new results and research proposals.

Nate noted in our October AMA that he considers this time investment in drafting write-ups one of our main 2016 errors, and we plan to spend less time on paper-writing in 2017.

Our 2016 work on error tolerance included “Two Problems with Causal-Counterfactual Utility Indifference” and some time we spent discussing and critiquing Dylan Hadfield-Menell’s proposal of corrigibility via CIRL. We plan to share our thoughts on the latter line of research more widely later this year.

##### Value Specification
• 2015 progress: limited. (Predicted: limited.)
• 2016 progress: weak-to-modest. (Predicted: modest.)

Although we planned to put more focus on value specification last year, we ended up making less progress than expected. Examples of our work in this area include Jessica Taylor and Ryan Carey’s posts on online learning, and Jessica’s analysis of how errors might propagate within a system of humans consulting one another.

We’re extremely pleased with our progress on the agent foundations agenda over the last year, and we’re hoping to see more progress cascading from the new set of tools we’ve developed. At the same time, it remains to be seen how tractable the new set of problems we’re tackling in the AAMLS agenda are.

### 2016 Research Support Activities

In September, we brought on Ryan Carey to support Jessica’s work on the AAMLS agenda as an assistant research fellow.[3] Our assistant research fellowship program seems to be working out well; Ryan has been a lot of help to us in working with Jessica to write up results (e.g., “Bias-Detecting Online Learners”), along with setting up TensorFlow tools for a project with Patrick LaVictoire.

We’ll likely be expanding the program this year and bringing on additional assistant research fellows, in addition to a slate of new research fellows.

Focusing on other activities that relate relatively directly to our technical research program, including collaborating and syncing up with researchers in industry and academia, in 2016 we:

On the whole, our research team growth in 2016 was somewhat slower than expected. We’re still accepting applicants for our type theorist position (and for other research roles at MIRI, via our Get Involved page), but we expect to leave that role unfilled for at least the next 6 months while we focus on onboarding additional core researchers.[4]

### 2016 General Activities

Also in 2016, we:

### 2016 Fundraising

2016 was a strong year in MIRI’s fundraising efforts. We raised a total of $2,285,200, a 44% increase on the $1,584,109 raised in 2015. This increase was largely driven by:

• A general grant of $500,000 from the Open Philanthropy Project.[5]
• A donation of $300,000 from Blake Borgeson.
• Contributions of $93,548 from Raising for Effective Giving.[6]
• A research grant of $83,309 from the Future of Life Institute.[7]
• Our community’s strong turnout during our Fall Fundraiser, which at $595,947 was our second-largest fundraiser to date.
• A gratifying show of support from supporters at the end of the year, despite our not running a Winter Fundraiser.

Assuming we can sustain this funding level going forward, this represents a preliminary fulfillment of our primary fundraising goal from January 2016:

> Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans — which are aimed at expanding to a team of approximately ten full-time researchers — we’ll need to begin consistently taking in close to $2M per year by mid-2017.
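As a quick sanity check on the figures above, the year-over-year growth can be recomputed from the two annual totals. The sketch below is illustrative only: the helper name `percent_increase` and the variable names are assumptions introduced here, and the inputs are simply the totals quoted in this section.

```python
def percent_increase(previous: float, current: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Totals quoted in this post (US dollars).
raised_2015 = 1_584_109
raised_2016 = 2_285_200

# Hypothetical constant standing in for the "close to $2M per year" goal.
annual_target = 2_000_000

growth = percent_increase(raised_2015, raised_2016)
print(f"Growth from 2015 to 2016: {growth:.1f}%")
print(f"2016 total meets the $2M target: {raised_2016 >= annual_target}")
```

Rounded to the nearest whole percent, this gives the 44% increase reported above.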

As the graph below indicates, 2016 continued a positive trend of growth in our fundraising efforts.

Drawing conclusions from these year-by-year comparisons can be a little tricky. MIRI underwent significant organizational changes over this time span, particularly in 2013. We also switched to accrual-based accounting in 2014, which further complicates comparisons with previous years.

However, it is possible to highlight certain aspects of our progress in 2016:

• The Fall Fundraiser: For the first time, we held a single fundraiser in 2016, running from mid-September to October 31, instead of our “traditional” summer and winter fundraisers. While we didn’t hit our initial target of $750k, we hoped that our funders were waiting to give later in the year and would make up the shortfall at the end of the year. We were pleased that they came through in large numbers at the end of 2016, some possibly motivated by public posts by members of the community.[8] All told, we received more contributions in December 2016 (~$430,000) than in the same month in either of the previous two years, when we actively ran Winter Fundraisers, an interesting data point for us. The following charts throw additional light on our supporters’ response to the fall fundraiser:

Note that if we remove the Open Philanthropy Project’s grant from the Pre-Fall data, the ratios across the 4 time segments all look pretty similar. Overall, this data suggests that, rather than a group of new funders coming in at the last moment, a segment of our existing funders chose to wait until the end of the year to donate.
• Support from returning funders was particularly noteworthy in 2016, with 89% retention (in terms of dollars) from 2015 funders. To put this in a broader context, the average gift retention rate across a representative segment of the US philanthropic space over the last 5 years has been 46%.
• The number of unique funders to MIRI rose 16% in 2016—from 491 to 571—continuing a general increasing trend. 2014 is anomalously high on this graph due to the community’s active participation in our memorable SVGives campaign.[9]
• International support continues to make up about 20% of contributions. Unlike in the US, where increases were driven mainly by new institutional support (the Open Philanthropy Project), international support growth was driven by individuals across Europe (notably Scandinavia and the UK), Australia, and Canada.
• Use of employer matching programs increased by 17% year-on-year, with contributions of over $180,000 received through corporate matching programs in 2016, our highest to date. There are early signs of this growth continuing through 2017.
• An analysis of contributions made from small, mid-sized, large, and very large funder segments shows contributions from all four segments increased proportionally from 2015:

Because we raised more than $2 million in 2016, we are now required by California law to prepare an annual financial statement audited by an independent certified public accountant (CPA). That report, like our financial reports of past years, will be made available by the end of September, on our transparency and financials page.

### Going Forward

As of July 2016, we had the following outstanding goals from mid-2015:

1. Accelerated growth: “expand to a roughly ten-person core research team.” (source)
2. Type theory in type theory project: “hire one or two type theorists to work on developing relevant tools full-time.” (source)
3. Independent review: “We’re also looking into options for directly soliciting public feedback from independent researchers regarding our research agenda and early results.” (source)

We currently have seven research fellows and assistant fellows, and are planning to hire several more in the very near future. We expect to hit our ten-fellow goal in the next 3–4 months, and to continue to grow the research team later this year. As noted above, we’re delaying moving forward on a type theorist hire.

The Open Philanthropy Project is currently reviewing our research agenda as part of their process of evaluating us for future grants. They released an initial big-picture organizational review of MIRI in September, accompanied by reviews of several recent MIRI papers (which Nate responded to here). These reviews were generally quite critical of our work, with Open Phil expressing a number of reservations about our agent foundations agenda and our technical progress to date. We are optimistic, however, that we will be able to better make our case to Open Phil in discussions going forward, and generally converge more in our views of what open problems deserve the most attention.

In our August 2016 strategic update, Nate outlined our other organizational priorities and plans:

1. Technical research: continue work on our agent foundations agenda while kicking off work on AAMLS.
2. AGI alignment overviews: “Eliezer Yudkowsky and I will be splitting our time between working on these problems and doing expository writing. Eliezer is writing about alignment theory, while I’ll be writing about MIRI strategy and forecasting questions.”
3. Academic outreach events: “To help promote our approach and grow the field, we intend to host more workshops aimed at diverse academic audiences. We’ll be hosting a machine learning workshop in the near future, and might run more events like CSRBAI going forward.”
4. Paper-writing: “We also have a backlog of past technical results to write up, which we expect to be valuable for engaging more researchers in computer science, economics, mathematical logic, decision theory, and other areas.”

All of these are still priorities for us, though we now consider item 2 somewhat more important (and items 3 and 4 less important). We’ve since run three ML workshops, and have made more headway on our AAMLS research agenda. We now have a large amount of content prepared for our AGI alignment overviews, and are beginning a (likely rather long) editing process. We’ve also released “Logical Induction” and have a number of other papers in the pipeline.

We’ll be providing more details on how our priorities have changed since August in a strategic update post next month. As in past years, object-level technical research on the AI alignment problem will continue to be our top priority, although we’ll be undergoing a medium-sized shift in our research priorities and outreach plans.[10]

1. See our previous reviews: 2015, 2014, 2013
2. From 2015 in review: “Patrick LaVictoire joined in March, Jessica Taylor in August, Andrew Critch in September, and Scott Garrabrant in December. With Nate transitioning to a non-research role, overall we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.”
3. As I noted in our AMA: “At MIRI, research fellow is a full-time permanent position. A decent analogy in academia might be that research fellows are to assistant research fellows as full-time faculty are to post-docs. Assistant research fellowships are intended to be a more junior position with a fixed 1–2 year term.”
4. In the interim, our research intern Jack Gallagher has continued to make useful contributions in this domain.
5. Note that numbers in this section might not exactly match previously published estimates, since small corrections are often made to contributions data. Note also that these numbers do not include in-kind donations.
6. This figure only counts direct contributions through REG to MIRI. REG/EAF’s support for MIRI is closer to $150,000 when accounting for contributions made through EAF, many made on REG’s advice.
7. We were also awarded a $75,000 grant from the Center for Long-Term Cybersecurity to pursue a corrigibility project with Stuart Russell and a new UC Berkeley postdoc, but we weren’t able to fill the intended postdoc position in the relevant timeframe and the project was canceled. Stuart Russell subsequently received a large grant from the Open Philanthropy Project to launch a new academic research institute for studying corrigibility and other AI safety issues, the Center for Human-Compatible AI.
8. We received timely donor recommendations from investment analyst Ben Hoskin, Future of Humanity Institute researcher Owen Cotton-Barratt, and Daniel Dewey and Nick Beckstead of the Open Philanthropy Project (echoed by 80,000 Hours).
9. Our 45% retention of unique funders from 2015 is very much in line with funder retention across the US philanthropic space, which, combined with the previous point, suggests returning MIRI funders were significantly more supportive than most.
10. My thanks to Rob Bensinger, Colm Ó Riain, and Matthew Graves for their substantial contributions to this post.

• Alexei Andreev

> Helped put together OpenAI Gym’s safety environments
Oh, I wasn’t aware MIRI was involved in that. What did you guys do specifically? (Is there a post/link I missed?)

• http://www.nothingismere.com/ Rob Bensinger

“Development of AI safety environments by Rafael Cosman and other attendees for the OpenAI Reinforcement Learning Gym, illustrating topics like interruptibility and semi-supervised learning. Ideas and conversation from Chris Olah, Dario Amodei, Paul Christiano, and Jessica Taylor helped seed these gyms, and CSRBAI participants who helped develop them included Owain Evans, Sune Jakobsen, Stuart Armstrong, Tom Everitt, Rafael Cosman, and David Krueger.”