Our primary focus at MIRI in 2018 was twofold: research—as always!—and growth.
Thanks to the incredible support we received from donors the previous year, in 2018 we were able to aggressively pursue the plans detailed in our 2017 fundraiser post. The most notable goal we set was to “grow big and grow fast,” as our new research directions benefit a lot more from a larger team, and require skills that are a lot easier to hire for. To that end, we set a target of adding 10 new research staff by the end of 2019.
2018 therefore saw us accelerate the work we started in 2017, investing more in recruitment and shoring up the foundations needed for our ongoing growth. Since our 2017 fundraiser post, we’re up 3 new research staff, including noted Haskell developer Edward Kmett. I now think that we’re most likely to hit 6–8 hires by the end of 2019, though hitting 9–10 still seems quite possible to me, as we are still engaging with many promising candidates, and continue to meet more.
Overall, 2018 was a great year for MIRI. Our research continued apace, and our recruitment efforts increasingly paid dividends.
Below I’ll elaborate on our:
- research progress and outputs,
- research program support activities, including more details on our recruitment efforts,
- outreach related activities, and
- fundraising and spending.
Our 2018 update discussed the new research directions we’re pursuing, and the nondisclosure-by-default policy we’ve adopted for our research overall. As described in the post, these new directions aim at deconfusion (similar to our traditional research programs, which we continue to pursue), and include the themes of “seeking entirely new low-level foundations for optimization,” “endeavoring to figure out parts of cognition that can be very transparent as cognition,” and “experimenting with some [relatively deep] alignment problems,” and require building software systems and infrastructure.
In 2018, our progress on these new directions and the supporting infrastructure was steady and significant, in line with our high expectations, albeit proceeding significantly slower than we’d like, due in part to the usual difficulties associated with software development. On the whole, our excitement about these new directions is high, and we remain very eager to expand the team to accelerate our progress.
In parallel, Agent Foundations work continued to be a priority at MIRI. Our biggest publication on this front was “Embedded Agency,” co-written by MIRI researchers Scott Garrabrant and Abram Demski. “Embedded Agency” reframes our Agent Foundations research agenda as different angles of attack on a single central difficulty: we don’t know how to characterize good reasoning and decision-making for agents embedded in their environment.
Below are notable technical results and analyses we released in each research category last year.1 These are accompanied by predictions made last year by Scott Garrabrant, the research lead for MIRI’s Agent Foundations work, and Scott’s assessment of the progress our published work represents against those predictions. The research categories below are explained in detail in “Embedded Agency.”
The actual share of MIRI’s research that was non-public in 2018 ended up being larger than Scott expected when he registered his predictions. The list below is best thought of as a collection of interesting (though not groundbreaking) results and analyses that demonstrate the flavor of some of the directions we explored in our research last year. As such, these assessments don’t represent our model of our overall progress, and aren’t intended to be a good proxy for that question. Given the difficulty of predicting what we’ll disclose for our 2019 public-facing results, we won’t register new predictions this year.
Decision theory
- Predicted progress: 3 (modest)
- Actual progress: 2 (weak-to-modest)
Scott sees our largest public decision theory result of 2018 as Prisoners’ Dilemma with Costs to Modeling, a modified version of open-source prisoners’ dilemmas in which agents must pay resources in order to model each other.
Other significant write-ups include:
- Logical Inductors Converge to Correlated Equilibria (Kinda): A game-theoretic analysis of logical inductors.
- New results in Asymptotic Decision Theory and When EDT=CDT, ADT Does Well represent incremental progress on understanding what’s possible with respect to learning the right counterfactuals.
Additional decision theory research posts from 2018:
- From Alex Appel, a MIRI contractor and summer intern: (a) Distributed Cooperation; (b) Cooperative Oracles; (c) When EDT=CDT, ADT Does Well; (d) Conditional Oracle EDT Equilibria in Games
- From Abram Demski: (a) In Logical Time, All Games are Iterated Games; (b) A Rationality Condition for CDT Is That It Equal EDT (Part 1); (c) A Rationality Condition for CDT Is That It Equal EDT (Part 2)
- From Scott Garrabrant: (a) Knowledge is Freedom; (b) Counterfactual Mugging Poker Game; (c) (A → B) → A
- From Alex Mennen, a MIRI summer intern: When Wishful Thinking Works
Embedded world-models
- Predicted progress: 3 (modest)
- Actual progress: 1 (limited)
Some of our relatively significant results related to embedded world-models included:
- Sam Eisenstat’s untrollable prior, explained in illustrated form by Abram Demski, shows that there is a Bayesian solution to one of the basic problems which motivated the development of non-Bayesian logical uncertainty tools (culminating in logical induction). This informs our picture of what’s possible, and may lead to further progress in the direction of Bayesian logical uncertainty.
- Sam Eisenstat and Tsvi Benson-Tilsen’s formulation of Bayesian logical induction. This framework, which has yet to be written up, forces logical induction into a Bayesian framework by constructing a Bayesian prior which trusts the beliefs of a logical inductor (which must supply those beliefs to the Bayesian regularly).
Sam and Tsvi’s work can be viewed as evidence that “true” Bayesian logical induction is possible. However, it can also be viewed as a demonstration that we have to be careful what we mean by “Bayesian”—the solution is arguably cheating, and it isn’t clear that you get any new desirable properties by doing things this way.
Scott assigns the untrollable prior result a 2 (weak-to-modest progress) rather than a 1 (limited progress), but is counting this among our 2017 results, since it was written up in 2018 but produced in 2017.
Other recent work in this category includes:
- From Alex Appel: (a) Resource-Limited Reflective Oracles; (b) Bounded Oracle Induction
- From Abram Demski: (a) Toward a New Technical Explanation of Technical Explanation; (b) Probability is Real, and Value is Complex
Robust delegation
- Predicted progress: 2 (weak-to-modest)
- Actual progress: 1 (limited)
Other posts on robust delegation:
- From Stuart Armstrong (MIRI Research Associate): (a) Standard ML Oracles vs. Counterfactual Ones; (b) “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents”
- From Abram Demski: Stable Pointers to Value II: Environmental Goals
- From Scott Garrabrant: Optimization Amplifies
- From Vanessa Kosoy (MIRI Research Associate): (a) Quantilal Control for Finite Markov Decision Processes; (b) Computing An Exact Quantilal Policy
- From Alex Mennen: Safely and Usefully Spectating on AIs Optimizing Over Toy Worlds
Subsystem alignment
- Predicted progress: 2 (weak-to-modest)
- Actual progress: 2 (weak-to-modest)
We achieved greater clarity on subsystem alignment in 2018, largely reflected in Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse,3 and Scott Garrabrant’s forthcoming paper, “Risks from Learned Optimization in Advanced Machine Learning Systems.”4 This paper is currently being rolled out on the AI Alignment Forum, as a sequence on “Mesa-Optimization.”5
Scott Garrabrant’s Robustness to Scale also discusses issues in subsystem alignment (“robustness to relative scale”), alongside other issues in AI alignment.
Other
- Predicted progress: 2 (weak-to-modest)
- Actual progress: 2 (weak-to-modest)
Some of the 2018 publications we expect to be most useful cut across all of the above categories:
- “Embedded Agency,” Scott and Abram’s new introduction to all of the above research directions.
- Fixed Point Exercises, a set of exercises created by Scott to introduce people to the core ideas and tools in agent foundations research.
Here, other noteworthy posts include:
- From Scott Garrabrant: (a) Sources of Intuitions and Data on AGI; (b) History of the Development of Logical Induction
2018 Research Program Support
We added three new research staff to the team in 2018: Ben Weinstein-Raun, James Payor, and Edward Kmett.
We invested a large share of our capacity into growing the research team in 2018, and generally into activities aimed at increasing the amount of alignment research in the world, including:
- Running eight AI Risk for Computer Scientists (AIRCS) workshops. This is an ongoing all-expenses-paid workshop series for computer scientists and programmers who want to get started thinking about or working on AI alignment. At these workshops, we introduce AI risk and related concepts, share some CFAR-style rationality content, and introduce participants to the work done by MIRI and other safety research teams. Our overall aim is to cause good discussions to happen, improve participants’ ability to make progress on whether and how to contribute, and in the process work out whether they may be interested in joining MIRI or other alignment groups. Of 2018 workshop participants, we saw one join MIRI full-time, four take on internships with us, and on the order of ten with good prospects of joining MIRI within a year, in addition to several who have since joined other safety-related organizations.
- Running a 2.5-week AI Summer Fellows Program (AISFP) with CFAR.6 Additionally, MIRI researcher Tsvi Benson-Tilsen and MIRI summer intern Alex Zhu ran a mid-year AI safety retreat for MIT students and alumni.
- Running a 10-week research internship program over the summer, reviewed in our summer updates. Interns also participated in AISFP and in a joint research workshop with interns from the Center for Human-Compatible AI. Additionally, we hosted three more research interns later in the year. We are hopeful that at least one of them will join the team in 2019.
- Making grants to two individuals as part of our AI Safety Retraining Program. In 2018 we received $150k in restricted funding from the Open Philanthropy Project, “to provide stipends and guidance to a few highly technically skilled individuals. The goal of the program is to free up 3–6 months of time for strong candidates to spend on retraining, so that they can potentially transition to full-time work on AI alignment.” We issued grants to two people in 2018, including Carroll Wainwright, who went on to become a Research Scientist at Partnership on AI.
In addition to the above, in 2018 we:
- Hired additional operations staff to ensure we have the required operational capacity to support our continued growth.
- Moved into a new, larger office space.
2018 Outreach and Exposition
On the outreach, coordination, and exposition front, we:
- Released a new edition of Rationality: From AI to Zombies, beginning with volumes one and two, featuring a number of updates to the text and an official print edition. We also made Stuart Armstrong’s 2014 book on AI risk, Smarter Than Us: The Rise of Machine Intelligence, available on the web for free at smarterthan.us.
- Released 2018 Update: Our New Research Directions, a lengthy discussion of our research, our nondisclosure-by-default policies, and the case for computer scientists and software engineers to apply to join our team.
- Produced other expository writing: Two Clarifications About “Strategic Background”; Challenges to Paul Christiano’s Capability Amplification Proposal (discussion on LessWrong, including follow-up conversations); Comment on Decision Theory; The Rocket Alignment Problem (LessWrong link).
- Received press coverage in Axios, Forbes, Gizmodo, and Vox (1, 2), and were interviewed in Nautilus and on Sam Harris’ podcast.
- Spoke at Effective Altruism Global in San Francisco and at the Human-Aligned AI Summer School in Prague.
- Presented on logical induction at the joint Applied Theory Workshop / Workshop in Economic Theory.
- Released a paper, “Categorizing Variants of Goodhart’s Law,” based on Scott Garrabrant’s 2017 “Goodhart Taxonomy.” We also reprinted Nate Soares’ “The Value Learning Problem” and Nick Bostrom and Eliezer Yudkowsky’s “The Ethics of Artificial Intelligence” in Artificial Intelligence Safety and Security.
- Several MIRI researchers also received recognition from the AI Alignment Prize, including Scott Garrabrant receiving first place and second place in the first round and second round, respectively, MIRI Research Associate Vanessa Kosoy winning first prize in the third round, and Scott and Abram Demski tying Alex Turner for first place in the fourth round.
- MIRI senior staff also participated in AI research and strategy events and conversations throughout the year.
2018 was another strong year for MIRI’s fundraising. While the total raised of just over $5.1M was a 12% drop from the amount raised in 2017, the graph below shows that our strong growth trend continued—with 2017, as I surmised in last year’s review, looking like an outlier year driven by the large influx of cryptocurrency contributions during a market high in December 2017.7
(In this chart and those that follow, “Unlapsed” indicates contributions from past supporters who did not donate in the previous year.)
- $1.02M, our largest ever single donation by an individual, from “Anonymous Ethereum Investor #2,” based in Canada, made through Rethink Charity Forward’s recently established tax-advantaged fund for Canadian MIRI supporters.8
- $1.4M in grants from the Open Philanthropy Project: $1.25M in general support and $150k for our AI Safety Retraining Program.
- $951k during our annual fundraiser, driven in large part by MIRI supporters’ participation in multiple matching campaigns during the fundraiser, including WeTrust Spring’s Ethereum-matching campaign, Facebook’s Giving Tuesday event, and, in partnership with Raising for Effective Giving (REG), professional poker players’ Double Up Drive.
- $529k from 2 grants recommended by the EA Funds Long-Term Future Fund.
- $115k from PokerStars, also through REG.
In 2018, we received contributions from 637 unique contributors, 16% fewer than in 2017. This drop was largely driven by a 27% reduction in the number of new donors, partly offset by the continuing trend of steady growth in the number of returning donors9:
Donations of cryptocurrency were down in 2018 both in absolute terms (-$1.2M in value) and as a percentage of total contributions (23% compared to 42% in 2017). It’s plausible that if cryptocurrency values continue to rebound in 2019, we may see this trend reversed.
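As a rough cross-check on the figures above, the sketch below back-computes the implied 2017 total from the 12% drop, and from that the implied change in cryptocurrency donations. It uses only the rounded numbers quoted in this review; the 2017 values it derives are estimates rather than reported figures, and the variable and function names are illustrative.

```python
# Rounded figures quoted in this review. The 2017 values below are
# back-computed estimates, not reported numbers.
TOTAL_2018 = 5.1e6         # "just over $5.1M" raised in 2018
DROP_FROM_2017 = 0.12      # 2018 total was a 12% drop from 2017
CRYPTO_SHARE_2018 = 0.23   # crypto as a share of 2018 contributions
CRYPTO_SHARE_2017 = 0.42   # crypto as a share of 2017 contributions


def implied_prior_total(current_total, fractional_drop):
    """Return the prior-year total implied by a fractional year-over-year drop."""
    return current_total / (1.0 - fractional_drop)


total_2017 = implied_prior_total(TOTAL_2018, DROP_FROM_2017)
crypto_2018 = CRYPTO_SHARE_2018 * TOTAL_2018
crypto_2017 = CRYPTO_SHARE_2017 * total_2017
crypto_change = crypto_2018 - crypto_2017

print(f"Implied 2017 total:      ${total_2017 / 1e6:.2f}M")
print(f"Implied 2018 crypto:     ${crypto_2018 / 1e6:.2f}M")
print(f"Implied 2017 crypto:     ${crypto_2017 / 1e6:.2f}M")
print(f"Implied change in crypto: {crypto_change / 1e6:+.2f}M")
```

With these rounded inputs, the implied change comes out close to the roughly -$1.2M stated above; the small gap is what one would expect from rounding in the quoted totals and percentages.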
In 2017, donations received from matching initiatives increased almost five-fold over the previous year. In 2018, our inclusion in two different REG-administered matching challenges, significantly increased engagement among MIRI supporters with Facebook’s Giving Tuesday, and MIRI’s success in WeTrust’s Spring campaign offset a small decrease in corporate match dollars, improving slightly on 2017’s matching total. The following graph represents the matching amounts received over the last 5 years:
Following the amazing show of support we received from donors last year (and continuing into 2018), we had significantly more funds than we anticipated, and we found more ways to usefully spend it than we expected. In particular, we’ve been able to translate the “bonus” support we received in 2017 into broadening the scope of our recruiting efforts. As a consequence, our 2018 spending, which will come in at around $3.5M, actually matches the point estimate I gave in 2017 for our 2019 budget, rather than my prediction for 2018: a large step up from what I predicted, and an even larger step from last year’s budget of $2.1M.
The post goes on to give an overview of the ways in which we put this “bonus” support to good use. These included, in descending order by cost:
- Investing significantly more in recruiting-related activities, including our AIRCS workshop series, and scaling up the number of interns we hosted, with an increased willingness to pay higher wages to attract promising candidates to come intern/trial with us.
- Filtering less on price relative to fit when choosing new office space to accommodate our growth, and spending more on renovations, than we otherwise would have been able to, in order to create a more focused working environment for research staff.
- Raising salaries for some existing staff, who were being paid well below market rates.
With concrete numbers now in hand, I’ll go into more detail below on how we put those additional funds to work.
Total spending came in just over $3.75M. The chart below compares our actual spending in 2018 with our projections, and with our spending in 2017.10
At a high level, as expected, personnel costs in 2018 continued to account for the majority of our spending, though they represented a smaller share of total spending than in 2017, due to increased spending on recruitment-related activities along with one-time costs related to securing and renovating our new office space.
Our spending on recruitment-related activities is captured in the program activities category. The major ways we put additional funds to use, which account for the increase over my projections, break down as follows:
- ~$170k on internships: We hosted nine research interns for an average of ~2.5 months each. We were able to offer more competitive wages for internships, allowing us to recruit interns (especially those with an engineering focus) that we otherwise would have had a much harder time attracting, given the other opportunities they had available to them. We are actively interested in hiring three of these interns, and have made formal offers to two of them. I’m hopeful that we’ll have added at least one of them to the team by the end of this year.
- $54k on AI Safety Retraining Program grants, described above.
- The bulk of the rest of the additional funds we spent in this category went towards funding our ongoing series of AI Risk for Computer Scientists workshops, described above.
Expenses related to our new office space are accounted for in the cost of doing business category. The surplus spending in this category resulted from:
- ~$300k for securing, renovating, and filling out our new office space. Finding a suitable new space to accommodate our growth in Berkeley ended up being much more challenging and time-consuming than we expected.11 We made use of additional funds to secure our preferred space ahead of when we were prepared to move, and to renovate the space to meet our needs, whereas if we’d been operating with the budget I originally projected, we would have almost certainly ended up in a much worse space.
- The remainder of the spending beyond my projection in this category comes from higher-than-expected legal costs to secure visas for staff, and slightly higher-than-projected spending across many other subcategories.
1. Our summaries of our more significant results below largely come from our 2018 fundraiser post.
2. Not to be confused with Nate Soares’ forthcoming tiling agents paper.
3. Evan was a MIRI research intern, while Chris, Vladimir, and Joar are external collaborators.
4. This paper was previously cited in “Embedded Agency” under the working title “The Inner Alignment Problem.”
5. The full PDF version of the paper will be released in conjunction with the last post of the sequence.
6. As noted in our summer updates:
   We had a large and extremely strong pool of applicants, with over 170 applications for 30 slots (versus 50 applications for 20 slots in 2017). The program this year was more mathematically flavored than in 2017, and concluded with a flurry of new analyses by participants. On the whole, the program seems to have been more successful at digging into AI alignment problems than in previous years, as well as more successful at seeding ongoing collaborations between participants, and between participants and MIRI staff.
   The program ended with a very active blogathon, with write-ups including: Dependent Type Theory and Zero-Shot Reasoning; Conceptual Problems with Utility Functions (and follow-up); Complete Class: Consequentialist Foundations; and Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet.
7. Note that amounts in this section may vary slightly from our audited financial statements, due to small differences between how we tracked donations internally, and how we are required to report them in our financial statements.
8. A big thanks to Colm for all the work he’s put into setting this up; have a look at our Tax-Advantaged Donations page for more information.
9. 2014 is anomalously high on this graph due to the community’s active participation in our memorable SVGives campaign.
10. Note that these numbers will differ slightly compared to our forthcoming audited financial statements for 2018, due to subtleties of how certain types of expenses are tracked. For example, in the financial statements, renovation costs are considered to be a fixed asset that depreciates over time, and as such, won’t show up as an expense.
11. The number of options available in the relevant time frame was very limited, and most did not meet many of our requirements. Of the available spaces, the option that offered the best combination of size, layout, and location was looking for a tenant starting November 1st, 2018, while we weren’t able to move until early January 2019. Additionally, the space was configured with a very open layout that wouldn’t have met our needs, but that many other prospective tenants found desirable, such that we’d have to cover renovation costs.