MIRI’s 2020 has been a year of experimentation and adjustment. In response to the COVID-19 pandemic, we largely moved our operations to more rural areas in March, and shifted to a greater emphasis on remote work. We took the opportunity to try new work set-ups and approaches to research, and have been largely happy with the results.
At the same time, 2020 saw limited progress in the research MIRI’s leadership had previously been most excited about: the new research directions we started in 2017. Given our slow progress to date, we are considering a number of possible changes to our strategy, and MIRI’s research leadership is shifting much of their focus toward searching for more promising paths.
Last year, I projected that our 2020 budget would be $6.4M–$7.4M, with a point estimate of $6.8M. I now expect that our 2020 spending will be slightly above $7.4M. The increase in spending above my point estimate largely comes from expenses we incurred relocating staff and taking precautions in response to the COVID-19 pandemic.
Our budget for 2021 is fairly uncertain, given that we are more likely than usual to see high-level shifts in our strategy in the coming year. My current estimate is that our spending will fall somewhere between $6M and $7.5M, which I expect to roughly break down as follows:
Given that our research program is in a transitional period, and given the strong support we have already received this year—$4.38M from Open Philanthropy, $903k from SFF, and ~$1.1M from other contributors (thank you all!)—we aren’t holding a formal fundraiser this winter. Donations are still welcome and appreciated during this transition, but we’ll wait to make our case to donors until our plans are more solid. For now, see our donate page if you are interested in supporting our research.
Below, I’ll go into more detail on how our 2020 has gone, and on our plans for the future.
2017-Initiated Research Directions and Research Plans
In 2017, we introduced a new set of research directions, which we described and motivated in more detail in “2018 Update: Our New Research Directions.” We wrote that we were “seeking entirely new low-level foundations for optimization,” “endeavoring to figure out parts of cognition that can be very transparent as cognition,” and “experimenting with some specific alignment problems.” In December 2019, we noted that we felt we were making “steady progress” on this research, but were disappointed with the concrete results we’d had to date.
After pushing more on these lines of research, MIRI senior staff have become more pessimistic about this approach. MIRI executive director and senior researcher Nate Soares writes:
The non-public-facing research I (Nate) was most excited about had a flavor of attempting to develop new pragmatically-feasible foundations for alignable AI, that did not rely on routing through gradient-descent-style machine learning foundations. We had various reasons to hope this could work, despite the obvious difficulties.
That project has, at this point, largely failed, in the sense that neither Eliezer nor I have sufficient hope in it for us to continue focusing our main efforts there. I’m uncertain whether it failed due to implementation failures on our part, due to the inherent difficulty of the domain, or due to flaws in the underlying theory.
Part of the reason we lost hope is a sense that we were moving too slowly, given our sense of how far off AGI may be and our sense of the difficulty of the alignment problem. The field of AI alignment is working under a deadline, such that if work is going sufficiently slowly, we’re better off giving up and pivoting to new projects that have a real chance of resulting in the first AGI systems being built on alignable foundations.
We are currently in a state of regrouping, weighing our options, and searching for plans that we believe may yet have a shot at working.
Looking at the field as a whole, MIRI’s research leadership remains quite pessimistic about most alignment proposals that we have seen put forward so far. That is, our update toward being more pessimistic about our recent research directions hasn’t reduced our pessimism about the field of alternatives, and the next directions we undertake are not likely to resemble the directions that are popular outside of MIRI today.
MIRI sees the need for a change of course with respect to these projects. At the same time, many at MIRI (including Nate) still have some hope in the theory underlying this research, and hope that the projects may be rescued in some way, such as by discovering and correcting failures in how we approached this research. But time spent on rescue efforts trades off against finding better and more promising alignment plans.
So we’re making several changes affecting staff previously focused on this work. Some are departing MIRI for different work, as we shift direction away from lines they were particularly suited for. Some are seeking to rescue the 2017-initiated lines of research. Some are pivoting to different experiments and exploration.
We are uncertain about what long-term plans we’ll decide on, and are in the process of generating new possible strategies. Some (non-mutually-exclusive) possibilities include:
- We may become a home to diverse research approaches aimed at developing a new path to alignment. Given our increased uncertainty about the best angle of attack, it may turn out to be valuable to house a more diverse portfolio of projects, with some level of intercommunication and cross-pollination between approaches.
- We may commit to an entirely new approach after a period of exploration, if we can identify one that we believe has a real chance of ensuring positive outcomes from AGI.
- We may carry forward theories and insights from our 2017-initiated research directions into future plans, in a different form.
Although our 2017-initiated research directions have been our largest focus over the last few years, we’ve been running many other research programs in parallel with them.
The bulk of this work is nondisclosed-by-default as well, but it includes work we’ve written up publicly. (Note that as a rule, this public-facing work is unrepresentative of our research as a whole.)
From our perspective, our most interesting public work this year is Scott Garrabrant’s Cartesian frames model and Vanessa Kosoy’s work on infra-Bayesianism.
Cartesian frames is a new framework for thinking about agency, intended as a successor to the cybernetic agent model. Whereas the cybernetic agent model assumes as basic an agent and environment persisting across time with a defined and stable I/O channel, Cartesian frames treat these features as more derived and dependent on how one conceptually carves up physical situations.
The Cartesian Frames sequence focuses especially on finding derived, approximation-friendly versions of the notion of “subagent” (previously discussed in “Embedded Agency”) and temporal sequence (a source of decision-theoretic problems in cases where agents can base their decisions on predictions or proofs about their own actions). The sequence’s final post discusses these and other potential directions for future work for the field to build on.
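To make the basic object concrete: in the sequence, a Cartesian frame over a set of possible worlds W is a triple (A, E, ·), where A is a set of agent options, E a set of environment options, and · : A × E → W picks out the world that results. It is often drawn as a matrix with one row per agent option and one column per environment option. The Python below is a minimal sketch assuming finite, string-labeled options; the class name, method names, and the umbrella example are illustrative assumptions, not code from the sequence. `can_ensure` follows our reading of the sequence’s notion of a set of worlds the agent can ensure, and `dual` swaps the roles of agent and environment.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class CartesianFrame:
    """Illustrative sketch of a Cartesian frame over a set of worlds W.

    agent:    the agent's options A (rows of the matrix)
    env:      the environment's options E (columns of the matrix)
    evaluate: the function A x E -> W giving the resulting world
    """

    agent: FrozenSet[str]
    env: FrozenSet[str]
    evaluate: Callable[[str, str], str]

    def dual(self) -> "CartesianFrame":
        # Swap the roles of agent and environment (transpose the matrix).
        return CartesianFrame(
            agent=self.env,
            env=self.agent,
            evaluate=lambda e, a: self.evaluate(a, e),
        )

    def can_ensure(self, outcomes: FrozenSet[str]) -> bool:
        # The agent can ensure a set of worlds S if a single agent option
        # lands in S no matter which environment option occurs.
        return any(
            all(self.evaluate(a, e) in outcomes for e in self.env)
            for a in self.agent
        )


def weather(a: str, e: str) -> str:
    # Hypothetical toy evaluation: you only get wet with no umbrella in rain.
    return "wet" if (a == "no_umbrella" and e == "rain") else "dry"


frame = CartesianFrame(
    agent=frozenset({"umbrella", "no_umbrella"}),
    env=frozenset({"rain", "sun"}),
    evaluate=weather,
)

print(frame.can_ensure(frozenset({"dry"})))         # True: take the umbrella
print(frame.can_ensure(frozenset({"wet"})))         # False
print(frame.dual().can_ensure(frozenset({"dry"})))  # True: the environment picks sun
```

Because `dual` only relabels which side chooses rows, the same matrix can be read from either side. This mirrors the point above that the agent/environment split depends on how one carves up a situation rather than being fixed in advance.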
In general, MIRI’s researchers are quite interested in new conceptual frameworks like these, as research progress can often be bottlenecked on our using the wrong lenses for thinking about problems, or on our lack of a simple formalism for putting intuitions to the test.
Meanwhile, Vanessa Kosoy and Alex Appel’s infra-Bayesianism is a novel framework for modeling reasoning in cases where the reasoner’s hypothesis space may not include the true environment.
This framework is interesting primarily because it seems applicable to such a wide variety of problems: non-realizability, decision theory, anthropics, embedded agency, reflection, and the synthesis of induction/probability with deduction/logic. Vanessa describes infra-Bayesianism as “opening the way towards applying learning theory to many problems which previously seemed incompatible with it.”
2020 also saw a large update to Scott and Abram’s “Embedded Agency,” with some discussions clarified and several new subsections added. Additionally, a revised version of Vanessa’s “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm,” co-authored with Alex Appel, was published in the Journal of Applied Logics.
To give a picture of some of the other research areas we’ve been pushing on, we asked some MIRI researchers and research associates to pick out highlights from their work over the past year, with comments on their selections.
Abram Demski highlights the following write-ups:
- “An Orthodox Case Against Utility Functions” — “Although in some sense this is a small technical point, it is indicative of a shift in perspective in some recent agent foundations research which I think is quite important.”
- “Radical Probabilism” — “Again, although one could see this as merely an explanation of the older logical induction result, I think it points at an important shift in perspective.”
- “Learning Normativity: A Research Agenda” — “In a sense, this research agenda clarifies the shift in perspective which the above two posts were communicating, although I haven’t tied everything together yet.”
- “Dutch-Booking CDT: Revised Argument” — “To my eye, this is a large-ish decision theory result.”
Evan Hubinger summarizes his public research from the past year:
- “An Overview of 11 Proposals for Building Safe Advanced AI” — “Probably my biggest project this year, this paper is my attempt at a unified explanation of the current major leading prosaic alignment proposals. The paper includes an exploration of each proposal’s pros and cons from the perspective of outer alignment, inner alignment, training competitiveness, and performance competitiveness.”
- “I started mentoring Adam Shimi and Mark Xu this year, helping them start spinning up work in AI safety. Concrete things that came out of this include Adam Shimi’s ‘Universality Unwrapped’ and Mark Xu’s ‘Does SGD Produce Deceptive Alignment?’”
- “I spent a lot of time this year thinking about AI safety via debate, which resulted in two new alternative debate proposals: ‘AI Safety via Market Making’ and ‘Synthesizing Amplification and Debate.’”
- “I spent some time looking at different alignment proposals from a computational complexity standpoint, resulting in ‘Alignment Proposals and Complexity Classes’ and ‘Weak HCH Accesses EXP.’”
- “‘Outer Alignment and Imitative Amplification’ makes the case for why imitative amplification is outer aligned; ‘Learning the Prior and Generalization’ provides my perspective on Paul’s new ‘learning the prior’ approach; and ‘Clarifying Inner Alignment Terminology’ revisits terminology from ‘Risks from Learned Optimization.’”
Earlier this year, Buck Shlegeris and Evan Hubinger also appeared on the Future of Life Institute’s AI Alignment Podcast. Buck also gave a talk at Stanford: “My Personal Cruxes for Working on AI Safety.”
Lastly, Future of Humanity Institute researcher and MIRI research associate Stuart Armstrong summarizes his own research highlights:
- “Pitfalls of Learning a Reward Function Online,” working with DeepMind’s Jan Leike, Laurent Orseau, and Shane Legg — “This shows how agents can manipulate a ‘learning’ process, the conditions that make that learning actually uninfluenceable, and some methods for turning influenceable learning processes into uninfluenceable ones.”
- “Model Splintering” — “Here I argue that a lot of AI safety problems can be reduced to the same problem: that of dealing with what happens when you move out of distribution from the training data. I argue that a principled way of dealing with these ‘model splinterings’ is necessary to get safe AI, and sketch out some examples.”
- “Syntax, Semantics, and Symbol Grounding, Simplified” — “Here I argue that symbol grounding is a practical, necessary thing, not an abstract philosophical concept.”
Process Improvements and Plans
Given the unusual circumstances brought on by the COVID-19 pandemic, in 2020 MIRI decided to run various experiments to see if we could improve our researchers’ productivity while our Berkeley office was unavailable. In the process, a sizable subset of our research team has found good modifications to our work environment that we aim to maintain and expand on.
Many of our research staff who spent this year in live-work quarantine groups in relatively rural areas in response to the COVID-19 pandemic have found surprisingly large benefits from living in a quieter, lower-density area together with a number of other researchers. Coordination and research have felt faster at a meta-level, with shorter feedback cycles, more effort spent on cruxier experiments, and more resulting pivots. Our biggest such pivot has been away from our 2017-initiated research directions, as described above.
Separately, MIRI staff have been weighing the costs and benefits of possibly moving somewhere outside the Bay Area for several years—taking into account the housing crisis and other governance failures, advantages and disadvantages of the local culture, tail risks of things taking a turn for the worse in the future, and other factors.
Partly as a result of these considerations, and partly because it’s easier to move when many of us have already relocated this year due to COVID-19, MIRI is considering relocating away from Berkeley. As we weigh the options, a particularly large factor in our considerations is whether our researchers expect the location, living situation, and work setup to feel good and comfortable, as we generally expect this to result in improved research progress. Increasingly, this factor is pointing us towards moving someplace new.
Many at MIRI have noticed in the past that there are certain social settings, such as small effective altruism or alignment research retreats, that seem to spark an unusually high density of unusually productive conversations. Much of the energy and vibrancy in such retreats presumably stems from their novelty and their time-boxed nature. However, we suspect this isn’t the only reason these events tend to be dense and productive, and we believe that we may be able to create a space that has some of these features every day.
This year, a number of our researchers have indeed felt that our new work set-up during the pandemic has a lot of this quality. We’re therefore very eager to see if we can modify MIRI as a workplace so as to keep this feature around, or further augment it.
Our year, then, has been characterized by some significant shifts in our thinking about research practices and which research directions are most promising.
Although we’ve been disappointed by our level of recent concrete progress toward understanding how to align AGI-grade optimization, we plan to continue capitalizing on MIRI’s strong pool of talent and accumulated thinking about alignment as we look for new and better paths forward. We’ll provide more updates about our new strategy as our plans solidify.