MIRI 2024 Mission and Strategy Update

MIRI Strategy

As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It’s a big pair of shoes to fill, and an awesome responsibility that I’m honored to take on.

There have been several changes at MIRI since our 2020 strategic update, so let’s get into it.1


The short version:

We think it’s very unlikely that the AI alignment field will be able to make progress quickly enough to prevent the human extinction and loss of the future’s potential value that we expect will result from loss of control to smarter-than-human AI systems.

However, developments this past year like the release of ChatGPT seem to have shifted the Overton window in a lot of groups. There’s been a lot more discussion of extinction risk from AI, including among policymakers, and the discussion quality seems greatly improved.

This provides a glimmer of hope. While we expect that more shifts in public opinion are necessary before the world takes actions that sufficiently change its course, it now appears more likely that governments could enact meaningful regulations to forestall the development of unaligned, smarter-than-human AI systems. It also seems more possible that humanity could take on a new megaproject squarely aimed at ending the acute risk period.

As such, in 2023, MIRI shifted its strategy to pursue three objectives:

  1. Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter-than-human AI, until humanity’s state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed; and until we are able to secure these systems such that they can’t fall into the hands of malicious or incautious actors.2
  2. Communications: Share our models of the situation with a broad audience, especially in cases where talking about an important consideration could help normalize discussion of it.
  3. Research: Continue to invest in a portfolio of research. This includes technical alignment research (though we’ve become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time), as well as research in support of our policy and communications goals.3

We see the communications work as instrumental support for our policy objective. We also see candid and honest communication as a way to bring key models and considerations into the Overton window, and we generally think that being honest in this way tends to be a good default.

Although we plan to pursue all three of these priorities, it’s likely that policy and communications will be a higher priority for MIRI than research going forward.4

The rest of this post will discuss MIRI’s trajectory over time and our current strategy. In one or more future posts, we plan to say more about our policy/comms efforts and our research plans.

Note that this post will assume that you’re already reasonably familiar with MIRI and AGI risk; if you aren’t, I recommend checking out Eliezer Yudkowsky’s recent short TED talk, along with some of the resources cited on the TED page.


Written statement of MIRI CEO Malo Bourgon to the AI Insight Forum

Analysis, MIRI Strategy, Video

Today, December 6th, 2023, I participated in the U.S. Senate’s eighth bipartisan AI Insight Forum, which focused on the topic of “Risk, Alignment, & Guarding Against Doomsday Scenarios.” I’d like to thank Leader Schumer, and Senators Rounds, Heinrich, and Young, for the invitation to participate in the Forum.

One of the central points I made in the Forum discussion was that upcoming general AI systems are different. We can’t just use the same playbook we’ve used for the last fifty years.

Participants were asked to submit written statements of up to 5 pages prior to the event. In my statement (included below), I chose to focus on three things: making the case for why we should expect to lose control of the future to very capable general AI systems; sketching out at a high level what I expect would ultimately be required to guard against this risk; and providing a few policy recommendations that could be important stepping stones toward ultimately addressing the risk.1


(PDF version)

Leader Schumer, Senator Rounds, Senator Heinrich, and Senator Young, thank you for the invitation to participate in the AI Insight Forum series, and for giving me the opportunity to share the perspective of the Machine Intelligence Research Institute (MIRI) on the challenges humanity faces in safely navigating the transition to a world with smarter-than-human artificial intelligence (AI).

MIRI is a research nonprofit based in Berkeley, California, founded in 2000. Our focus is forward-looking: we study the technical challenges involved in making smarter-than-human AI systems safe.

To summarize the key points I’ll be discussing below: (1) It is likely that developers will soon be able to build AI systems that surpass human performance at most cognitive tasks. (2) If we develop smarter-than-human AI with anything like our current technical understanding, a loss-of-control scenario will result. (3) There are steps the U.S. can take today to sharply mitigate these risks.


Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

Analysis

Status: Vague, sorry. The point seems almost tautological to me, and yet also seems like the correct answer to the people going around saying “LLMs turned out to be not very want-y, when are the people who expected ‘agents’ going to update?”, so, here we are.

Okay, so you know how AI today isn’t great at certain… let’s say “long-horizon” tasks? Like novel large-scale engineering projects, or writing a long book series with lots of foreshadowing?

(Modulo the fact that it can play chess pretty well, which is longer-horizon than some things; this distinction is quantitative rather than qualitative and it’s being eroded, etc.)

And you know how the AI doesn’t seem to have all that much “want”- or “desire”-like behavior?

(Modulo, e.g., the fact that it can play chess pretty well, which indicates a certain type of want-like behavior in the behaviorist sense. An AI’s ability to win no matter how you move is the same as its ability to reliably steer the game-board into states where you’re check-mated, as though it had an internal check-mating “goal” it were trying to achieve. This is again a quantitative gap that’s being eroded.)

Well, I claim that these are more-or-less the same fact. It’s no surprise that the AI falls down on various long-horizon tasks and that it doesn’t seem all that well-modeled as having “wants/desires”; these are two sides of the same coin.

Relatedly: to imagine the AI starting to succeed at those long-horizon tasks without imagining it starting to have more wants/desires (in the “behaviorist sense” expanded upon below) is, I claim, to imagine a contradiction—or at least an extreme surprise. Because the way to achieve long-horizon targets in a large, unobserved, surprising world that keeps throwing wrenches into one’s plans, is probably to become a robust generalist wrench-remover that keeps stubbornly reorienting towards some particular target no matter what wrench reality throws into its plans.


Thoughts on the AI Safety Summit company policy requests and responses

Analysis

Over the next two days, the UK government is hosting an AI Safety Summit focused on “the safe and responsible development of frontier AI”. They requested that seven companies (Amazon, Anthropic, DeepMind, Inflection, Meta, Microsoft, and OpenAI) “outline their AI Safety Policies across nine areas of AI Safety”.

Below, I’ll give my thoughts on the nine areas the UK government described; I’ll note key priorities that I don’t think are addressed by company-side policy at all; and I’ll say a few words (with input from Matthew Gray, whose discussions here I’ve found valuable) about the individual companies’ AI Safety Policies.1

My overall take on the UK government’s asks is: most of these are fine asks; some things are glaringly missing, like independent risk assessments.

My overall take on the labs’ policies is: none are close to adequate, but some are importantly better than others, and most of the organizations are doing better than sheer denial of the primary risks.


AI as a science, and three obstacles to alignment strategies

Analysis

AI used to be a science. In the old days (back when AI didn’t work very well), people were attempting to develop a working theory of cognition.

Those scientists didn’t succeed, and those days are behind us. For most people working in AI today, the ambition to understand minds is gone. People working on mechanistic interpretability (and others attempting to build an empirical understanding of modern AIs) are laying an important foundation stone that could play a role in a future science of artificial minds, but on the whole, modern AI engineering is simply about constructing enormous networks of neurons and training them on enormous amounts of data, not about comprehending minds.

The bitter lesson has been taken to heart by those at the forefront of the field; and although this lesson doesn’t teach us that there’s nothing to learn about how AI minds solve problems internally, it suggests that the fastest path to producing more powerful systems is likely to continue to be one that doesn’t shed much light on how those systems work.

Absent some sort of “science of artificial minds”, however, humanity’s prospects for aligning smarter-than-human AI seem to me to be quite dim.

Viewing Earth’s current situation through that lens, I see three major hurdles:

  1. Most research that helps one point AIs probably also helps one make more capable AIs. A “science of AI” would probably increase the power of AI far sooner than it allows us to solve alignment.
  2. In a world without a mature science of AI, building a bureaucracy that reliably distinguishes real solutions from fake ones is prohibitively difficult.
  3. Fundamentally, for at least some aspects of system design, we’ll need to rely on a theory of cognition working on the first high-stakes real-world attempt.

I’ll go into more detail on these three points below. First, though, some background:


Announcing MIRI’s new CEO and leadership team

News

In 2023, MIRI has shifted focus in the direction of broad public communication—see, for example, our recent TED talk, our piece in TIME magazine “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”, and our appearances on various podcasts. While we’re continuing to support various technical research programs at MIRI, this is no longer our top priority, at least for the foreseeable future.

Coinciding with this shift in focus, there have also been many organizational changes at MIRI over the last several months, and we are somewhat overdue to announce them in public. The big changes are as follows:

 

  • Malo Bourgon: Chief Executive Officer (CEO)
    • Malo Bourgon, MIRI’s most long-standing team member next to Eliezer Yudkowsky, has transitioned from Chief Operating Officer into the senior leadership role at MIRI.1 We piloted the change starting in February and made it official in June.
    • This is partly an attempt to better reflect long-standing realities at MIRI. Nate’s focus for many years has been on high-level strategy and research, while Malo has handled much of the day-to-day running of the organization.
    • This change also reflects that Malo is taking on a lot more decision-making authority and greater responsibility for steering the organization. 
  • Nate Soares: President
    • Nate, who previously held the senior leadership role at MIRI (with the title of Executive Director), has transitioned to the new role of President.
    • As President (and as a board member), Nate will continue to play a central role in guiding MIRI and setting our vision and strategy.
  • Eliezer Yudkowsky: Chair of the Board
    • Eliezer, a co-founder of MIRI and a Senior Research Fellow, was already a member of MIRI’s board. 
    • We’ve now made Eliezer the board’s chair in order to better reflect the de facto reality that his views carry a large weight in MIRI’s strategic direction.
    • Edwin Evans, who was the board’s previous chair, remains on MIRI’s board.
    • Eliezer, Nate, and Malo have different senses of which technical research directions are most promising. To balance their different views, the board currently gives each of Eliezer, Nate, and Malo a budget to fund different technical research, in addition to the research that’s funded by the organization as a whole.
  • Alex Vermeer: Chief Operating Officer (COO)
    • Alex has stepped up to replace Malo as COO.
    • As COO, Alex is responsible for running and overseeing the operations team, as he has already been doing for some time, and he’ll continue to work closely with Malo (as he has for over a decade now) to help figure out what our core constraints are and how to break them.
  • Jimmy Rintjema: Chief Financial Officer (CFO)
    • Jimmy has been working for MIRI since 2015. Over the years, Jimmy has progressively taken on more and more of the responsibility for running MIRI’s business operations, including HR and finances. 
    • As part of this transition, Jimmy is taking on more responsibility and authority in this domain, and this title change reflects that.


The basic reasons I expect AGI ruin

Analysis

I’ve been citing AGI Ruin: A List of Lethalities to explain why the situation with AI looks lethally dangerous to me. But that post is relatively long, and emphasizes specific open technical problems over “the basics”.

Here are 10 things I’d focus on if I were giving “the basics” on why I’m so worried:[1]


1. General intelligence is very powerful, and once we can build it at all, STEM-capable artificial general intelligence (AGI) is likely to vastly outperform human intelligence immediately (or very quickly).

When I say “general intelligence”, I’m usually thinking about “whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems”.

It’s possible that we should already be thinking of GPT-4 as “AGI” on some definitions, so to be clear about the threshold of generality I have in mind, I’ll specifically talk about “STEM-level AGI”, though I expect such systems to be good at non-STEM tasks too.

Human brains aren’t perfectly general, and not all narrow AI systems or animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems that happened to generalize to millions of wildly novel tasks.

More concretely:

  • AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems that are different in kind from what AlphaGo solves.
    • These problems might be solved by the STEM AGI’s programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking.[2]
  • Some examples of abilities I expect humans to only automate once we’ve built STEM-level AGI (if ever):
    • The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment.
    • The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field.
  • In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.)[3]

When I say “general intelligence is very powerful”, a lot of what I mean is that science is very powerful, and that having all of the sciences at once is a lot more powerful than the sum of each science’s impact.[4]

Another large piece of what I mean is that (STEM-level) general intelligence is a very high-impact sort of thing to automate because STEM-level AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention.


Misgeneralization as a misnomer

Analysis

Here are two different ways an AI can turn out unfriendly:

  1. You somehow build an AI that cares about “making people happy”. In training, it tells people jokes and buys people flowers and offers people an ear when they need one. In deployment (and once it’s more capable), it forcibly puts each human in a separate individual heavily-defended cell, and pumps them full of opiates.
  2. You build an AI that’s good at making people happy. In training, it tells people jokes and buys people flowers and offers people an ear when they need one. In deployment (and once it’s more capable), it turns out that whatever was causing that “happiness”-promoting behavior was a balance of a variety of other goals (such as basic desires for energy and memory), and it spends most of the universe on some combination of that other stuff that doesn’t involve much happiness.

(To state the obvious: please don’t try to get your AIs to pursue “happiness”; you want something more like CEV in the long run, and in the short run I strongly recommend aiming lower, at a pivotal act.)

In both cases, the AI behaves (during training) in a way that looks a lot like trying to make people happy. Then the AI described in (1) is unfriendly because it was optimizing the wrong concept of “happiness”, one that lined up with yours when the AI was weak, but that diverges in various edge-cases that matter when the AI is strong. By contrast, the AI described in (2) was never even really trying to pursue happiness; it had a mixture of goals that merely correlated with the training objective, and that balanced out right around where you wanted them to balance out in training, but deployment (and the corresponding capabilities-increases) threw the balance off.

Note that this list of “ways things can go wrong when the AI looked like it was optimizing happiness during training” is not exhaustive! (For instance, consider an AI that cares about something else entirely, and knows you’ll shut it down if it doesn’t look like it’s optimizing for happiness. Or an AI whose goals change heavily as it reflects and self-modifies.)
