October 2024 Newsletter

October 29, 2024 | Harlan Stewart | Newsletters

News and links

Geoffrey Hinton and John Hopfield were awarded this year’s Nobel Prize in Physics for their foundational contributions to machine learning. In a press conference following the announcement, Hinton said that “if you look around, there are very few examples of more intelligent things being controlled by less intelligent things, which makes you wonder whether when AI gets smarter than us, it’s going to take over control.”
In “A Narrow Path,” a group of AI policy researchers outlines a set of proposals for avoiding human extinction from AI. Their plan involves preventing smarter-than-human systems from being developed within the next twenty years. We are skeptical about instituting a moratorium for a fixed amount of time, rather than halting until humanity is truly ready to proceed, but we appreciate that they chose a duration measured in decades rather than months.
In “Machines of Loving Grace,” Anthropic CEO Dario Amodei explores the potential benefits of smarter-than-human AI and also briefly argues that a coalition of democracies should race to quickly build it. Responding with his own essay, Max Tegmark argues that “from a game-theoretic point of view, this race is not an arms race but a suicide race[…] Because we are closer to building AGI than we are to figuring out how to align or control it.” We agree with Tegmark’s point here.
California Governor Gavin Newsom vetoed SB 1047, the bill which would have mandated risk assessments for some AI developers. Unlike most of the bill’s critics, Newsom argued that the bill actually didn’t go far enough: “By focusing only on the most expensive and large-scale models, SB 1047 establishes a regulatory framework that could give the public a false sense of security about controlling this fast-moving technology.” In “SB 1047: Our Side Of The Story,” Scott Alexander strongly challenges this rationale and reflects on the bill’s rise and fall.
OpenAI announced o1, a model trained with reinforcement learning to do more complex reasoning than previous systems. The model was shared with red teaming organizations, but only one week before it was released. One of the organizations, Apollo Research, found that o1 acts deceptively in some situations.
Several recent studies and discussions have compared the ability of language models to predict the future with that of humans:
- Researchers at CAIS claimed that, given the right prompt, GPT-4o could forecast at a “superhuman level.”
- Forecasting platform Metaculus ran its own study which found that “no AI has demonstrated superhuman forecasting skill yet.”
- Another group of researchers introduced a forecasting benchmark and used it to find that AI systems can forecast as well as random survey-takers, but not as well as expert forecasters.

You can subscribe to the MIRI Newsletter here.

September 2024 Newsletter

September 16, 2024 | Harlan Stewart | Newsletters

July 2024 Newsletter

July 10, 2024 | Harlan Stewart | Newsletters

June 2024 Newsletter

June 14, 2024 | Harlan Stewart | Newsletters

MIRI updates

MIRI Communications Manager Gretta Duleba explains MIRI’s current communications strategy. We hope to communicate clearly to policymakers and the general public why there’s an urgent need to shut down frontier AI development, and to make the case for installing an “off-switch”. This will not be easy, and there is a lot of work to be done. Some projects we’re currently exploring include a new website, a book, and an online reference resource.
Rob Bensinger argues, contra Leopold Aschenbrenner, that the US government should not race to develop artificial superintelligence. “If anyone builds it, everyone dies.” Instead, Rob outlines a proposal for the US to spearhead an international alliance to halt progress toward the technology.
At the end of June, the Agent Foundations team, including Scott Garrabrant and others, will be parting ways with MIRI to continue their work as independent researchers. The team was originally set up and “sponsored” by Nate Soares and Eliezer Yudkowsky. However, as AI capabilities have progressed rapidly in recent years, Nate and Eliezer have become increasingly pessimistic about this type of work yielding significant results within the relevant timeframes. Consequently, they have shifted their focus to other priorities.

Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.

The Agent Foundations team has produced some stellar work over the years, and made a true attempt to tackle one of the most crucial challenges humanity faces today. We are deeply grateful for their many years of service and collaboration at MIRI, and we wish them the very best in their future endeavors.
The Technical Governance Team responded to NIST’s request for comments on draft documents related to the AI Risk Management Framework. The team also sent comments in response to the “Framework for Mitigating AI Risks” put forward by U.S. Senators Mitt Romney (R-UT), Jack Reed (D-RI), Jerry Moran (R-KS), and Angus King (I-ME).
Brittany Ferrero has joined MIRI’s operations team. Previously, she worked on projects such as the Embassy Network and Open Lunar Foundation. We’re excited to have her help us execute our mission.

News and links

AI alignment researcher Paul Christiano was appointed as head of AI safety at the US AI Safety Institute. Last fall, Christiano published some of his thoughts about AI regulation as well as responsible scaling policies.
The Superalignment team at OpenAI has been disbanded following the departure of its co-leaders Ilya Sutskever and Jan Leike. The team was launched last year to try to solve the AI alignment problem in four years. However, Leike says that the team struggled to get the compute it needed and that “safety culture and processes have taken a backseat to shiny products” at OpenAI. This seems extremely concerning from the perspective of evaluating OpenAI’s seriousness when it comes to safety and robustness work, particularly given that a similar OpenAI exodus occurred in 2020 in the wake of concerns about OpenAI’s commitment to solving the alignment problem.
Vox’s Kelsey Piper reports that employees who left OpenAI were subject to an extremely restrictive NDA indefinitely preventing them from criticizing the company (or admitting that they were under an NDA), under threat of losing their vested equity in the company. OpenAI executives have since contacted former employees to say that they will not enforce the NDAs. Rob Bensinger comments on these developments here, strongly criticizing OpenAI for this policy.
Korea and the UK co-hosted the AI Seoul Summit, a virtual mini-summit following up on the first AI Safety Summit (which took place in the UK last November). At the Seoul summit, 16 AI companies committed to create and publish safety frameworks, including “thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable.”
California State Senator Scott Wiener’s SB 1047 passed in the California State Senate and is now being considered in the California State Assembly. The bill requires pre-deployment testing and post-deployment monitoring for models trained using more than 10^26 FLOP at a training cost of more than $100M.


MIRI 2024 Communications Strategy

May 29, 2024 | Gretta Duleba | MIRI Strategy

As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.

The Objective: Shut it Down¹

Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.

Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go through the usual political processes and be ground down into ineffective compromises.

The only way we think we will get strong enough legislation is if policymakers actually get it, if they actually come to understand that building misaligned smarter-than-human systems will kill everyone, including their children. They will pass strong enough laws and enforce them if and only if they come to understand this central truth.

Meanwhile, the clock is ticking. AI labs continue to invest in developing and training more powerful systems. We do not seem to be close to getting the sweeping legislation we need. So while we lay the groundwork for helping humanity to wake up, we also have a less dramatic request. We ask that governments and AI labs install the “off-switch”² so that if, on some future day, they decide to shut it all down, they will be able to do so.

We want humanity to wake up and take AI x-risk seriously. We do not want to shift the Overton window; we want to shatter it.

May 2024 Newsletter

May 14, 2024 | Harlan Stewart | Newsletters

Update (5-15-2024): I wrote that “it appears that not all of the leading AI labs are honoring the voluntary agreements they made at [AI Safety Summit],” citing a Politico article. However, after seeing more discussion about it (e.g. here), I am now highly uncertain about whether the labs made specific commitments, what those commitments were, and whether commitments were broken. These seem like important questions, so I hope that we can get more clarity.

MIRI updates

MIRI is shutting down the Visible Thoughts Project.
- We originally announced the project in November of 2021. At the time we were hoping we could build a new type of data set for training models to exhibit more of their inner workings. MIRI leadership is pessimistic about humanity’s ability to solve the alignment problem in time, but this was an idea that seemed relatively promising to us, albeit still a long shot.
- We also hoped that the $1+ million bounty on the project might attract someone who could build an organization to build the data set. Many of MIRI’s ambitions are bottlenecked on executive capacity, and we hoped that we might find individuals (and/or a process) that could help us spin up more projects without requiring a large amount of oversight from MIRI leadership.
- Neither hope played out, and in the intervening time, the ML field has moved on. (ML is a fast-moving field, and alignment researchers are working on a deadline; a data set we’d find useful if we could start working with it in 2022 isn’t necessarily still useful if it would only become available 2+ years later.) We would like to thank the many writers and other support staff who contributed over the last two and a half years.
Mitchell Howe and Joe Rogero joined the comms team as writers. Mitch is a longtime MIRI supporter with a background in education, and Joe is a former reliability engineer who has facilitated courses for BlueDot Impact. We’re excited to have their help in transmitting MIRI’s views to a broad audience.
Additionally, Daniel Filan will soon begin working with MIRI’s new Technical Governance Team part-time as a technical writer. Daniel is the host of two podcasts: AXRP, and The Filan Cabinet. As a technical writer, Daniel will help to scale up our research output and make the Technical Governance Team’s research legible to key audiences.
The Technical Governance Team submitted responses to the NTIA’s request for comment on open-weight AI models, the United Nations’ request for feedback on the Governing AI for Humanity interim report, and the Office of Management and Budget’s request for information on AI procurement in government.
Eliezer Yudkowsky spoke with Semafor for a piece about the risks of expanding the definition of “AI safety”. “You want different names for the project of ‘having AIs not kill everyone’ and ‘have AIs used by banks make fair loans.’”

A number of important developments in the larger world occurred during the MIRI Newsletter’s hiatus from July 2022 to April 2024. To recap just a few of these:

In November of 2022, OpenAI released ChatGPT, a chatbot application that reportedly gained 100 million users within 2 months of its launch. As we mentioned in our 2024 strategy update, GPT-3.5 and GPT-4 were more impressive than some of the MIRI team expected, representing a pessimistic update for some of us “about how plausible it is that humanity could build world-destroying AGI with relatively few (or no) additional algorithmic advances”. ChatGPT’s success significantly increased public awareness of AI and sparked much of the post-2022 conversation about AI risk.
In March of 2023, the Future of Life Institute released an open letter calling for a six-month moratorium on training runs for AI systems stronger than GPT-4. Following the letter’s release, Eliezer wrote in TIME that a six-month pause is not enough and that an indefinite worldwide moratorium is needed to avert catastrophe.
In May of 2023, the Center for AI Safety released a one-sentence statement, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” We were especially pleased with this statement, because it focused attention on existential risk in particular, and did so in a way that would be maximally understandable to policymakers and the general public. The list of signatories included the three most cited researchers in AI (Bengio, Hinton, and Sutskever) and leadership at all three of the leading AI labs (Anthropic, DeepMind, and OpenAI).
In October of 2023, President Biden signed an executive order on AI. The order’s provisions include reporting requirements for some large models, rules for federal procurement of AI products, and a directive for the NIST to develop safety standards for generative AI.
In November of 2023, the UK’s AI Safety Summit brought experts and world leaders together to discuss risks from AI. The summit showed some promise, but its outcomes so far have seemed limited. Six months later, it appears that not all of the leading AI labs are honoring the voluntary agreements they made at the summit.
In March of 2024, the European Union passed the AI Act, a broad regulatory framework for the use of all AI systems, organized into risk categories. The act includes evaluation and reporting requirements for “general-purpose AI” systems trained with more than 10^25 FLOP.
Over the past year and a half, AI systems have exhibited many new capabilities, including generating high-quality images, expert-level Stratego, expert-level Diplomacy, writing code, generating music, generating video, acing AP exams, solving Olympiad-level geometry problems, and winning drone races against human world champions.


April 2024 Newsletter

April 12, 2024 | Harlan Stewart | Newsletters

The MIRI Newsletter is back in action after a hiatus since July 2022. To recap some of the biggest MIRI developments since then:

MIRI released its 2024 Mission and Strategy Update, announcing a major shift in focus: While we’re continuing to support various technical research programs at MIRI, our new top priority is broad public communication and policy change.
- In short, we’ve become increasingly pessimistic that humanity will be able to solve the alignment problem in time, while we’ve become more hopeful (relatively speaking) about the prospect of intergovernmental agreements to hit the brakes on frontier AI development for a very long time—long enough for the world to find some realistic path forward.

Coinciding with this strategy change, Malo Bourgon transitioned from MIRI COO to CEO, and Nate Soares transitioned from CEO to President. We also made two new senior staff hires: Lisa Thiergart, who manages our research program; and Gretta Duleba, who manages our communications and media engagement.
In keeping with our new strategy pivot, we’re growing our comms team: I (Harlan Stewart) recently joined the team, and will be spearheading the MIRI Newsletter and a number of other projects alongside Rob Bensinger. I’m a former math and programming instructor and a former researcher at AI Impacts, and I’m excited to contribute to MIRI’s new outreach efforts.
- The comms team is at the tail end of another hiring round, and we expect to scale up significantly over the coming year. Our Careers page and the MIRI Newsletter will announce when our next comms hiring round begins.

We are launching a new research team to work on technical AI governance, and we’re currently accepting applicants for roles as researchers and technical writers. The team currently consists of Lisa Thiergart and Peter Barnett, and we’re looking to scale to 5–8 people by the end of the year.
- The team will focus on researching and designing technical aspects of regulation and policy which could lead to safe AI, with attention given to proposals that can continue to function as we move towards smarter-than-human AI. This work will include: investigating limitations in current proposals such as Responsible Scaling Policies; responding to requests for comments by policy bodies such as the NIST, EU, and UN; researching possible amendments to RSPs and alternative safety standards; and communicating with and consulting for policymakers.
Now that the MIRI team is growing again, we also plan to do some fundraising this year, including potentially running an end-of-year fundraiser—our first fundraiser since 2019. We’ll have more updates about that later this year.

As part of our post-2022 strategy shift, we’ve been putting far more time into writing up our thoughts and making media appearances. In addition to announcing these in the MIRI Newsletter again going forward, we now have a Media page that will collect our latest writings and appearances in one place. Some highlights since our last newsletter in 2022:

MIRI senior researcher Eliezer Yudkowsky kicked off our new wave of public outreach in early 2023 with a very candid TIME magazine op-ed and a follow-up TED Talk, both of which appear to have had a big impact. The TIME article was the most viewed page on the TIME website for a week, and prompted some concerned questioning at a White House press briefing.
Eliezer and Nate have done a number of podcast appearances since then, attempting to share our concerns and policy recommendations with a variety of audiences. Of these, we think the best appearance on substance was Eliezer’s multi-hour conversation with Logan Bartlett.
In December 2023, Malo was one of sixteen attendees invited by Leader Schumer and Senators Young, Rounds, and Heinrich to participate in a bipartisan forum on “Risk, Alignment, and Guarding Against Doomsday Scenarios.” Malo’s written statement is the best current write-up of MIRI’s policy recommendations. At the event, Malo found it heartening to see how far the discourse has come in a very short time: Leader Schumer opened the event by asking attendees for their probability that AI could lead to a doomsday scenario, using the term “p(doom)”.
Nate has written several particularly important essays pertaining to AI risk:
In a new report, MIRI researchers Peter Barnett and Jeremy Gillen argue that without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI.
Other unusually-good podcast appearances and write-ups include Eliezer’s appearances on Bankless, Bloomberg, and the David Pakman Show, Nate’s comments on an OpenAI strategy document, and Rob Bensinger’s take on ten relatively basic reasons to expect AGI ruin. See the Media page for a fuller list.

In next month’s newsletter, we’ll discuss some of the biggest developments in the world at large since the MIRI Newsletter went on pause, as well as returning to form with a more detailed discussion of MIRI’s most recent activities and write-ups. You can subscribe to the MIRI Newsletter here.

MIRI 2024 Mission and Strategy Update

January 4, 2024 | Malo Bourgon | MIRI Strategy

As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It’s a big pair of shoes to fill, and an awesome responsibility that I’m honored to take on.

There have been several changes at MIRI since our 2020 strategic update, so let’s get into it.¹

The short version:

We think it’s very unlikely that the AI alignment field will be able to make progress quickly enough to prevent human extinction and the loss of the future’s potential value, which we expect will result from loss of control to smarter-than-human AI systems.

However, developments this past year like the release of ChatGPT seem to have shifted the Overton window in a lot of groups. There’s been a lot more discussion of extinction risk from AI, including among policymakers, and the discussion quality seems greatly improved.

This provides a glimmer of hope. While we expect that more shifts in public opinion are necessary before the world takes actions that sufficiently change its course, it now appears more likely that governments could enact meaningful regulations to forestall the development of unaligned, smarter-than-human AI systems. It also seems more possible that humanity could take on a new megaproject squarely aimed at ending the acute risk period.

As such, in 2023, MIRI shifted its strategy to pursue three objectives:

Policy: Increase the probability that the major governments of the world end up coming to some international agreement to halt progress toward smarter-than-human AI, until humanity’s state of knowledge and justified confidence about its understanding of relevant phenomena has drastically changed; and until we are able to secure these systems such that they can’t fall into the hands of malicious or incautious actors.²
Communications: Share our models of the situation with a broad audience, especially in cases where talking about an important consideration could help normalize discussion of it.
Research: Continue to invest in a portfolio of research. This includes technical alignment research (though we’ve become more pessimistic that such work will have time to bear fruit if policy interventions fail to buy the research field more time), as well as research in support of our policy and communications goals.³

We see the communications work as instrumental support for our policy objective. We also see candid and honest communication as a way to bring key models and considerations into the Overton window, and we generally think that being honest in this way tends to be a good default.

Although we plan to pursue all three of these priorities, it’s likely that policy and communications will be a higher priority for MIRI than research going forward.⁴

The rest of this post will discuss MIRI’s trajectory over time and our current strategy. In one or more future posts, we plan to say more about our policy/comms efforts and our research plans.

Note that this post will assume that you’re already reasonably familiar with MIRI and AGI risk; if you aren’t, I recommend checking out Eliezer Yudkowsky’s recent short TED talk, along with some of the resources cited on the TED page:

“A.I. Poses ‘Risk of Extinction,’ Industry Leaders Warn”, New York Times
“We must slow down the race to god-like AI”, Financial Times
“Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”, TIME
“AGI Ruin: A List of Lethalities”, AI Alignment Forum
