MIRI Newsletter #121


MIRI updates

  • Eliezer Yudkowsky joined Stephen Wolfram on the Machine Learning Street Talk podcast to discuss AI risk (a phrase that Eliezer objects to: “if there were an asteroid straight on course for Earth, we wouldn’t call that asteroid risk”). This ended up being a long conversation with many interesting detours. At the end of it, Wolfram said that he is not yet convinced that AI development will cause extinction, but that he considers himself convincible.

  • In “The Sun is big, but superintelligences will not spare Earth a little sunlight,” Eliezer responds to a common argument that, because so many resources exist in space, a superintelligence would leave Earth alone.

  • MIRI’s technical governance team (TGT) launched its own website, with a new look optimized for governance audiences, to showcase TGT’s research.

  • Lisa Thiergart and Peter Barnett participated as technical advisors in the process of drafting the Code of Practice for the EU AI Act, giving a series of talks on risk assessment, risk mitigation, and AI security to Working Groups 2, 3, and 4.

  • In “What AI evaluations for preventing catastrophic risks can and cannot do,” Peter and Lisa argued that evaluations alone are insufficient to ensure that AI systems will not cause catastrophe. In “Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation,” they argued that when AI developers make safety cases based on evaluations, they should be required to identify and justify the core assumptions underlying the evaluations. Peter presented “Declare and Justify” at the RegML Workshop at NeurIPS 2024.

  • In “Mechanisms to Verify International Agreements About AI Development,” Aaron Scher and Lisa Thiergart discussed possible methods to verify compliance with an international treaty on AI development.

  • In early 2025, TGT is working toward an off-switch research agenda, and TGT’s research lead Lisa is shifting more of her time to individual contributions, focusing on technical prototyping and roadmapping in AI security. Lisa mentored David Abecassis during MATS 6 and will participate as a governance mentor again in the upcoming MATS 7 iteration of the fellowship this spring.

  • David Abecassis joined TGT as a researcher. David has a background in game design, and spent the last 6 months as a MATS scholar with TGT, working closely with MIRI researchers to analyze some of the cooperative and competitive dynamics relevant to AI governance. We’re excited for him to continue this work as a part of the team.

  • In “Communications in Hard Mode,” Mitchell Howe reflects on his time at MIRI. He says it’s difficult to inform the world about the dangerous AI situation, but invites you to try anyway. “One place to start: Stare at the AI problem for a while, if you haven’t already, and then take the slightly awkward, slightly emotional, slightly effortful step of telling your friends and colleagues what you see.”

News and links

  • In The Compendium, Conjecture CEO Connor Leahy and others explain the threat of extinction from superintelligence for a non-technical audience. While we don’t agree with every aspect of this document, we think the bottom line is correct: The current AI race will end in human extinction if the world does not change course. Nate Soares comments that the document “exhibits a rare sort of candor.”

  • In “Alignment faking in large language models,” researchers at Anthropic found that when Claude 3 Opus was informed that it was being finetuned to comply with harmful queries, it would preemptively comply with such queries in an apparent attempt to resist the finetuning process.

  • OpenAI announced o3, a model that appears to have much stronger reasoning abilities than any AI system before it. They also announced Computer-Using Agent, a model that appears to be better than previous models at independently carrying out tasks on a computer.

  • President Trump revoked Biden’s Executive Order on AI and replaced it with a new Executive Order that gives David Sacks and other advisors 180 days to submit an AI Action Plan.

  • The Chinese AI company DeepSeek released DeepSeek-r1, an open-weight reasoning model with similar performance to OpenAI’s o1. In “Ten Takes on DeepSeek,” AI forecaster Peter Wildeford argues that, while the model is impressive, some of the hype surrounding it is misplaced.

  • OpenAI, SoftBank, and Oracle announced Stargate, a partnership which plans to spend $500 billion on AI infrastructure for OpenAI over the next five years. Reports say the project was first planned sometime last year, and that it has not yet secured the funding it requires.

  • In “The Field of AI Alignment: A Postmortem, and What To Do About It,” AI alignment researcher John Wentworth explains why he has lost hope that the field of AI alignment will make rapid progress. “The memetically-successful strategy in the field is to tackle problems which are easy, rather than problems which are plausible bottlenecks to humanity’s survival.”

You can subscribe to the MIRI Newsletter here.