Briefing on Extinction-Level AI Threats

This is a brief overview of our position; for a more thorough version, see The Problem.

I. The default consequence of artificial superintelligence is human extinction.

“Artificial superintelligence” (ASI) refers to AI that can substantially surpass humanity in all strategically relevant activities (economic, scientific, military, etc.).

The timeline to ASI is highly uncertain, but probably not long. On the present trajectory, MIRI would not be surprised if ASI is developed in two or five years, and we’d be surprised if it were still more than twenty years away.

AI labs are aggressively rolling out systems they don’t understand. The deep learning techniques behind the rapid AI progress of the last few years create massive neural networks automatically. The resulting models are akin to black boxes. One can see what goes in, and what comes out, but all that is visible inside are trillions of numbers. AI developers can’t tell you why a modern AI makes a given choice. (More)

Sufficiently intelligent AIs will likely develop persistent goals of their own. In real-world environments, the best way to complete complex long-term tasks is to possess a very general ability and inclination to route around all kinds of obstacles and distractions, in unwavering pursuit of longer-term objectives. We are only barely starting to see this phenomenon in today’s AIs, which are primarily good at short-term tasks, but are now being adapted into agents with greater autonomy. (More)

Developers appear to be far from being able to give ASIs goals of the developer’s choosing. Building ASIs to have the right goals is a large scientific challenge in its own right, distinct from the challenge of building ASI at all. Currently, the field seems very unlikely to find a robust solution to this problem in time to apply it to ASI. (More)

An ASI without worthwhile goals would be very likely to cause our extinction. Unless it has worthwhile goals, ASI will attempt to acquire, centrally control, and utilize all available resources, in ways that are incompatible with our continued survival. This doesn’t require that the AI possess any human-style desire for dominance or autonomy. It only requires that the ASI be a competent goal-optimizer, for some goal other than the goals of its developers. (More)

II. Our survival likely depends on delaying the creation of ASI, as soon as we can for as long as necessary.

A “wait and see” approach to ASI is probably not survivable. A superintelligent adversary will not reveal its full capabilities and telegraph its intentions. It will not offer a fair fight. It will make itself indispensable or undetectable until it can strike decisively and/or seize an unassailable strategic position. (More)

MIRI doesn’t see any viable quick fixes or workarounds to misaligned ASI. OpenAI, the developer of ChatGPT, admits that today’s most important methods of steering AI won’t scale to the superhuman regime. Attempts to restrain or deceive a superior intelligence are prone to fail, including in ways we can’t foresee. Plans to align ASI using weaker AIs are similarly unsound. We also don’t think a well-funded crash program to solve alignment would be able to correctly identify adequate solutions. Our current view is that a safe way forward will likely require ASI to be delayed for a long time. (More)

Delaying ASI likely requires a globally coordinated ban on its development, including tight control over the factors of its production. This is a large ask, but domestic oversight, mirrored by a few close allies, will not suffice. This is not a case where we just need the “right” people to build it before the “wrong” people do, as ASI is not a national weapon; it is a global suicide bomb. If anyone builds it, everyone dies.

To preserve the option of shutting down ASI development if or when there is sufficient political will, MIRI advocates promptly building the “off-switch.” The off-switch refers to the systems and infrastructure required to effectively and durably enforce restrictions on AI development and deployment. It starts with identifying the relevant actors, tracking the relevant hardware, and requiring that advanced AI work take place within a limited number of monitored and secured locations. It extends to building out the protocols, plans, and chain of command required to efficiently make decisions to impose such restrictions. As the off-switch could also provide resilience to more limited AI mishaps, we hope it will find broader near-term support than a full ban. (More)

An off-switch can only prevent our extinction from ASI if it has sufficient reach and is actually used to shut down development in time. If humanity is to survive this dangerous period, it will have to stop treating AI as a domain for international rivalry and demonstrate a collective resolve equal to the threat.