The Financial Times story on MIRI

 |   |  Analysis

Richard Waters wrote a story on MIRI and others for Financial Times, which also put Nick Bostrom’s Superintelligence at the top of its summer science reading list.

It’s a good piece. Go read it and then come back here so I can make a few clarifications.


1. Smarter-than-human AI probably isn’t coming “soon.”

“Computers will soon become more intelligent than us,” the story begins, but few experts I know think this is likely.

recent survey asked the world’s top-cited living AI scientists by what year they’d assign a 10% / 50% / 90% chance of human-level AI (aka AGI), assuming scientific progress isn’t massively disrupted. The median reply for a 10% chance of AGI was 2024, for a 50% chance of AGI it was 2050, and for a 90% chance of AGI it was 2070. So while AI scientists think it’s possible we might get AGI soon, they largely expect AGI to be an issue for the second half of this century.

Moreover, many of those who specialize in thinking about AGI safety actually think AGI is further away than the top-cited AI scientists do. For example, relative to the surveyed AI scientists, Nick Bostrom and I both think more probability should be placed on later years. We advocate more work on the AGI safety challenge today not because we think AGI is likely in the next decade or two, but because AGI safety looks to be an extremely difficult challenge — more challenging than managing climate change, for example — and one requiring several decades of careful preparation.

The greatest risks from both climate change and AI are several decades away, but thousands of smart researchers and policy-makers are already working to understand and mitigate climate change, and only a handful are working on the safety challenges of advanced AI. On the present margin, we should have much less top-flight cognitive talent going into climate change mitigation, and much more going into AGI safety research.


2. How many people are working to make sure AGI is friendly to humans?

The FT piece cites me as saying there are only five people in the world “working on how to [program] the super-smart machines of the not-too-distant future to make sure AI remains friendly.” I did say something kind of like this, but it requires clarification.

What I mean is that “When you add up fractions of people, there are about five people (that I know of) explicitly doing technical research on the problem of how to ensure that a smarter-than-human AI has a positive impact even as it radically improves itself.”

These fractions of people are: (a) most of the full-time labor of Eliezer Yudkowsky, Benja Fallenstein, Nate Soares (all at MIRI), and Stuart Armstrong (Oxford), plus (b) much smaller fractions of people who do technical research on “Friendly AI” on the side, for example MIRI’s (unpaid) research associates.

Of course, there are many, many more researchers than this doing (a) non-technical work on AGI safety, or doing (b) technical work on AI safety for extant or near-future systems, or doing (c) occasional technical work on AGI safety done with very different conceptions of “positive impact” or “radically improves itself” than I have.


3. An AGI wouldn’t necessarily see humans as “mere” collections of matter.

The article cites me as arguing that “In their single-mindedness, [AGIs] would view their biological creators as mere collections of matter, waiting to be reprocessed into something they find more useful.”

AGIs would likely have pretty accurate — and ever-improving — models of reality (e.g. via Wikipedia and millions of scientific papers), so they wouldn’t see humans as “mere” collections of matter any more than I do. Sure, humans are collections of matter, but we’re pretty special as collections of matter go. Unlike most collections of matter, we have general-purpose intelligence and consciousness and technological creativity and desires and aversions and hopes and fears and so on, and an AGI would know all that, and it would know that rocks and buildings and plants and monkeys and self-driving cars don’t have all those properties.

The point I wanted to make is that if a self-improving AGI was (say) programmed to maximize Shell’s stock price, then it would know all this about humans, and then it would just go on maximizing Shell’s stock price. It just happens to be the case that the best way to maximize Shell’s stock price is to take over the world and eliminate all potential threats to one’s achievement of that goal. In fact, for just about any goal function an AGI could have, it’s a really good idea to take over the world. That is the problem.

Even if we could program a self-improving AGI to (say) “maximize human happiness,” then the AGI would “care about humans” in a certain sense, but it might learn that (say) the most efficient way to “maximize human happiness” in the way we specified is to take over the world and then put each of us in a padded cell with a heroin drip. AGI presents us with the old problem of the all-too-literal genie: you get what you actually asked for, not what you wanted.

And yes, the AGI would be smart enough to know this wasn’t what we really wanted, especially when we start complaining about the padded cells. But we didn’t program it to do what we want. We programmed it to “maximize human happiness.”

The trouble is that “what we really want” is very hard to specify in computer code. Twenty centuries of philosophers haven’t even managed to specify it in less-exacting human languages.


4. “Toying with the intelligence of the gods.”

Finally, the article quotes me as saying “We’re toying with the intelligence of the gods. And there isn’t an off switch.”

I shouldn’t complain about Mr. Waters making me sound so eloquent, but I’m pretty sure I never said anything so succinct and quotable. 🙂

And of course, there is an off switch today, but there probably won’t be an off switch for an AGI smart enough to remove its shutdown mechanism (so as to more assuredly achieve its programmed goals) and copy itself across the internet — unless, that is, we solve the technical problem we call “corrigibility.”