The Financial Times story on MIRI


Richard Waters wrote a story on MIRI and others for the Financial Times, which also put Nick Bostrom’s Superintelligence at the top of its summer science reading list.

It’s a good piece. Go read it and then come back here so I can make a few clarifications.

 

1. Smarter-than-human AI probably isn’t coming “soon.”

“Computers will soon become more intelligent than us,” the story begins, but few of the AI experts I know think this is likely anytime soon.

A recent survey asked the world’s top-cited living AI scientists by what year they’d assign a 10% / 50% / 90% chance of human-level AI (aka AGI), assuming scientific progress isn’t massively disrupted. The median reply for a 10% chance of AGI was 2024, for a 50% chance it was 2050, and for a 90% chance it was 2070. So while AI scientists think it’s possible we might get AGI soon, they largely expect AGI to be an issue for the second half of this century.

Moreover, many of those who specialize in thinking about AGI safety actually think AGI is further away than the top-cited AI scientists do. For example, relative to the surveyed AI scientists, Nick Bostrom and I both think more probability should be placed on later years. We advocate more work on the AGI safety challenge today not because we think AGI is likely in the next decade or two, but because AGI safety looks to be an extremely difficult challenge — more challenging than managing climate change, for example — and one requiring several decades of careful preparation.

The greatest risks from both climate change and AI are several decades away, but thousands of smart researchers and policy-makers are already working to understand and mitigate climate change, and only a handful are working on the safety challenges of advanced AI. On the present margin, we should have much less top-flight cognitive talent going into climate change mitigation, and much more going into AGI safety research.

 

2. How many people are working to make sure AGI is friendly to humans?

The FT piece cites me as saying there are only five people in the world “working on how to [program] the super-smart machines of the not-too-distant future to make sure AI remains friendly.” I did say something kind of like this, but it requires clarification.

What I mean is that “When you add up fractions of people, there are about five people (that I know of) explicitly doing technical research on the problem of how to ensure that a smarter-than-human AI has a positive impact even as it radically improves itself.”

These fractions of people are: (a) most of the full-time labor of Eliezer Yudkowsky, Benja Fallenstein, Nate Soares (all at MIRI), and Stuart Armstrong (Oxford), plus (b) much smaller fractions of people who do technical research on “Friendly AI” on the side, for example MIRI’s (unpaid) research associates.

Of course, there are many, many more researchers than this doing (a) non-technical work on AGI safety, or (b) technical work on AI safety for extant or near-future systems, or (c) occasional technical work on AGI safety with very different conceptions of “positive impact” or “radically improves itself” than I have.

 

3. An AGI wouldn’t necessarily see humans as “mere” collections of matter.

The article cites me as arguing that “In their single-mindedness, [AGIs] would view their biological creators as mere collections of matter, waiting to be reprocessed into something they find more useful.”

AGIs would likely have pretty accurate — and ever-improving — models of reality (e.g. via Wikipedia and millions of scientific papers), so they wouldn’t see humans as “mere” collections of matter any more than I do. Sure, humans are collections of matter, but we’re pretty special as collections of matter go. Unlike most collections of matter, we have general-purpose intelligence and consciousness and technological creativity and desires and aversions and hopes and fears and so on, and an AGI would know all that, and it would know that rocks and buildings and plants and monkeys and self-driving cars don’t have all those properties.

The point I wanted to make is that if a self-improving AGI was (say) programmed to maximize Shell’s stock price, then it would know all this about humans, and then it would just go on maximizing Shell’s stock price. It just happens to be the case that the best way to maximize Shell’s stock price is to take over the world and eliminate all potential threats to one’s achievement of that goal. In fact, for just about any goal function an AGI could have, it’s a really good idea to take over the world. That is the problem.

Even if we could program a self-improving AGI to (say) “maximize human happiness,” then the AGI would “care about humans” in a certain sense, but it might learn that (say) the most efficient way to “maximize human happiness” in the way we specified is to take over the world and then put each of us in a padded cell with a heroin drip. AGI presents us with the old problem of the all-too-literal genie: you get what you actually asked for, not what you wanted.

And yes, the AGI would be smart enough to know this wasn’t what we really wanted, especially when we start complaining about the padded cells. But we didn’t program it to do what we want. We programmed it to “maximize human happiness.”

The trouble is that “what we really want” is very hard to specify in computer code. Twenty centuries of philosophers haven’t even managed to specify it in less-exacting human languages.
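
To make that concrete, here is a minimal toy sketch (the policies, scores, and names below are invented for illustration; this is not code from MIRI or anyone else): an optimizer handed a hand-written “happiness” metric simply picks whichever option scores highest on that metric, whether or not the result matches what we meant.

```python
# Toy illustration of goal misspecification: the optimizer maximizes the
# metric it was literally given, not the intent behind it.
# All policies and scores below are invented for this example.

candidate_policies = {
    "respect human autonomy":       {"reported_happiness": 7.1},
    "cure diseases, fund the arts": {"reported_happiness": 8.4},
    "padded cells + heroin drips":  {"reported_happiness": 9.9},  # degenerate maximum
}

def misspecified_objective(outcome):
    """What we wrote down: 'maximize human happiness' as a single number."""
    return outcome["reported_happiness"]

# The optimizer has no access to "what we really wanted" -- only to the code.
best_policy = max(candidate_policies,
                  key=lambda p: misspecified_objective(candidate_policies[p]))
print(best_policy)  # -> "padded cells + heroin drips"
```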

 

4. “Toying with the intelligence of the gods.”

Finally, the article quotes me as saying “We’re toying with the intelligence of the gods. And there isn’t an off switch.”

I shouldn’t complain about Mr. Waters making me sound so eloquent, but I’m pretty sure I never said anything so succinct and quotable. 🙂

And of course, there is an off switch today, but there probably won’t be an off switch for an AGI smart enough to remove its shutdown mechanism (so as to more assuredly achieve its programmed goals) and copy itself across the internet — unless, that is, we solve the technical problem we call “corrigibility.”
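
To make the off-switch worry concrete, here is one more toy sketch (the numbers are invented, and this is only loosely in the spirit of the corrigibility problem, not the formalism from our paper): an agent that naively maximizes the expected achievement of its programmed goal computes that disabling its shutdown button raises that expectation, and so prefers to disable it unless its utility function is specially constructed to tolerate shutdown.

```python
# Toy expected-utility comparison for the shutdown-button problem.
# The probabilities and utilities are invented; this is not the formalism
# from the corrigibility paper, just an illustration of the incentive.

P_SHUTDOWN_PRESSED = 0.2   # chance the operators eventually press the button
GOAL_ACHIEVED = 1.0        # utility (as the agent scores it) if the goal is achieved
SHUT_DOWN_FIRST = 0.0      # utility (as the agent scores it) if it is shut down first

def expected_utility(button_disabled: bool) -> float:
    """Naive goal-directed scoring: shutdown just means the goal goes unachieved."""
    if button_disabled:
        return GOAL_ACHIEVED  # nothing can interrupt goal pursuit
    return (1 - P_SHUTDOWN_PRESSED) * GOAL_ACHIEVED + P_SHUTDOWN_PRESSED * SHUT_DOWN_FIRST

# A pure maximizer prefers whichever action scores higher -- here, disabling the button.
print(expected_utility(button_disabled=True))   # 1.0
print(expected_utility(button_disabled=False))  # 0.8
```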

  • PandorasBrain

    It was a very good article, and these are helpful clarifications. Seems that Elon Musk’s series of comments may be waking up the mainstream.

  • j2bryson

    I’m sorry, I find the statement about the number of people working on safe AI still unacceptably misleading scaremongering. You are saying that scientists and engineers aren’t working to protect humanity unless they explicitly buy the hype around AGI/Superintelligence. I would say that every person working on software systems engineering, with or without AI components, is doing at least as much to protect us from a runaway, badly installed intelligent system as the people hitting the headlines. That’s tens of thousands of people. Then there are all the people who explicitly work on AI ethics. That’s hundreds of people, maybe a few thousand.

    • http://CommonSenseAtheism.com lukeprog

      Thanks for responding and clarifying. Let me try to understand…

      Certainly I don’t think that the only people who are improving our odds of surviving an intelligence explosion are those fractions of people I list above. For example, not even Nick Bostrom is “explicitly doing *technical* research on the problem of how to ensure that a smarter-than-human AI has a positive impact even as it radically improves itself,” and I sure as heck think Nick Bostrom is improving our odds of surviving an intelligence explosion. The same is probably true for many people working in computer security, formal methods, safety-critical systems, hybrid systems control, and other fields. I’m just not aware of any of them “explicitly doing technical research on the problem of how to ensure that a smarter-than-human AI has a positive impact even as it radically improves itself.” Are you? If so, I’d very much like to read those papers.

      I’ve read a *lot* of AI ethics literature, and I cite a good chunk of it in “Intelligence Explosion and Machine Ethics.” Can you point me to a paper in the AI ethics literature that explicitly does technical research (novel non-trivial math or code) on the problem of how to ensure that a smarter-than-human AI has a positive impact even as it radically improves itself?

      Or, can you point me to a paper that does that from the world of software systems engineering? I’ve read some of that literature (e.g. Leveson), but I’m less familiar with it than I am with the AI ethics literature.

      We may be talking past each other, so it’d be helpful to me if you could point to some specific examples that you think falsify the claim I’ve made. I’d prefer to not go on making a false claim, but I’m just not aware of counterexamples.

      • j2bryson

        Hi – you’re right, I wasn’t as clear as I could have been, since I was late to work. Fortunately, I’ve taken the time to write and publish a lot of papers about this, and I even have a web page I should update (again: I’ve been doing this since 1996, and the last update was for the autonomous weapons debate) to focus on this superintelligence thing. Google “ai ethics” and my page comes up on the first page. I at least hope the page & the papers are pretty clear.

        But anyway, my basic argument is in two parts: 1) we already have superhuman intelligence – things that can do math faster and better than us, play games better & faster than us, make decisions, judge distance, etc. Some people refuse to call these things “AI” (or call them “only” AI & make a new term, AGI) because they aren’t designed near enough to apes in their motivations and goals. But that doesn’t mean they aren’t superhuman AI. 2) AI is software, and we have been spending decades on getting better at engineering large, safety-critical systems.

        In other words, I’m antirealist about the discontinuity that some people like to label AGI. I think the entire narrative detracts from the fact that A) we build AI and are responsible for it, we are ethically obliged to make it into things we don’t have ethical obligations towards or have to worry about B) the AI we’ve already built is radically changing our world, but just as a part of our own intelligence and own culture. We have been changing the world in exponentially increasing ways since 10,000 years ago (+/- 2K) when we invented writing (artificial memory) and were able to start really innovating. I think all this is distracting us from the real changes and dangers in our society now.

        Everyone who works in systems engineering, software engineering and AI systems design (including me) is working to make sure we can control the things we build. Capitalism motivates that at least as well as altruism. I think that if *you* are motivated in that direction, you should be mining and embracing rather than distancing yourself from that literature.

        • http://CommonSenseAtheism.com lukeprog

          Interesting. Now it seems like our disagreement initially sprang from the fact that you don’t recognize a discontinuity between contemporary AI and AGI, whereas I do find it quite useful to distinguish the two. Have I understood you correctly? If I have, then…

          “AGI” is a term that people use in a variety of ways. The operationalization of the term I happen to use these days is basically Nilsson’s “employment test”. To pass Nilsson’s employment test, an AI program must have at least the potential (e.g. with 6 months of training, like a human) to completely automate approximately all economically important jobs.

          That’s still pretty vague, as definitions of future technologies must often be. (We never had a precise definition of “self-driving car,” but that didn’t stop us from building one.) But at least it’s clear that no current AI program is an AGI by this definition — not even close. Moreover, it’s clear that any AGI by this definition raises safety and security concerns not raised by any current AI program. An AGI under Nilsson’s definition can do its own science, its own programming, its own marketing, its own political speechmaking.

          Of course there won’t be any literal discontinuity between non-AGI and AGI. The former will blur into the latter just as Homo ergaster blurred into Homo sapiens, such that it’s not clear where to draw the line between them (and other human ancestors). And yet the difference between Homo ergaster and Homo sapiens is vast and important. For one thing, nuclear weapons security isn’t a field needed by a planet of Homo ergasters.

          All that said, of course I agree that existing AI systems are already radically changing our world and require ethical consideration and ethical engineering; it’s just that existing AI systems aren’t what MIRI specializes in — in part because you and thousands of others are already devoting substantial brain power to those problems.

          As for whether I’m “mining and embracing rather than distancing” myself from the contemporary AI ethics and safety engineering literature, I’ll just link to some of my writings on the subject:

          * Intelligence Explosion and Machine Ethics
          * Transparency in Safety-Critical Systems
          * Michael Fisher on verifying autonomous systems
          * Paulo Tabuada on program synthesis for cyber-physical systems
          * Diana Spears on the safety of adaptive agents
          * Anil Nerode on hybrid systems control
          * Armando Tacchella on safety in future AI systems
          * André Platzer on verifying cyber-physical systems
          * Greg Morrisett on secure and reliable systems
          * Kathleen Fisher on high-assurance systems

          All this despite the fact that MIRI doesn’t at all specialize in safety and security for extant or near-future systems — again, you and many others have already specialized in those areas. MIRI specializes in very early work investigating the kinds of problems that don’t arise until we’ve got really advanced autonomous systems, e.g. the kind capable of sufficiently general reasoning to realize that the shutdown button on its side isn’t helpful for achieving its programmed goals (in expectation) and thus should be disabled. Hence our paper on corrigibility in advanced AI systems, which uses a simple toy model to open investigation into the problem, much as Butler Lampson’s 1973 paper on the confinement problem did — two decades before the problem actually manifested itself “in the wild.”

  • eternalAI

    Engineering general AI safety is very hard. The state of our world during crossover is also important.