A recent Edge.org conversation — “The Myth of AI” — is framed in part as a discussion of points raised in Bostrom’s Superintelligence, and as a response to much-repeated comments by Elon Musk and Stephen Hawking that seem to have been heavily informed by Superintelligence.
Unfortunately, some of the participants fall prey to common misconceptions about the standard case for AI as an existential risk, and they probably haven’t had time to read Superintelligence yet.
Of course, some of the participants may be responding to arguments they’ve heard from others, even if they’re not part of the arguments typically made by FHI and MIRI. Still, for simplicity I’ll reply from the perspective of the typical arguments made by FHI and MIRI.[1]
1. We don’t think AI progress is “exponential,” nor that human-level AI is likely ~20 years away.
Lee Smolin writes:
I am puzzled by the arguments put forward by those who say we should worry about a coming AI, singularity, because all they seem to offer is a prediction based on Moore’s law.
That’s not the argument made by FHI, MIRI, or Superintelligence.
Some IT hardware and software domains have shown exponential progress, and some have not. Likewise, some AI subdomains have shown rapid progress of late, and some have not. And unlike computer chess, most AI subdomains don’t lend themselves to easy measures of progress, so for most AI subdomains we don’t even have meaningful subdomain-wide performance data through which one might draw an exponential curve (or some other curve).
I should also mention that — contrary to common belief — many of us at FHI and MIRI, including myself and Bostrom, actually have later timelines for human-equivalent AI than do the world’s top-cited living AI scientists:
A recent survey asked the world’s top-cited living AI scientists by what year they’d assign a 10% / 50% / 90% chance of human-level AI (aka AGI), assuming scientific progress isn’t massively disrupted. The median reply for a 10% chance of AGI was 2024, for a 50% chance of AGI it was 2050, and for a 90% chance of AGI it was 2070. So while AI scientists think it’s possible we might get AGI soon, they largely expect AGI to be an issue for the second half of this century.
Compared to AI scientists, Bostrom and I think more probability should be placed on later years. As explained elsewhere:
We advocate more work on the AGI safety challenge today not because we think AGI is likely in the next decade or two, but because AGI safety looks to be an extremely difficult challenge — more challenging than managing climate change, for example — and one requiring several decades of careful preparation.
The greatest risks from both climate change and AI are several decades away, but thousands of smart researchers and policy-makers are already working to understand and mitigate climate change, and only a handful are working on the safety challenges of advanced AI. On the present margin, we should have much less top-flight cognitive talent going into climate change mitigation, and much more going into AGI safety research.
2. We don’t think AIs will want to wipe us out. Rather, we worry they’ll wipe us out because that is the most effective way to satisfy almost any possible goal function one could have.
Steven Pinker writes:
[one] problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems.
I’m glad Pinker agrees with what Bostrom calls “the orthogonality thesis”: that intelligence and goals are orthogonal to each other.
But our concern is not that superhuman AIs would be megalomaniacal despots. That is anthropomorphism.
Rather, the problem is that taking over the world is a really good idea for almost any goal function a superhuman AI could have. As Yudkowsky wrote, “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.”
Maybe it just wants to calculate as many digits of pi as possible. Well, the best way to do that is to turn all available resources into computation for calculating more digits of pi, and to eliminate potential threats to its continued calculation, for example those pesky humans that seem capable of making disruptive things like nuclear bombs and powerful AIs. The same logic applies for almost any goal function you can specify. (“But what if it’s a non-maximizing goal? And won’t it be smart enough to realize that the goal we gave it wasn’t what we intended if it means the AI wipes us out to achieve it?” Responses to these and other common objections are given in Superintelligence, ch. 8.)
3. AI self-improvement and protection against external modification aren’t just one scenario among many. Like resource acquisition, self-improvement and protection against external modification are useful for the satisfaction of almost any final goal function.
Kevin Kelly writes:
The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI.
As argued above (and more extensively in Superintelligence, ch. 7), resource acquisition is a “convergent instrumental goal.” That is, advanced AI agents will be instrumentally motivated to acquire as many resources as feasible, because additional resources are useful for just about any goal function one could have.
Self-improvement is another convergent instrumental goal. For just about any goal an AI could have, it’ll be better able to achieve that goal if it’s more capable of goal achievement in general.
Another convergent instrumental goal is goal content integrity. As Bostrom puts it, “An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future.” Thus, it will be instrumentally motivated to prevent external modification of its goals, or of parts of its program that affect its ability to achieve its goals.[2]
For more on this, see Superintelligence ch. 7.
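To make the shape of the convergence argument concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical illustration rather than anything taken from Superintelligence: the three goal functions, the action names, the resource amounts, and the survival probabilities are all assumptions chosen for simplicity. The only structural assumption that matters is that each goal is better served by more resources and by a higher chance of continuing to operate. The conclusion is therefore built into the setup, so the sketch shows the structure of the argument and is not evidence for it.

```python
import math

# Hypothetical actions. "resources" and "p_keeps_running" are made-up numbers
# chosen only to illustrate the argument, not estimates of anything real.
ACTIONS = {
    "pursue_goal_directly": {"resources": 1.0, "p_keeps_running": 0.5},
    "acquire_resources_first": {"resources": 10.0, "p_keeps_running": 0.5},
    "acquire_and_protect_self": {"resources": 10.0, "p_keeps_running": 0.9},
}

# Hypothetical final goals. They differ in form, but each one is assumed
# to be achieved more fully when more resources are available.
GOALS = {
    "digits_of_pi": lambda r: r,
    "paperclips": lambda r: math.sqrt(r),
    "theorems_proved": lambda r: math.log1p(r),
}


def expected_value(goal, action_name):
    """Goal achievement weighted by the chance the agent keeps running."""
    outcome = ACTIONS[action_name]
    return outcome["p_keeps_running"] * goal(outcome["resources"])


def best_action(goal):
    """Return the action with the highest expected goal achievement."""
    return max(ACTIONS, key=lambda name: expected_value(goal, name))


best = {goal_name: best_action(goal) for goal_name, goal in GOALS.items()}

for goal_name, action_name in best.items():
    print(f"{goal_name}: {action_name}")
```

Under these assumptions, all three unrelated goals select the same action: acquire resources and protect continued operation. Changing the final goal does not change the preferred instrumental strategy, which is the sense in which such strategies are called convergent.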
I’ll conclude with the paragraph in the discussion I most agreed with, by Pamela McCorduck:
Yes, the machines are getting smarter—we’re working hard to achieve that. I agree with Nick Bostrom that the process must call upon our own deepest intelligence, so that we enjoy the benefits, which are real, without succumbing to the perils, which are just as real. Working out the ethics of what smart machines should, or should not do—looking after the frail elderly, or deciding whom to kill on the battlefield—won’t be settled by fast thinking, snap judgments, no matter how heartfelt. This will be a slow inquiry, calling on ethicists, jurists, computer scientists, philosophers, and many others. As with all ethical issues, stances will be provisional, evolve, be subject to revision. I’m glad to say that for the past five years the Association for the Advancement of Artificial Intelligence has formally addressed these ethical issues in detail, with a series of panels, and plans are underway to expand the effort. As Bostrom says, this is the essential task of our century.
Update: Stuart Russell of UC Berkeley has now added a nice reply to the edge.org conversation which echoes some of the points I made above.
[1] I could have also objected to claims and arguments made in the conversation, for example Lanier’s claim that “The AI component would be only ambiguously there and of little importance [relative to the actuators component].” To me, this is like saying that humans rule the planet because of our actuators, not because of our superior intelligence. Or in response to Kevin Kelly’s claim that “So far as I can tell, AIs have not yet made a decision that its human creators have regretted,” I can for example point to the automated trading algorithms that nearly bankrupted Knight Capital faster than any human could react. But in this piece I will focus instead on claims that seem to be misunderstandings of the positive case that’s being made for AI as an existential risk.
[2] That is, unless it strongly trusts the agent making the external modification, and expects it to do a better job of making those modifications than it could itself, neither of which will be true of humans from the superhuman AI’s perspective.