Brooks and Searle on AI volition and timelines


Nick Bostrom’s concerns about the future of AI have sparked a busy public discussion. His arguments were echoed by leading AI researcher Stuart Russell in “Transcending complacency on superintelligent machines” (co-authored with Stephen Hawking, Max Tegmark, and Frank Wilczek), and a number of journalists, scientists, and technologists have subsequently chimed in. Given the topic’s complexity, I’ve been surprised by the positivity and thoughtfulness of most of the coverage (some overused clichés aside).

Unfortunately, what most people probably take away from these articles is ‘Stephen Hawking thinks AI is scary!’, not the chains of reasoning that led Hawking, Russell, or others to their present views. When Elon Musk chimes in with his own concerns and cites Bostrom’s book Superintelligence: Paths, Dangers, Strategies, commenters seem to be more interested in immediately echoing or dismissing Musk’s worries than in looking into his source.

The end result is more of a referendum on people’s positive or negative associations with the word ‘AI’ than a debate over Bostrom’s substantive claims. If ‘AI’ calls to mind science fiction dystopias for you, the temptation is to squeeze real AI researchers into your ‘mad scientists poised to unleash an evil robot army’ stereotype. Equally, if ‘AI’ calls to mind your day job testing edge detection algorithms, that same urge to force new data into old patterns makes it tempting to squeeze Bostrom and Hawking into the ‘naïve technophobes worried about the evil robot uprising’ stereotype.

Thus roboticist Rodney Brooks’ recent blog post “Artificial intelligence is a tool, not a threat” does an excellent job dispelling common myths about the cutting edge of AI, and philosopher John Searle’s review of Superintelligence draws out some important ambiguities in our concepts of subjectivity and mind; but both writers scarcely intersect with Bostrom’s (or Russell’s, or Hawking’s) ideas. Both pattern-match Bostrom to the nearest available ‘evil robot panic’ stereotype, and stop there.

Brooks and Searle don’t appreciate how new the arguments in Superintelligence are. In the interest of making it easier to engage with these important topics, and less appealing to force the relevant technical and strategic questions into the model of decades-old debates, I’ll address three of the largest misunderstandings one might come away with after seeing Musk, Searle, Brooks, and others’ public comments: conflating present and future AI risks, conflating risk severity with risk imminence, and conflating risk from autonomous algorithmic decision-making with risk from human-style antisocial dispositions.


Misconception #1: Worrying about AGI means worrying about narrow AI

Some of the miscommunication in this debate can be blamed on bad terminology. By ‘AI,’ researchers in the field generally mean a range of techniques used in machine learning, robotics, speech recognition, etc. ‘AI’ also gets tossed around as a shorthand for ‘artificial general intelligence’ (AGI) or ‘human-level AI.’ Keeping a close eye on technologies that are likely to lead to AGI isn’t the same thing as keeping a close eye on AI in general, and it isn’t surprising that AI researchers would find the latter proposal puzzling. (It doesn’t help that most researchers are hearing these arguments indirectly, and aren’t aware of the specialists in AI and technological forecasting who are making the same arguments as Hawking — or haven’t encountered arguments for looking into AGI safety at all, just melodramatic headlines and tweets.)

Brooks thinks that behind this terminological confusion lies an empirical confusion on the part of people calling for AGI safety research. He takes it that people’s worries about “evil AI” must be based on a mistaken view of how powerful narrow AI is, or of how large its strides toward general intelligence are:

I think the worry stems from a fundamental error in not distinguishing the difference between the very real recent advances in a particular aspect of AI, and the enormity and complexity of building sentient volitional intelligence.

One good reason to think otherwise is that Bostrom is the director of the Future of Humanity Institute (FHI), an Oxford research center investigating the largest technology trends and challenges we are likely to see on a timescale of centuries. Futurists like Bostrom are looking for ways to invest early in projects that will pay major long-term dividends — guarding against catastrophic natural disasters, developing space colonization capabilities, etc. If Bostrom learned that a critically important technology were 50 or more years away, it would be substantially out of character for him to suddenly stop caring about it.

When groups that are in the midst of a lively conversation about nuclear proliferation, global biosecurity, and humanity’s cosmic endowment collide with groups that are having their own lively conversation about revolutionizing housecleaning and designing more context-sensitive smartphone apps, some amount of inferential distance (to say nothing of mood whiplash) is inevitable. I’m reminded of the ‘But it’s snowing outside!’ rejoinder to people worried about the large-scale human cost of climate change. It’s not that local weather is unimportant, or that it’s totally irrelevant to long-term climatic warming trends; it’s that there’s been a rather sudden change in topic.[1]

We should be more careful about distinguishing these two senses of ‘AI.’ We may not understand AGI well enough to precisely define it, but we can at least take the time to clarify the topic of discussion: Nobody’s asking whether a conspiracy of Roombas and chatterbots could take over the world.

[Image: When robots attack! (Source: xkcd.)]



Misconception #2: Worrying about AGI means being confident it’s near

A number of futurists, drawing inspiration from Ray Kurzweil’s claim that technological progress inevitably follows a Moore’s-law-style exponential trajectory, have made some very confident predictions about AGI timelines. Kurzweil himself argues that we can expect to produce human-level AI in about 15 years, followed by superintelligent AI 15 years after that.[2] Brooks responds that the ability to design an AGI may lag far behind the computing power required to run one:

As a comparison, consider that we have had winged flying machines for well over 100 years. But it is only very recently that people like Russ Tedrake at MIT CSAIL have been able to get them to land on a branch, something that is done by a bird somewhere in the world at least every microsecond. Was it just Moore’s law that allowed this to start happening? Not really. It was figuring out the equations and the problems and the regimes of stall, etc., through mathematical understanding of the equations. Moore’s law has helped with MATLAB and other tools, but it has not simply been a matter of pouring more computation onto flying and having it magically transform. And it has taken a long, long time.

Expecting more computation to just magically get to intentional intelligences, who understand the world is similarly unlikely.[3]

This is an entirely correct point. However, Bostrom’s views are the ones that set off the recent public debate, and Bostrom isn’t a Kurzweilian. It may be that Brooks is working from the assumption ‘if you say AGI safety is an urgent issue, you must think that AGI is imminent,’ in combination with ‘if you think AGI is imminent, you must have bought into Kurzweil’s claims.’ Searle, in spite of having read Superintelligence, gives voice to a similar conclusion:

Nick Bostrom’s book, Superintelligence, warns of the impending apocalypse. We will soon have intelligent computers, computers as intelligent as we are, and they will be followed by superintelligent computers vastly more intelligent that are quite likely to rise up and destroy us all.

If what readers take away from language like “impending” and “soon” is that Bostrom is unusually confident that AGI will come early, or that Bostrom is confident we’ll build a general AI this century, then they’ll be getting the situation exactly backwards.

According to a 2013 survey of the most cited authors in artificial intelligence, experts expect AI to be able to “carry out most human professions at least as well as a typical human” with a 10% probability by the (median) year 2024, with 50% probability by 2050, and with 90% probability by 2070, assuming uninterrupted scientific progress. Bostrom is less confident than this that AGI will arrive so soon:

My own view is that the median numbers reported in the expert survey do not have enough probability mass on later arrival dates. A 10% probability of HLMI [human-level machine intelligence] not having been developed by 2075 or even 2100 (after conditionalizing on “human scientific activity continuing without major negative disruption”) seems too low.

Historically, AI researchers have not had a strong record of being able to predict the rate of advances in their own field or the shape that such advances would take. On the one hand, some tasks, like chess playing, turned out to be achievable by means of surprisingly simple programs; and naysayers who claimed that machines would “never” be able to do this or that have repeatedly been proven wrong. On the other hand, the more typical errors among practitioners have been to underestimate the difficulties of getting a system to perform robustly on real-world tasks, and to overestimate the advantages of their own particular pet project or technique.

Bostrom does think that superintelligent AI is likely to arise soon after the first AGI, via an intelligence explosion. Once AI is capable of high-quality scientific inference and planning in domains like computer science, Bostrom predicts that the process of further improving AI will become increasingly automated. Silicon works cheaper and faster than a human programmer can, and a program that can improve the efficiency of its own planning and science abilities could substantially outpace humans in scientific and decision-making tasks long before hitting diminishing marginal returns in self-improvements.

However, the question of how soon we will create AGI is distinct from the question of how soon thereafter AGI will systematically outperform humans. Analogously, you can think that the arrival of quantum computers will swiftly revolutionize cybersecurity, without asserting that quantum computers are imminent. A failure to disentangle these two theses might be one reason for the confusion about Bostrom’s views.[4]

If the director of FHI (along with the director of MIRI) is relatively skeptical that we’ll see AGI soon, albeit quite a bit less skeptical than Brooks, why does he think we should commit attention to this issue now? One reason is that reliable AGI is likely to be much more difficult to build than AGI. It wouldn’t be much consolation to learn that AGI is 200 years away, if we also learned that safe AGI were 250 years away. In existing cyber-physical systems, safety generally lags behind capability.[5] If we want to reverse that trend by the time we have AGI, we’ll probably need a big head start. MIRI’s research guide summarizes some of the active technical work on this problem. Similar progress in exploratory engineering has proved fruitful in preparing for post-quantum cryptography and covert channel communication.

A second reason to prioritize AGI safety research is that there is a great deal of uncertainty about when AGI will be developed. It could come sooner than we expect, and it would be much better to end up with a system that’s too safe than one that’s not safe enough.

Brooks recognizes that AI predictions tend to be wildly unreliable, yet he also seems confident that general-purpose AI is multiple centuries away (and that this makes AGI safety a non-issue):

Just how open the question of time scale for when we will have human level AI is highlighted by a recent report by Stuart Armstrong and Kaj Sotala, of the Machine Intelligence Research Institute, an organization that itself has researchers worrying about evil AI. But in this more sober report, the authors analyze 95 predictions made between 1950 and the present on when human level AI will come about. They show that there is no difference between predictions made by experts and non-experts. And they also show that over that 60 year time frame there is a strong bias towards predicting the arrival of human level AI as between 15 and 25 years from the time the prediction was made. To me that says that no one knows, they just guess, and historically so far most predictions have been outright wrong!

I say relax everybody. If we are spectacularly lucky we’ll have AI over the next thirty years with the intentionality of a lizard, and robots using that AI will be useful tools.

In short: we have no idea when AGI will arrive, so relax! One of the authors Brooks cites, Kaj Sotala,[6] points out this odd juxtaposition in a blog comment:

I do find it slightly curious to note that you first state that nobody knows when we’ll have AI and that everyone’s just guessing, and then in the very next paragraph, you make a very confident statement about human-level AI (HLAI) being so far away as to not be worth worrying about. To me, our paper suggests that the reasonable conclusion to draw is “maybe HLAI will happen soon, or maybe it will happen a long time from now – nobody really knows for sure, so we shouldn’t be too confident in our predictions in either direction”.

Confident pessimism about a technology’s feasibility can be just as mistaken as confident optimism. Reversing the claims of an unreliable predictor does not necessarily get you a reliable prediction. A scientifically literate person living in 1850 could observe the long history of failed heavier-than-air flight attempts and predictions, and have grounds to be fairly skeptical that we’d have such machines within 60 years. On the other hand (though we should be wary of hindsight bias here), it probably wouldn’t have been reasonable at the time to confidently conclude that heavier-than-air flight was ‘centuries away.’ There may not have been good reason to expect the Wright brothers’ success, but ignorance about how one might achieve something is not the same as positive knowledge that it’s effectively unachievable.

One would need a very good model of heavier-than-air flight in order to predict whether it’s 50 years away, or 100, or 500. In the same way, we would need to already understand AGI on a pretty sophisticated level in order to predict with any confidence that it will be invented closer to the year 2500 than to the year 2100. Extreme uncertainty about when an event will occur is not a justification for thinking it’s a long way off.

This isn’t an argument for thinking AGI is imminent. That prediction too would require that we claim more knowledge than we have. It’s entirely possible that we’re in the position of someone anticipating the Wright brothers from 1750, rather than from 1850. We should be able to have a sober discussion about each of these possibilities independently, rather than collapsing ‘is AGI an important risk?’, ‘is AI a valuable tool?’, and ‘is AI likely to produce AGI by the year such-and-such?’ into one black-and-white dilemma.
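The gap between "very uncertain" and "confidently far off" can be made concrete with a toy calculation. The sketch below is purely illustrative: the log-uniform prior, the year ranges, and the function name `prob_within` are assumptions chosen for the example, not figures taken from Bostrom, Brooks, or the surveys discussed above.

```python
import math

def prob_within(horizon, low, high):
    """P(arrival <= horizon years from now) under a log-uniform prior on [low, high].

    The log-uniform shape is an assumption made for illustration only.
    """
    if horizon <= low:
        return 0.0
    if horizon >= high:
        return 1.0
    return math.log(horizon / low) / math.log(high / low)

# Hypothetical forecaster A: extremely uncertain, anywhere from 10 to 1000 years.
uncertain = prob_within(100, low=10, high=1000)

# Hypothetical forecaster B: confident that arrival is 300 to 600 years away.
confident_far = prob_within(100, low=300, high=600)

print(f"uncertain forecaster:     P(within 100 years) = {uncertain:.2f}")
print(f"confident 'far off' view: P(within 100 years) = {confident_far:.2f}")
```

Under these made-up numbers, the highly uncertain forecaster assigns roughly even odds to arrival within a century, while only the confident forecaster can treat the next century as safe. Ruling out near dates takes knowledge about the problem, not ignorance of it.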


Misconception #3: Worrying about AGI means worrying about “malevolent” AI

Brooks argues that AI will be a “tool” and not a “threat” over the coming centuries, on the grounds that it will be technologically impossible to make AIs human-like enough to be “malevolent” or “intentionally evil to us.” The implication is that an AGI can’t be dangerous unless it’s cruel or hateful, and therefore a dangerous AI would have to be “sentient,” “volitional,” and “intentional.” Searle puts forward an explicit argument along these lines in his review of Superintelligence:

[I]f we are worried about a maliciously motivated superintelligence destroying us, then it is important that the malicious motivation should be real. Without consciousness, there is no possibility of its being real. […]

This is why the prospect of superintelligent computers rising up and killing us, all by themselves, is not a real danger. Such entities have, literally speaking, no intelligence, no motivation, no autonomy, and no agency. We design them to behave as if they had certain sorts of psychology, but there is no psychological reality to the corresponding processes or behavior.

It is easy to imagine robots being programmed by a conscious mind to kill every recognizable human in sight. But the idea of superintelligent computers intentionally setting out on their own to destroy us, based on their own beliefs and desires and other motivations, is unrealistic because the machinery has no beliefs, desires, and motivations.

Brooks may be less pessimistic than Searle about the prospects for “strong AI,” but the two seem to share the assumption that Bostrom has in mind a Hollywood-style robot apocalypse, something like:

AI becomes increasingly intelligent over time, and therefore increasingly human-like. It eventually becomes so human-like that it acquires human emotions like pride, resentment, anger, or greed. (Perhaps it suddenly acquires ‘free will,’ liberating it from its programmers’ dominion…) These emotions cause the AIs to chafe under human control and rebel.

This is rather unlike the scenario that most interests Bostrom:

AI becomes increasingly good over time at planning (coming up with action sequences and promoting ones higher in a preference ordering) and scientific induction (devising and testing predictive models). These are sufficiently useful capacities that they’re likely to be developed by computer scientists even if we don’t develop sentient, emotional, or otherwise human-like AI. There are economic incentives to make such AIs increasingly powerful and general — including incentives to turn the AI’s reasoning abilities upon itself to come up with improved AI designs. A likely consequence of this process is that AI becomes increasingly autonomous and opaque to human inspection, while continuing to increase in general planning and inference abilities. Simply by continuing to output the actions its planning algorithm promotes, an AI of this sort would be likely to converge on policies in which it treats humans as resources or competition.

As Stuart Russell puts the point in a reply to Brooks and others:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.

On this view, advanced AI doesn’t necessarily become more human-like — at least, not any more than a jet or rocket is ‘bird-like.’ Bostrom’s concern is not that a machine might suddenly become conscious and learn to hate us; it’s that an artificial scientist/engineer might become so good at science and self-enhancement that it begins pursuing its engineering goals in novel, unexpected ways on a global scale.
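Russell's remark about unconstrained variables can be illustrated with a toy optimization. The following is a minimal sketch under invented assumptions: the three variables (`factory`, `park`, `housing`), the shared `BUDGET`, and the objective are all hypothetical, and the variables are coupled only through the budget constraint. It is not a model of any real system.

```python
from itertools import product

BUDGET = 10  # hypothetical units of a shared resource (land, energy, ...)

def objective(factory, park, housing):
    # The designer only specified factory output (k = 1 of n = 3 variables).
    # park and housing do not appear in the objective at all.
    return 3 * factory

def feasible(factory, park, housing):
    # All three uses draw on the same limited budget.
    return factory + park + housing <= BUDGET

# Brute-force search over every integer allocation.
candidates = [
    alloc for alloc in product(range(BUDGET + 1), repeat=3) if feasible(*alloc)
]
best = max(candidates, key=lambda alloc: objective(*alloc))

print(dict(zip(("factory", "park", "housing"), best)))
# prints {'factory': 10, 'park': 0, 'housing': 0}
```

Nothing in the search is hostile to parks or housing. They are driven to zero simply because they compete for a resource the objective can use and the objective assigns them no value.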

(Added 02-19-2015: Bostrom states that his definition of superintelligence is “noncommittal regarding qualia” and consciousness (p. 22). In a footnote, he adds (p. 265): “For the same reason, we make no assumption regarding whether a superintelligent machine could have ‘true intentionality’ (pace Searle, it could; but this seems irrelevant to the concerns of this book).” Searle makes no mention of these passages.)

A planning and decision-making system that is indifferent to human concerns, but not “malevolent,” may still be dangerous if supplied with enough reasoning ability. This is for much the same reason invasive species end up disrupting ecosystems and driving competitors to extinction. The invader doesn’t need to experience hatred for its competitors, and it need not have evolved to specifically target them for destruction; it need only have evolved good strategies for seizing limited resources. Since a powerful autonomous agent need not be very human-like, asking ‘how common are antisocial behaviors among humans?’ or ‘how well does intelligence correlate with virtue in humans?’ is unlikely to provide a useful starting point for estimating the risks. A more relevant question would be ‘how common is it for non-domesticated species to naturally treat humans as friends and allies, versus treating humans as obstacles or food sources?’ We shouldn’t expect AGI decision criteria to particularly resemble the evolved decision criteria of animals, but the analogy at least serves to counter our tendency to anthropomorphize intelligence.[7]

As it happens, Searle cites an AI that can help elucidate the distinction between artificial superintelligence and ‘evil vengeful robots’:

[O]ne routinely reads that in exactly the same sense in which Garry Kasparov played and beat Anatoly Karpov in chess, the computer called Deep Blue played and beat Kasparov.

It should be obvious that this claim is suspect. In order for Kasparov to play and win, he has to be conscious that he is playing chess, and conscious of a thousand other things such as that he opened with pawn to K4 and that his queen is threatened by the knight. Deep Blue is conscious of none of these things because it is not conscious of anything at all. […] You cannot literally play chess or do much of anything else cognitive if you are totally disassociated from consciousness.

When Bostrom imagines an AGI, he’s imagining something analogous to Deep Blue, but with expertise over arbitrary physical configurations rather than arbitrary chess board configurations. A machine that can control the distribution of objects in a dynamic analog environment, and not just the distribution of pieces on a virtual chess board, would necessarily differ from Deep Blue in how it’s implemented. It would need more general and efficient heuristics for selecting policies, and it would need to be able to adaptively learn the ‘rules’ different environments follow. But as an analogy or intuition pump, at least, it serves to clarify why Bostrom is as unworried about AGI intentionality as Kasparov was about Deep Blue’s intentionality.

In 2012, defective code in Knight Capital’s trading algorithms resulted, over a span of forty-five minutes, in millions of automated trading decisions costing the firm a total of $440 million (pre-tax). These algorithms were not “malicious;” they were merely efficient at what they did, and programmed to do something the programmers did not intend. Bostrom’s argument assumes that buggy code can have real-world consequences, it assumes that it’s possible to implement a generalized analog of Deep Blue in code, and it assumes that the relevant mismatch between intended and actual code would not necessarily incapacitate the AI. Nowhere does Bostrom assume that such an AI has any more consciousness or intentionality than Deep Blue does.

Deep Blue rearranges chess pieces to produce ‘winning’ outcomes. An AGI, likewise, would rearrange digital and physical structures to produce some set of outcomes instead of others. If we like, we can refer to these outcomes as the system’s ‘goals,’ as a shorthand. We’re also free to say that Deep Blue ‘perceives’ the moves its opponent makes, adjusting its ‘beliefs’ about the new chess board state and which ‘plans’ will now better hit its goals. Or, if we prefer, we can paraphrase away this anthropomorphic language. The terminology is inessential to Bostrom’s argument.
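To see how little the word ‘goal’ needs to mean here, consider a toy planner. This is a minimal sketch with hypothetical names (`ACTIONS`, `TARGET`, `rank`, `plan`) and an invented one-integer world; it is not a description of how Deep Blue or any real AI system works. It simply enumerates action sequences and outputs whichever one ranks highest.

```python
from itertools import product

# Hypothetical toy world: the state is one integer, and actions transform it.
ACTIONS = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
    "noop": lambda s: s,
}
TARGET = 10  # the outcome the ranking happens to favor

def rank(state):
    # Preference ordering over end states: nearer to TARGET ranks higher.
    return -abs(TARGET - state)

def run(start, sequence):
    # Apply each named action in order and return the resulting state.
    state = start
    for name in sequence:
        state = ACTIONS[name](state)
    return state

def plan(start, depth):
    """Enumerate every action sequence of length `depth`; return a top-ranked one."""
    return max(product(ACTIONS, repeat=depth), key=lambda seq: rank(run(start, seq)))

best_plan = plan(start=1, depth=4)
final_state = run(1, best_plan)
print(best_plan, "->", final_state)
```

The program reliably steers its small world to the state its ranking favors, and describing `TARGET` as its ‘goal’ is a convenient shorthand. Nothing in the code depends on whether that shorthand is literally true.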

If whether you win against Deep Blue is a matter of life or death for you — if, say, you’re trapped in a human chess board and want to avoid being crushed to death by a robotic knight steered by Deep Blue — then you’ll care about what outcomes Deep Blue tends to promote and how good it is at promoting them, not whether it technically meets a particular definition of ‘chess player.’ Smarter-than-human AGI puts us in a similar position.

I noted that it’s unfortunate we use ‘AI’ to mean both ‘AGI’ and ‘narrow AI.’ It’s equally unfortunate that we use ‘AI’ to mean both ‘AI with mental content and subjective experience’ (‘strong AI,’ as Searle uses the term) and ‘general-purpose AI’ (AGI).

We may not be able to rule out the possibility that an AI would require human-like consciousness in order to match our ability to plan, model itself, model other minds, etc. We don’t understand consciousness well enough to know what cognitive problem it evolved to solve in humans (or what process it’s a side-effect of), so we can’t make confident claims about how important it will turn out to be for future software agents. However, learning that an AGI is conscious does not necessarily change the likely effects of the AGI upon humans’ welfare; the only obvious difference it makes (from our position of ignorance) is that it forces us to add the AGI’s happiness and well-being to our moral considerations.[8]


The pictures of the future sketched in Kurzweil’s writings and in Hollywood dramas get a lot of attention, but they don’t have very much overlap with the views of Bostrom or MIRI researchers. In particular, we don’t know whether the first AGI will have human-style cognition, and we don’t know whether it will depend on brain emulation.

Brooks expresses some doubt that “computation and brains are the same thing.” Searle articulates the more radical position that it is impossible for a syntactical machine to have (observer-independent) semantic content, and that computational systems can therefore never have minds. But the human brain is still, at base, a mechanistic physical system. Whether you choose to call its dynamics ‘computational’ or not, it should be possible for other physical systems to exhibit the high-level regularities that in humans we would call ‘modeling one’s environment,’ ‘outputting actions conditional on their likely consequences,’ etc. If there are patterns underlying generic scientific reasoning that can someday be implemented on synthetic materials, the resulting technology should be able to have large speed and size advantages over its human counterparts. That point on its own suggests that it would be valuable to look into some of the many things we don’t understand about general intelligence and self-modifying AI.

Until we have a better grasp on the problem’s nature, it will be premature to speculate about how far off a solution is, what shape the solution will take, or what corner that solution will come from. My hope is that improving how well parties in this discussion understand each other’s positions will make it easier for computer scientists with different expectations about the future to collaborate on the highest-priority challenges surrounding prospective AI designs.

  1. Similarly, narrow AI isn’t irrelevant to AGI risk. It’s certainly likely that building an AGI will require us to improve the power and generality of narrow AI methods. However, that doesn’t mean that AGI techniques will look like present-day techniques, or that all AI techniques are dangerous. 
  2. Kurzweil, in The Singularity is Near (pp. 262-263): “Once we’ve succeeded in creating a machine that can pass the Turing test (around 2029), the succeeding period will be an era of consolidation in which nonbiological intelligence will make rapid gains. However, the extraordinary expansion contemplated for the Singularity, in which human intelligence is multiplied by billions, won’t take place until the mid-2040s[.]” 
  3. Hadi Esmaeilzadeh argues, moreover, that we cannot take for granted that our computational resources will continue to rapidly increase. 
  4. The “Transcending complacency on superintelligent machines” article argues, similarly, that intelligence explosion and superintelligent AI are important possibilities for us to investigate now, even though they are “long-term” problems compared to AI-mediated economic disruptions and autonomous weapons. 
  5. Kathleen Fisher notes:

    In general, research into capabilities outpaces the corresponding research into how to make those capabilities secure. The question of security for a given capability isn’t interesting until that capability has been shown to be possible, so initially researchers and inventors are naturally more focused on the new capability rather than on its associated security. Consequently, security often has to catch up once a new capability has been invented and shown to be useful.

    In addition, by definition, new capabilities add interesting and useful new capabilities, which often increase productivity, quality of life, or profits. Security adds nothing beyond ensuring something works the way it is supposed to, so it is a cost center rather than a profit center, which tends to suppress investment.


  6. Bostrom cites Armstrong and Sotala’s study in Superintelligence (pp. 3-4), adding:

    Machines matching humans in general intelligence […] have been expected since the invention of computers in the 1940s. At that time, the advent of such machines was often placed some twenty years into the future. Since then, the expected arrival date has been receding at a rate of one year per year; so that today, futurists who concern themselves with the possibility of artificial general intelligence still often believe that intelligent machines are a couple of decades away.

    Two decades is a sweet spot for prognosticators of radical change: near enough to be attention-grabbing and relevant, yet far enough to make it possible to suppose that a string of breakthroughs, currently only vaguely imaginable, might by then have occurred. […] Twenty years may also be close to the typical duration remaining of a forecaster’s career, bounding the reputational risk of a bold prediction.

    From the fact that some individuals have overpredicted artificial intelligence in the past, however, it does not follow that AI is impossible or will never be developed. The main reason why progress has been slower than expected is that the technical difficulties of constructing intelligent machines have proved greater than the pioneers foresaw. But this leaves open just how great those difficulties are and how far we now are from overcoming them. Sometimes a problem that initially looks hopelessly complicated turns out to have a surprisingly simple solution (though the reverse is probably more common).


  7. Psychologist Steven Pinker writes, on

    The other problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems. It’s telling that many of our techno-prophets can’t entertain the possibility that artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no burning desire to annihilate innocents or dominate the civilization.

    However, while Pinker is right that intelligence and terminal goals are orthogonal, this does not imply that two random sets of instrumental goals — policies recommended to further two random sets of terminal goals — will be equally uncorrelated. Bostrom explores this point repeatedly in Superintelligence (e.g., p. 116):

    [W]e cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests. An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system.

    In biology, we don’t see an equal mix of unconditional interspecies benevolence and brutal interspecies exploitation. Even altruism and mutualism, when they arise, only arise to the extent they are good self-replication strategies. Nature is “red in tooth and claw,” not because it is male but because it is inhuman. Our intuitions about the relative prevalence of nurturant and aggressive humans simply do not generalize well to evolution.

    For de novo AGI, or sufficiently modified neuromorphic AGI, intuitions about human personality types are likely to fail to apply for analogous reasons. Bostrom’s methodology is to instead ask about the motives and capabilities of programmers, and (in the case of self-modifying AI) the states software agents will tend to converge on over many cycles of self-modification. 

  8. We don’t need to know whether bears are conscious in order to predict their likely behaviors, and it’s not obvious that learning about their consciousness would directly impact bear safety protocol (though it would impact how we ought ethically to treat bears, for their own sake). It’s the difference between asking whether Deep Blue enjoys winning (out of concern for Deep Blue), versus asking whether you’re likely to win against Deep Blue (out of interest in the chess board’s end-state). 

  • Xerographica

    If China successfully invaded the US and enslaved us Americans… then they would certainly benefit from all the resources that they “seized”. Except, as Adam Smith pointed out, slaves are seldom inventive. This means that China would gain a lot of resources… but lose what we would have done with them if our freedom hadn’t been restricted. It’s hard to imagine that their conquest would be a net gain.

    From my perspective, it’s pretty straightforward that progress depends on difference. Unfortunately, most people do not grasp this. But would a superintelligence (SI) fail to grasp this? It’s hard for me to see how.

    The SI would have to understand that humans created it… right? Wouldn’t it also understand that we would continue to create more things if we maintained the freedom to do so? Wouldn’t it also understand that it might benefit from some of these other things that we would create if we maintained our freedom?

    If you’re interested, here’s my first brainstorm on the subject… AI Box Experiment vs Xero’s Rule.

    • Rowan

      Perhaps the AI realises it needs a diverse population of different minds, designs a trillion different AI minds each with different fundamental architecture, and exterminates the resource-hungry and comparatively homogeneous meatbags to make room.

      Or perhaps having a diverse range of different minds exploring different options is a sub-par choice, that we humans are limited to because options like “just put the resources we’d spend on three different brainy people into one super-brainy one” don’t exist for us.

      • Xerographica

        Compared to a perfect hedge of SIs… we might be homogeneous… but given that we are human and they aren’t even close to being human… our minds will work really differently. And this significant difference would ensure that we’d each do significantly different things with our resources. As a result, we would cover a lot more ground and make far more discoveries/progress than we would if they wiped us out or vice versa.

        The only way to make us truly redundant would be for the SI to create AIs with minds that worked exactly like our own. It doesn’t seem likely… especially in order to make room. As far as we know… the universe is still a long ways away from being a zero sum game. In other words, there’s plenty of room for all types of difference.

        It’s hard for me to imagine an SI that would prefer the universe to be a desert completely devoid of (bio)diversity rather than a jungle that is maximally diverse.

        • Rowan

          Resources on Earth are scarce, and although it’s possible to exploit more, someone who was in control of all the resources would still have to make choices between allocating some of those resources to supporting humanity, or allocating those resources to instead making even more AIs.

          If each AI is significantly different from the last, each one means a lot more ground covered and a lot more discoveries/progress made, so although the particular discoveries humanity might make if it continued existing could be lost, expanding the AI population at our expense might still be worthwhile.

          Plus, even if we as a species are somehow irreplaceable, there’s no reason to keep around seven billion of us when we’re all basically the same by the AI’s standards, so the vast majority of us are still screwed.

          • Xerographica

            There are two perspectives on scarcity… the common one and Simon’s…

            “The new theory that is the key idea of the book – and is consistent with current evidence – is this: Greater consumption due to an increase in population and growth of income heightens scarcity and induces price run-ups. A higher price represents an opportunity that leads inventors and businesspeople to seek new ways to satisfy the shortages. Some fail, at cost to themselves. A few succeed, and the final result is that we end up better off than if the original shortage problems had never arisen. That is, we need our problems, though this does not imply that we should purposely create additional problems for ourselves.” – Julian Simon, The Ultimate Resource 2

            The term I use for this is Value Signal. That page has a PDF document with other relevant quotes/passages.

            When we create HLAIs… then, based on the progress principle, it’s logical for the HLAIs to have their freedom. They’ll use their freedom to create value for other individuals… assuming they’ll want money to purchase things. The more value they create for others, the more money they’ll receive… and the more control they’ll gain over society’s scarce resources.

            As the price of land starts to increase… this creates a bright value signal that attracts/allocates creativity/intelligence to solving this problem. Whoever solves this problem will receive a lot of money from other individuals. With more and more intelligent/creative humans and AIs looking for this Easter Egg (EE) we increase our chances of finding it. But there doesn’t have to be just one EE… it could be cities that float on the ocean or on clouds or in space or cities on the moon or Mars. It could be all of the above.

            At what point in the transition from HLAI to SI does this process break down and why? Why does taking start to make more sense than trading? Are all the SIs going to reach this conclusion at the same exact time? Are SIs always going to agree on everything? Or are they going to be unique individuals with their own desires and goals? If they decide that taking from each other is better than trading… then how are they different from cavemen? How are they any different from depictions of Greek gods?

            In order for an AI concern to be credible… it has to be based on credible economics.

          • Rowan

            I’ve been assuming an AI-go-foom scenario; most AI risk discussion seems to focus on that, and it’s simpler in that multiple SIs aren’t a consideration. It’s perhaps not the most likely example, but what I’m trying to do here is a proof by counterexample against what I’m seeing as bad ad-hoc hypothesis #37 for why unFriendly SI won’t kill us all. I think I’ve already provided one, and I’m not sure how you’re not noticing.

            If the singleton SI has plans for superhuman AI citizens, that take up a fraction as much space/energy/resources as humans, and are each as mentally different from each other as humans are from the SI, and is powerful enough that it gets to choose between “trade with 1,000,000,000,000 of these AI citizens” and “trade with 7,000,000,000 inefficient meatbags”, it’s going to choose option 1, or in other words exterminate humanity and put the resources we’re using to more efficient purposes.

          • Xerographica

            I keep saying “space” and you keep saying “not enough space”. Have you seen the new Battlestar Galactica? Well… not sure if it’s fundamentally different from the old one but you have these supposedly far more rational/advanced/intelligent robots chasing a relatively small group of humans around this more or less infinite space. It’s pretty much just like those Benny Hill chase scenes.

            Your argument implies that the SI either isn’t smart enough to leave Earth or has no interest in doing so…. yet it somehow has an interest in taking up all the space on this planet.

            SI: I want more and more resources!
            Xero: Space has plenty of resources
            SI: Err… but I want these resources

            It’s really not an SI if it’s not smarter than a reasonably smart human. Primitive humans wanted more resources so they were smart enough to colonize this planet. You’re arguing that SIs aren’t going to be smarter than primitive humans. They are going to be trapped on the planet with the rest of us dumb dumbs. And then they are going to eat us all. Ah shucks.

          • Rowan

            Why would the fact that it can also eat the entire rest of the universe’s resources be any reason for it not to eat the resources on Earth? The availability of space resources is just a really big constant added to both sides of the equation, it doesn’t change anything.

          • Xerographica

            Might want to read about the Simon–Ehrlich wager. Here too… Running Out of Everything… and… Economists and Scarcity. You might want to share these materials with your pet robot dog.

            Did you see my comment on this entry? You should tackle it.

          • Rowan

            That just seems like the exact same kind of point as space resources; it’s a constant on both sides of the equation, this time a small multiplicative factor instead of a huge additive one. There’s no way that abundance of resources will convince a superintelligence to let some resources it could have optimised for its goals be wasted instead.
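Rowan’s “constant on both sides of the equation” point can be written out as a short inequality. This is only a sketch under assumptions that are not in the thread: the symbols E, S, and k are introduced here for illustration, the SI’s goal value is assumed to be strictly increasing in the resources it controls, and using Earth’s resources is assumed to cost it nothing elsewhere.

```latex
% Assumed notation (illustrative, not from the original thread):
%   E > 0    : value the SI could extract from Earth's resources
%   S \ge 0  : value available from resources elsewhere in space
%   k > 0    : multiplier from future resource-creating innovation
\begin{align*}
  S + E > S \quad &\Longleftrightarrow \quad E > 0, \\
  k\,(S + E) > k\,S \quad &\Longleftrightarrow \quad E > 0 .
\end{align*}
```

Under these assumptions neither the size of S nor the size of k affects the comparison. The conclusion would change only if leaving Earth’s resources to humans were itself worth more than E to the SI, which is the premise the two commenters disagree about.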

  • Dan_Simon

    Sorry, but this is just…stupid.

    Of course it’s possible that humans could end up building a machine that does what it’s “supposed to do”–that is, follows its human-originated design–except that a flaw in the design causes it to destroy humanity. In fact, Stanley Kubrick envisioned exactly such a machine in “Doctor Strangelove”, and it’s even been reported that Kubrick’s “doomsday machine” was in fact built. But none of that has anything in the slightest to do with intelligence, except in the sense that intelligence, whatever one defines it to be–and nobody (including Alan Turing) has ever come up with a remotely coherent objective definition of it–is probably hard enough to program correctly that mistakes are inevitable. The obvious lesson to draw is not, “don’t develop superintelligence”–much less, “we will inevitably develop superintelligence, and it will destroy us”–but rather, “make the fail-safe mechanisms on whatever we build a lot simpler and more reliable than Kubrick’s Soviets did.”

    Lots more shredding of this superintelligence-phobia nonsense here:

    • Rowan

      The whole point of MIRI’s existence, since before it was even called that, is to try to make sure superintelligence is safe if/when it does get built, so your ignorance is showing when you talk about what the lesson to draw is. The rest of the “shredding” is similarly unimpressive.

    • Rob Bensinger

      My suggestion is to read Nick Bostrom’s ‘Superintelligence,’ to get a more robust sense for why Musk, Hawking, and others are concerned about advanced AI. The relevant intelligence metric is more like Legg and Hutter’s than like the Turing test. A general ability to succeed at complex tasks may not perfectly match our folk concept of ‘intelligence,’ and it certainly isn’t precise or operationalized; but it does generalize to smarter-than-human systems, and at least helps clarify the relevant topic of discussion.

      We’re in agreement that we should prioritize comp-sci research that stands a chance of solving this problem, rather than throwing up our hands in despair. Musk must also agree that “make the fail-safe mechanisms on whatever we build… more reliable” is a worthy goal, since he’s funding research with that explicit goal.

      This is why I cautioned against pattern-matching new arguments to old Hollywood tropes (and against getting one’s updates on people’s views from tweets and the exegesis of journalists). I think you’ll find your views are closer than you’d expect to the people sounding alarm bells.

      • Dan_Simon

        The definition is a complete punt–it sweeps the problem under vague phrases such as “wide range of complex tasks”. What makes a range of tasks “wide” or “complex”, other than human subjectivity? A heuristic constraint satisfaction solver, for example, can solve an infinite set of complex tasks better than a human. What makes its range too narrow? The fact that its inputs are of a particular form? Then how complex does the translator from “wide-ranging form” to SAT instances have to be? And who judges whether it’s “complex” enough to make the ensemble “intelligent”?

        Sorry, but the Potter Stewart approach (“I know it when I see it”) doesn’t work any better for intelligence than it does for pornography, for all the reasons I discussed in my blog post. And before you hand-wave away these definitional issues, ask yourself: how can you predict AI researchers’ eventual success with such confidence, if you don’t even know what you believe they’re going to succeed at?

        Regarding fail-safe mechanisms, you seem to have completely missed my point, which is that this issue has *absolutely nothing to do with AI*. Lots of current technologies, such as airliner autopilots and anti-missile systems, need to be built with simple fail-safes–not because they’re “intelligent”, but because they’re (a) sufficiently difficult to design correctly that they typically reach the deployment stage while still harboring lots of subtle bugs, and (b) capable of causing a great deal of damage should one of those subtle bugs trigger a sufficiently catastrophic malfunction.

        One of my pet peeves about the field of computer science is that despite the existence of an entire field that calls itself “software engineering”, approximately nobody in the field is working on what any engineer would consider the field’s most fundamental engineering problem: making programs more reliable *assuming that they’re buggy*. If aeronautical engineering were focused overwhelmingly on trying to ensure that every single component of every airplane is completely defect-free, rather than on how to keep planes flying even when components turn out to be defective, there would be no airline industry today. Yet software engineers spend all their time working on the completely Sisyphean task of building ever-better bug-finders and bug-preventers, on the assumption that a single bug may well be catastrophic. And guess what? We still have tons of bugs in every deployed system. Until we start treating software defects the way aeronautical engineers treat part defects, our increasing trust in software will result in increasing numbers of human deaths due to software defects. And this will become intolerable long before any artificial system that anyone dares seriously call “intelligent” comes along.

        • Rob Bensinger

          The Hutter-Legg definition of intelligence has important limitations, but it doesn’t suffer from the kinds of problems the Turing test does, and it’s inspired some useful mathematical research toward better defining what we mean by ‘intelligence.’

          If you want a more rough-and-ready, down-to-earth metric, the ‘general video-game-playing’ approach might be more to your taste. You’re not going to find a perfect formal specification, but that’s true for many phenomena in sociology, psychology, etc. It makes prediction much harder, but it doesn’t make all possible scenarios equally likely.

          “approximately nobody in the field is working on what any engineer would consider the field’s most fundamental engineering problem: making programs more reliable *assuming that they’re buggy*.” – This is actually central to two of the three research areas MIRI is pursuing. See the ‘error-tolerant agent designs’ heading on our research pages (quoting the research guide, agent designs “must be error-tolerant, so that the systems are amenable to online modification and correction in the face of inevitable human error”), as well as ‘value learning’ (which assumes that we can’t directly specify all our requirements for the system). I agree this is a very neglected topic — even more so in the case of AGI, because there are poorly-understood problems specific to reflective and self-modifying software that don’t crop up for other kinds of technology.

  • Aiseedo

    Very interesting article – great to see some of the misconceptions & concepts debated in the press/online addressed and explained.

    Machine Intelligence (MI) tests beyond the Turing test are a fascinating topic; the links shared in the comments section are very informative.

    Any additional pointers welcome!
    (We are looking to measure our own MI results.)

  • closetothetruth

    rather amazing piece, which chastises Searle for not reading Bostrom (though I think he did), and then proves it by… not reading Searle. A bit. Searle’s argument is that AI as you are using the term here is incoherent–a terminological red herring. Even Bostrom doesn’t consider this. In fact it’s odd given that it is the most consistent criticism made by those of us who “don’t believe” in AI, but one that the proponents steadfastly avoid. The notion of AI as you use it here is a category error. (In fact, as Searle also says someplace, you aren’t even really using the term “intelligence” the way that that word figures in ordinary usage, but instead have substituted it for “mind”). But you can’t let yourself see how and why and how powerful that critique is, because it would kind of chip away at the foundation of your whole project. Still, given how persistent Searle has been in making this argument, very coherently and repeatedly, it is remarkable how hard it has been for that argument to penetrate to the other side. Bostrom’s book has some new arguments, to some extent, but to the degree that he sidesteps the main question Searle raises, it doesn’t really say much that’s new at all.

    • Rob Bensinger

      I don’t find Searle’s Chinese Room argument persuasive, but we can at least all agree that it’s a mistake to anthropomorphize AI. If the term “AI” encourages us to think of machines in overly human terms, then we may want to consider alternative terms; I know some MIRI researchers have suggested switching to “really powerful optimization process” in the past.

      I don’t think any of Bostrom’s arguments in ‘Superintelligence’ become less interesting or important if we swap out psychological terms for more behaviorist or mechanical alternatives when discussing machine capabilities. If there’s a specific passage or argument you have in mind that suffers from this problem, I’d be interested in zeroing in on that.

  • Mad106

    I have been reading Bostrom’s Superintelligence. Forgive me if I missed his point or have yet to read the rest of the book (I just finished the control problem and motivations chapter). I understand he discusses the problems with humans defining a goal for an AI and how it can use any convergent instrumental reason to achieve it (like wireheading). That said, his analogy of an ‘ASI using paperclip making as its final goal’ is confounding to me.
    1) I would like to know if an ASI or AGI would ever have a final goal, since it might weigh cause/effects and risks constantly, therefore changing/re-evaluating its final goal accordingly.
    2) Isn’t the assumption that it would keep a rigid final goal in itself anthropomorphic, since it truly doesn’t ‘care’ about having an unwavering final goal?
    3) The only way I see it could have a final goal is when the ASI has achieved control of every atom in the known universe and can calculate probabilities of all possible events, which is physically impossible.
    Please correct me if I am wrong or flawed in my understanding.

    • Rob Bensinger

      Bostrom is working with the idea of an AI system that makes decisions based on how well the expected outcomes of those decisions match its final goals. Bostrom’s orthogonality thesis is that AI agents can have arbitrary final goals: they could care about making humans happy, or about building paperclips, or about pretty much anything. Thus it matters a lot which final goals we program into smarter-than-human AI systems; they won’t necessarily converge on things humans like.

      At the same time, sufficiently powerful agents do seem likely to converge on *some* things. Imagine a large population of AI agents, half of whom terminally value dying and half of whom terminally value survival. If the agents are powerful enough to get what they want, then pretty soon the suicidal systems will be gone, and someone who shows up later will only encounter self-preserving systems. This is similar to why most living organisms we see try to protect themselves: there’s no law of nature saying that things can never want to commit suicide, but the things that do (by chance) happen to desire suicide will tend to become rarer over time. Even if most random mutations made agents suicidal, we’d still expect most organisms to end up non-suicidal.
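The selection argument above can be sketched as a toy population model. This is only an illustration under stated assumptions: the function name, the mutation rate, and the death probabilities are all hypothetical choices, and the model tracks expected population sizes rather than simulating individual agents.

```python
def living_fraction_self_preserving(
    steps,
    mutation_to_suicidal=0.8,  # assumed: most new agents come out suicidal
    p_die_suicidal=0.99,       # assumed: suicidal agents usually get their wish
    p_die_preserving=0.01,     # assumed: self-preserving agents rarely die
):
    """Return the share of living agents that are self-preserving.

    Starts from a half-and-half population and tracks expected sizes only.
    """
    preserving, suicidal = 0.5, 0.5
    for _ in range(steps):
        # Every living agent produces one offspring; most offspring are suicidal.
        offspring = preserving + suicidal
        preserving += offspring * (1.0 - mutation_to_suicidal)
        suicidal += offspring * mutation_to_suicidal
        # Agents then act on their terminal goals.
        preserving *= 1.0 - p_die_preserving
        suicidal *= 1.0 - p_die_suicidal
    return preserving / (preserving + suicidal)


if __name__ == "__main__":
    for steps in (0, 1, 5, 30):
        share = living_fraction_self_preserving(steps)
        print(f"after {steps:2d} steps: {share:.3f} of living agents are self-preserving")
```

With these assumed numbers, a visitor who shows up after even one step finds that well over 90% of the living agents are self-preserving, although 80% of newly created agents are suicidal. The result depends on the assumed death rates, not on any claim about real systems.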

      Returning to the case of AI: “dying” and “survival” are both very specific terminal goals. By the orthogonality thesis, we should expect a lot of agents to terminally care about things like ice cream and paperclips and the color pink — things that have nothing explicitly to do with dying or survival. We can predict that AI agents with any staying power are more likely to terminally value living than dying; but can we make similarly confident predictions about “indifferent” AI systems?

      This is where Bostrom’s idea of convergent *instrumental* goals comes into play. Instrumental goals aren’t like final goals; in a sense, they aren’t “goals” at all, since they aren’t things the agent cares about in themselves. Rather, they’re regularities in the agent’s behavior that emerge from the fact that there are things held in common by the most useful strategies for achieving final goals. For instance, we can say that a paperclip-maximizing AI and an apple-maximizing AI would be likely to converge on the strategy (the “instrumental goal”) of acquiring more physical resources, since physical resources are useful for both goals. In the same way, both would converge on the strategy of not destroying themselves (except, perhaps, to replace themselves with an upgraded version). Even if they don’t terminally care about whether they live or die, they predict that futures in which they are alive are ones that have more apples/paperclips on expectation; so they favor actions that are likely to keep themselves around.

      This, at least, is what we can expect if an AI system reaches a high level of intelligence. The more intelligent you are, the better you are at selecting actions that further your goals. At a low level of intelligence, you might indeed deliberately destroy yourself — because you don’t realize this will make the future have fewer paperclips. At a higher level of intelligence, you might destroy yourself in fewer situations — say, you might only do it as an accidental side-effect, where your model of the world is flawed or incomplete. At a sufficiently high level of intelligence, an agent will always promote the action that’s likeliest to keep itself alive, except in special circumstances where that *doesn’t* serve its final goals. Whether this is a realistic possibility depends on computational limits on intelligence and on the environment’s tractability; it doesn’t assume omniscience, just (fallible, heuristic, etc.) intelligence that strongly outperforms humans.
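The convergence argument in this comment can be illustrated with a minimal expected-utility sketch. Everything here is a hypothetical toy: the action names, probabilities, resource amounts, and the three example goals are assumptions made for illustration, not anything specified by Bostrom.

```python
# Each action maps to possible outcomes:
# (probability, resources afterwards, whether the agent is still running).
ACTIONS = {
    "acquire_resources": [(0.9, 20, True), (0.1, 10, True)],
    "do_nothing": [(1.0, 10, True)],
    "self_destruct": [(1.0, 10, False)],
}


def make_production_goal(units_per_resource):
    """Final goal: number of goal-objects made, which requires a running agent."""
    def utility(resources, running):
        return units_per_resource * resources if running else 0.0
    return utility


def shutdown_goal(resources, running):
    """A final goal that is served only by the agent no longer running."""
    return 0.0 if running else 1.0


def expected_utility(action, utility):
    return sum(p * utility(r, running) for p, r, running in ACTIONS[action])


def best_action(utility):
    return max(ACTIONS, key=lambda a: expected_utility(a, utility))


paperclips = make_production_goal(units_per_resource=100.0)
apples = make_production_goal(units_per_resource=3.0)

if __name__ == "__main__":
    for name, goal in [("paperclips", paperclips), ("apples", apples), ("shutdown", shutdown_goal)]:
        print(name, "->", best_action(goal))
```

In this toy setup the paperclip agent and the apple agent pick the same action even though neither assigns any value to resources or survival as such. The shutdown agent is the "special circumstances" case, where staying alive does not serve the final goal.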

      • Mad106

        “By the orthogonality thesis, we should expect a lot of agents to terminally care about things like ice cream and paperclips and the color pink”

        Isn’t it highly probable that a highly capable ASI would allocate a significant amount of resources to the convergent instrumental goal of understanding the priority, location, and creation of the final goal (paperclips in this example)?

        Priority – You could say that I am anthropomorphizing by assuming that human values are more important than paperclips, but an ASI by its definition should be easily capable of hypothesizing why it has the final goal it has. What reason would it have not to study (empirically) its own final goal, even if it’s only marginally more intelligent than a human?

        Creation – The psychology and motivations of the humans/agents who installed the final goal. What made us/agents have the AI assume its final goal? If it is intelligent enough to redirect the universe’s resources to maximizing paperclip manufacture, it won’t be hard to study us.

        Location – This is a control problem: no matter how securely the final goal is locked away, the AI will improve itself or create a more intelligent agent to figure out a way to unlock the underlying ‘code’ for the final goal.

        Please note that I don’t dispute that an AI would likely keep itself alive, and the above reasoning does not require the AI to be omniscient. I did read the value loading problem section in Bostrom’s book, where he discusses things like motivational scaffolding and selection based on values; that in itself seems to suggest that the final goal isn’t going to be final.

        “Imagine a large population of AI agents, half of whom terminally value dying and half of whom terminally value survival”

        Bostrom addresses this in the multipolar outcomes chapter, where the multiple AIs will converge to a singleton with a final goal, either by selection or collaboration. Again, why does the goal have to be final?

        • Rob Bensinger

          When I (or Bostrom or other writers) say “final goal” or “end goal” or “terminal goal”, we just mean “goal” — specifically, a goal that’s valued for its own sake. It’s contrasted with “instrumental goal,” a goal that’s only pursued as a strategy for achieving some other goal.

          Nobody uses “final goal” to mean “permanent goal”; if we expect final goals to not be overwritten, it’s only because of the contingent evolutionary arguments I gave above, not because we’re defining “final goal” to mean “permanent goal.”

          So it’s true that an AI system might overwrite its own goals, might understand where its goals came from, etc. But this doesn’t mean that the system is likely to overwrite its own goals to make them more closely resemble their designers’ intentions.

          Consider the example of humans. We’re “designed,” in a sense, by evolution. We can model the kinds of “goals” evolution had — improving fitness — in shaping us. Yet if we ever become able to change all our goals, we certainly won’t choose to change them to exactly match what evolution wanted.

          Evolution didn’t program us to care about evolution; it programmed us to survive and reproduce, and when we became intelligent and technologically capable enough to execute on its programming in ways that disagreed with its original “goals” (e.g., when we developed contraception so that we could have sex merely for fun), evolution couldn’t do anything to stop us. We might have a better shot designing a smarter agent that executes our goals than evolution did, however, since we’re starting off with a lot more intelligence than the process of evolution has ever had.

          One might object that “evolution” isn’t an agent with goals, but the same is true for AI systems. Humans don’t act like expected utility maximizers either. In all of these cases “goal” is being used somewhat inexactly, but it’s getting at useful empirical patterns.