Davis on AI capability and motivation


In a review of Superintelligence, NYU computer scientist Ernest Davis voices disagreement with a number of claims he attributes to Nick Bostrom: that “intelligence is a potentially infinite quantity with a well-defined, one-dimensional value,” that a superintelligent AI could “easily resist and outsmart the united efforts of eight billion people” and achieve “virtual omnipotence,” and that “though achieving intelligence is more or less easy, giving a computer an ethical point of view is really hard.”

These are all stronger than Bostrom’s actual claims. For example, Bostrom never characterizes building a generally intelligent machine as “easy.” Nor does he say that intelligence can be infinite or that it can produce “omnipotence.” Humans’ intelligence and accumulated knowledge give us a decisive advantage over chimpanzees, even though our power is limited in important ways. An AI need not be magical or all-powerful in order to have the same kind of decisive advantage over humanity.

Still, Davis’ article is one of the more substantive critiques of MIRI’s core assumptions that I have seen, and he addresses several deep issues that directly bear on AI forecasting and strategy. I’ll sketch out a response to his points here.

 

Measuring an intelligence explosion

Davis writes that Bostrom assumes “that a large gain in intelligence would necessarily entail a correspondingly large increase in power.” This is again too strong. (Or it’s trivial, if we’re using the word “intelligence” to pick out a specific kind of power.)

Bostrom is interested in intelligence for its potential to solve practical problems and shape the future. If there are other kinds of intelligence, they’re presumably of less economic and strategic importance than the “cognitive superpowers” Bostrom describes in chapter 6. It is the potential power autonomous machines could exhibit that should primarily concern us from a safety standpoint, and “intelligence” seems as good a term as any for the kind of power that doesn’t depend on an agent’s physical strength or the particularities of its environment.

When it comes to Bostrom’s intelligence explosion thesis, I don’t think ‘does an increase in intelligence always yield a corresponding increase in power?’ gets at the heart of the issue. Consider David Chalmers’ version of the argument:

[I]t is not unreasonable to hold that we can create systems with greater programming ability than our own, and that systems with greater programming ability will be able to create systems with greater programming ability in turn. It is also not unreasonable to hold that programming ability will correlate with increases in various specific reasoning abilities. If so, we should expect that absent defeaters, the reasoning abilities in question will explode.

Here, there’s no explicit appeal to “intelligence,” which is replaced with programming ability plus an arbitrarily large number of “specific reasoning abilities.” Yet if anything I find this argument more plausible than Bostrom’s formulation. For that reason, I agree with Bostrom that the one-dimensional representation of intelligence is inessential and “one could, for example, instead represent a cognitive ability profile as a hypersurface in a multidimensional space” (p. 273).1

Moreover, the relevant question isn’t whether an increase in a self-improving AI’s general programming ability always yields a corresponding increase in its ability to improve its own programming ability. Nor is the question whether either of those abilities always correlates with the other cognitive capabilities Bostrom is interested in (“strategizing,” “social manipulation,” “hacking,” “technology research,” “economic productivity”). I’d instead say that the five core questions from Bostrom’s point of view are:

  1. Is the first superintelligent AI likely to result from self-improving AI systems?
  2. If so, how much of the AI’s self-improvement is likely to be driven by improvements to some cognitive capability (e.g., programming ability) that facilitates further enhancement of the capability in question?
  3. Are improvements to this self-improving capability likely to accelerate, as early advances result in cascades of more rapid advances? Or will the self-improving capability repeatedly stall out, advancing in small fits and starts?
  4. If self-improvement cascades are likely, are they also likely to result in improvements to other cognitive capabilities that we more directly care about?2 Or will those other cognitive capabilities lag far behind the capabilities that ‘explode’?
  5. If an ability like ‘programming’ is likely to be self-reinforcing in an accelerating way, and is likely to foster accelerating improvements in other cognitive abilities, exactly how fast will those accelerations be? Are we talking about a gap of decades between the AI’s first self-reinforcing self-improvements and its attainment of superintelligence? Months? Hours?

Bostrom’s position on those questions — that a fast or moderate intelligence explosion is likely — at no point presupposes that “intelligence” is a well-defined scalar value one could do calculus with, except as a toy model for articulating various qualitative possibilities. When he writes out differential equations, Bostrom is careful to note that intelligence cannot be infinite, that one-dimensionality is a simplifying assumption, and that his equations are “intended for illustration only.”3
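For concreteness, the kind of toy model at issue can be sketched as follows. This is my generic reconstruction, not Bostrom’s exact equations: write $I$ for the system’s capability, $O(I)$ for the optimization power applied to improving it, and $R(I)$ for recalcitrance (how hard further improvement is).

```latex
\frac{dI}{dt} = \frac{O(I)}{R(I)}
% Feedback with constant recalcitrance: O(I) = c I, R(I) = R
%   gives exponential growth, I(t) = I_0 e^{(c/R) t}.
% Feedback with falling recalcitrance, e.g. O(I)/R(I) = k I^2,
%   gives hyperbolic growth, I(t) = I_0 / (1 - k I_0 t),
%   diverging at t* = 1/(k I_0), valid only until physical limits bind.
```

The functional forms here are illustrative assumptions only; the point, in line with Bostrom’s own caveats, is that qualitatively different growth regimes (steady exponential growth versus a finite-time ‘explosion’) fall out of small changes in how optimization power and recalcitrance scale with capability.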

We should expect artificial general intelligence (AGI) to specialize in some domains and neglect others. Bostrom’s own analysis assumes that a recursively self-improving AI would tend to prioritize acquiring skills like electrical engineering over skills like impressionist painting, all else being equal.4 For that matter, present-day AI is already superhuman in some cognitive tasks (e.g., chess and mental arithmetic), yet subhuman in many others. Any attempt to quantify the ‘overall intelligence’ of Deep Blue or Google Maps will obscure some important skills and deficits of these systems.5 Still, using concepts like ‘intelligence’ or ‘power’ to express imprecise hypotheses is better than substituting precise metrics that overstate how much we currently know about general intelligence and the future of AI.6

 

Superintelligence superiority

In Superintelligence (pp. 59-61), Bostrom lists a variety of ways AGI may surpass humans in intelligence, owing to differences in hardware (speed and number of computational elements, internal communication speed, storage capacity, reliability, lifespan, and sensors) and software (editability, duplicability, goal coordination, memory sharing, and new modules, modalities, and algorithms). Davis grants that these may allow AGI to outperform humans, but expresses skepticism that this could give AGI a decisive advantage over humans. To paraphrase his argument:

Elephants’ larger brains don’t make them superintelligent relative to mice; squirrels’ speed doesn’t give them a decisive strategic advantage over turtles; and we don’t know enough about what makes Einstein smarter than the village idiot to make confident predictions about how easy it is to scale up from village idiot to Einstein, or from Einstein to super-Einstein. So there’s no particular reason to expect a self-improving AGI to be able to overpower humanity.

My response is that the capabilities Bostrom describes, like “speed superintelligence,” predict much larger gaps than the gaps we see between elephants and mice. Bostrom writes (in a footnote on p. 270):

At least a millionfold speedup compared to human brains is physically possible, as can be seen by considering the difference in speed and energy of relevant brain processes in comparison to more efficient information processing. The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.

Davis objects that “all that running faster does is to save you time,” noting that a slower system could eventually perform all the feats of a faster one. But the ability to save time is exactly the kind of ability Bostrom is worried about. Even if one doubts that large improvements in collective or quality superintelligence are possible, a ‘mere’ speed advantage makes an enormous practical difference.

Imagine a small community of scientist AIs whose only advantages over human scientists stem from their hardware speed — they can interpret sensory information and assess hypotheses and policies a million times faster than a human. At that speed, an artificial agent could make ~114 years of intellectual progress in an hour, ~2,700 years of progress in a day, and ~250,000 years of progress in three months.
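To make the arithmetic explicit, here is a back-of-the-envelope conversion from a hardware speed advantage to ‘subjective’ research time. The millionfold factor is the physical-limits estimate quoted above; the function and constant names are my own illustration, not anything from the book.

```python
# Convert a wall-clock interval into human-equivalent years of thought
# for an agent running at a given speedup over human brains.
SPEEDUP = 1_000_000            # Bostrom's physical-limits estimate, quoted above
HOURS_PER_YEAR = 24 * 365.25   # ~8,766 wall-clock hours per year

def subjective_years(wall_clock_hours: float, speedup: float = SPEEDUP) -> float:
    """Human-equivalent years of thinking done in the given wall-clock interval."""
    return wall_clock_hours * speedup / HOURS_PER_YEAR

print(f"{subjective_years(1):,.0f} years per hour")           # ≈ 114
print(f"{subjective_years(24):,.0f} years per day")           # ≈ 2,738
print(f"{subjective_years(24 * 91):,.0f} years per quarter")  # ≈ 249,000
```

The exact figures shift slightly with the calendar conventions chosen (e.g. 91 vs. 90 days in ‘three months’), but the orders of magnitude are robust.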

The effect of this speedup would be to telescope human history. Scientific and technological advances that would have taken us tens of thousands of years to reach can be ours by Tuesday. If we sent a human a thousand years into the past, equipped with all the 21st-century knowledge and technologies they wanted, they could conceivably achieve dominant levels of wealth and power in that time period. This gives us cause to worry about building machines that can rapidly accumulate a millennia-sized experiential advantage over humans, even before we begin considering any potential advantages in memory, rationality, editability, etc.7

At least some of the routes to superintelligence described by Bostrom look orders of magnitude larger than the cognitive advantages Einstein has over a village idiot (or than an elephant has over a mouse). We can’t rule out the possibility that we’ll run into larger-than-expected obstacles when we attempt to build AGI, but we shouldn’t gamble on that possibility. Black swans happen, but superintelligent AI is a white swan, and white swans happen too.

Since Bostrom’s pathways to superintelligence don’t have a lot in common with the neurological differences we can observe in mammals, there is no special reason to expect the gap between smarter-than-human AI and humans to resemble the gap between elephants and mice. Bostrom’s pathways also look difficult to biologically evolve, which means that their absence in the natural world tells us little about their feasibility.

We have even less cause to expect, then, that the gap between advanced AI and humans will resemble the gap between Einstein and a median human. If a single generation of random genetic recombination can produce an Einstein, the planning and design abilities of human (and artificial) engineers should make much greater feats a possibility.

 

Delegating AI problems to the AI

Separately, Davis makes the claim that “developing an understanding of ethics as contemporary humans understand it is actually one of the easier problems facing AI”. Again, I’ll attempt to summarize his argument:

Morality doesn’t look particularly difficult, especially compared to, e.g., computer vision. Moreover, if we’re going to build AGI, we’re going to solve computer vision. Even if morality is as tough as vision, why assume one will be solved and not the other?

Here my response is that you probably don’t need to solve every AI problem to build an AGI. We may be able to cheat at vision, for example, by tasking a blind AGI with solving the problem for us. But it is much easier to observe whether an algorithm is making progress on visual puzzles than to observe whether one is making progress on ethical questions: there is far more theoretical and object-level disagreement among humans about ethics, there is less incentive for artificial agents to move from partial solutions to complete ones, and failures don’t necessarily reduce the system’s power.

Davis notes that a socially fluent superintelligence would need to be an expert moralist:

Bostrom refers to the AI’s ‘social manipulation superpowers’. But if an AI is to be a master manipulator, it will need a good understanding of what people consider moral; if it comes across as completely amoral, it will be at a very great disadvantage in manipulating people. […] If the AI can understand human morality, it is hard to see what is the technical difficulty in getting it to follow that morality.

This is similar to Richard Loosemore’s argument against AI safety research, which I’ve responded to in a blog post. My objection is that an AI could come to understand human morality without thereby becoming moral, just as a human can come to understand the motivations of a stranger without thereby acquiring those motivations.

Since we don’t understand our preferences in enough generality or detail to translate them into code, it would be nice to be able to delegate the bulk of this task to a superintelligence. But if we program it to hand us an answer to the morality problem, how will we know whether it is being honest with us? To trust an AGI’s advice about how to make it trustworthy, we’d need to already have solved enough of the problem ourselves to make the AGI a reliable advisor.

Taking Davis’ proposal as an example, we can imagine instilling behavioral prescriptions into the AI by programming it to model Gandhi’s preferences and do what Gandhi would want it to. But if we try to implement this idea in code before building seed AI, we’re stuck with our own fallible attempts to operationalize concepts like ‘Gandhi’ and ‘preference;’ and if we try to implement it after, recruiting the AI to solve the problem, we’ll need to have already instilled some easier-to-program safeguards into it. What makes AGI safety research novel and difficult is our lack of understanding of how to initiate this bootstrapping process with any confidence.

The idea of using the deceased for value learning is interesting. Bostrom endorses the generalized version of this approach when he says that it is critical for value learning that the locus of value be “an object at a particular time” (p. 193). However, this approach may still admit of non-obvious loopholes, and it may still be too complicated for us to directly implement without recourse to an AGI. If so, it will need to be paired with solutions to the problems of corrigibility and stability in self-modifying AI, just as “shutdown buttons” and other tripwire solutions will.

From Bostrom’s perspective, what makes advanced AI a game-changer is first and foremost its capacity to meaningfully contribute to AI research. The vision problem may be one of many areas where we can outsource sophisticated AGI problems to AGI, or to especially advanced narrow-AI algorithms. This is the idea underlying the intelligence explosion thesis, and it also underlies Bostrom’s worry that capabilities research will continue to pull ahead of safety research.

In self-improving AI scenarios, the key question is which AI breakthroughs are prerequisites for automating high-leverage computer science tasks. This holds for capabilities research, and it also holds for safety research. Even if AI safety turned out to be easier than computer vision in an absolute sense, it would still stand out as a problem that is neither a prerequisite for building a self-improving AI, nor one we can safely delegate to such an AI.

 


  1. Still more fine-grained versions of the same argument may be possible. E.g., “programming ability” might decompose into multiple abilities, such as the ability to efficiently explore search spaces for code that meets constraints and the ability to efficiently test candidate code.  
  2. For example: If an AI approaching superintelligence stumbles upon a cascade of improvements to its programming ability, will its capabilities and decision criteria also result in repeated improvements to its physics modules, or its psychology modules?  
  3. On page 76, for example, Bostrom writes: “This particular growth trajectory has a positive singularity at t = 18 months. In reality, the assumption that recalcitrance is constant would cease to hold as the system began to approach the physical limits of information processing, if not sooner.” On page 77, Bostrom says that the point he intends to illustrate is only that, if AI progress is primarily AI-driven, any feedback loops that arise will have a larger accelerating effect.  
  4. This is because a wide variety of final goals are best served through the acquisition of resources and the building of infrastructure, a set of objectives that are more likely to be furthered by electrical engineering skills than by painting skills. This argument is an instance of Bostrom’s instrumental convergence thesis in chapter 7.  
  5. While it turns out that many intelligence-related characteristics correlate with a single easily-measured number in humans (g), this still doesn’t allow us to make fine-grained predictions about individual competency levels. I also can’t think of an obvious reason to expect a number even as informative as g to arise for AGI, especially early-stage AGI. Bostrom writes (p. 93):

    [S]uppose we could somehow establish that a certain future AI will have an IQ of 6,455: then what? We would have no idea of what such an AI could actually do. We would not even know that such an AI had as much general intelligence as a normal human adult–perhaps the AI would instead have a bundle of special-purpose algorithms enabling it to solve typical intelligence test questions with superhuman efficiency but not much else.

    Some recent efforts have been made to develop measurements of cognitive capacity that could be applied to a wider range of information-processing systems, including artificial intelligences. Work in this direction, if it can overcome various technical difficulties, may turn out to be quite useful for some scientific purposes including AI development. For purposes of the present investigation, however, its usefulness would be limited since we would remain unenlightened about what a given superhuman performance score entails for actual ability to achieve practically important outcomes in the world.  

  6. Imagine a Sumerian merchant living 5,000 years ago, shortly after the invention of writing, who has noticed the value of writing for storing good ideas over time, and not just market transactions. Writing could even allow one to transmit good ideas to someone you’ve never met, such as a future descendant. The merchant notices that his own successes have often involved collecting others’ good ideas, and that good ideas often open up pathways to coming up with other, even better ideas. From his armchair, he concludes that if writing becomes sufficiently popular, it will allow a quantity called society’s knowledge level to increase in an accelerating fashion; which, if the knowledge is used wisely, could result in unprecedented improvements to human life.

    In retrospect we can say ‘knowledge’ would have been too coarse-grained a category to enable any precise predictions, and that there have really been multiple important breakthroughs that can be considered ‘knowledge explosions’ in different senses. Yet the extremely imprecise prediction can still give us a better sense of what to expect than we previously had. It’s a step up from the (historically common) view that civilizational knowledge only diminishes over time, the view that things will always stay the same, the view that human welfare will radically improve for reasons unrelated to knowledge build-up, etc.

    The point of this analogy is not that people are good at making predictions about the distant future. Rather, my point is that hand-wavey quantities like ‘society’s knowledge level’ can be useful for making predictions, and can be based on good evidence, even if the correspondence between the quantity and the phenomenon it refers to is inexact.  

  7. Most obviously, a speed advantage can give the AI the time to design an even better AI.

    None of this means that we can make specific highly confident predictions about when and how AI will achieve superintelligence. An AGI that isn’t very human-like may be slower than a human at specific tasks, or faster, in hard-to-anticipate ways. If a certain scientific breakthrough requires that one first build a massive particle accelerator, then the resources needed to build that accelerator may be a more important limiting factor than the AGI’s thinking speed. In that case, humans would have an easier time monitoring and regulating an AGI’s progress. We can’t rule out the possibility that speed superintelligence will face large unexpected obstacles, but we also shouldn’t gamble on that possibility or take it for granted.  
