Strong AI appears to be the topic of the week. Kevin Drum at Mother Jones thinks AIs will be as smart as humans by 2040. Karl Smith at Forbes and “M.S.” at The Economist seem to roughly concur with Drum on this timeline. Moshe Vardi, the editor-in-chief of the world’s most-read computer science magazine, predicts that “by 2045 machines will be able to do if not any work that humans can do, then a very significant fraction of the work that humans can do.”
But predicting AI is more difficult than many people think.
To explore these difficulties, let’s start with a 2009 bloggingheads.tv conversation between MIRI researcher Eliezer Yudkowsky and MIT computer scientist Scott Aaronson, author of the excellent Quantum Computing Since Democritus. Early in that dialogue, Yudkowsky asked:
It seems pretty obvious to me that at some point in [one to ten decades] we’re going to build an AI smart enough to improve itself, and [it will] “foom” upward in intelligence, and by the time it exhausts available avenues for improvement it will be a “superintelligence” [relative] to us. Do you feel this is obvious?
Aaronson replied:
The idea that we could build computers that are smarter than us… and that those computers could build still smarter computers… until we reach the physical limits of what kind of intelligence is possible… that we could build things that are to us as we are to ants — all of this is compatible with the laws of physics… and I can’t find a reason of principle that it couldn’t eventually come to pass…
And later:
The main thing we disagree about is the time scale… a few thousand years [before AI] seems more reasonable to me.
Those two estimates — several decades vs. “a few thousand years” — have wildly different policy implications.
If there’s a good chance that AI will replace humans at the steering wheel of history in the next several decades, then we’d better put our gloves on and get to work making sure that this event has a positive rather than negative impact. But if we can be pretty confident that AI is thousands of years away, then we needn’t worry about AI for now, and we should focus on other global priorities. Thus it appears that “When will AI be created?” is a question with high value of information for our species.
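To make “value of information” concrete, here is a minimal sketch. Every number in it is hypothetical and chosen only for illustration: a prior probability that AI arrives within several decades, and made-up payoffs for two policies (preparing now vs. focusing elsewhere) in each world. The value of perfect information is the expected payoff if we could learn the arrival time before choosing, minus the best expected payoff when we must choose under uncertainty.

```python
"""Toy value-of-information calculation. All inputs are hypothetical."""

# Hypothetical prior probability that AI arrives within the next several decades.
P_SOON = 0.3

# Hypothetical payoffs (arbitrary utility units) for each policy in each world.
PAYOFFS = {
    "prepare_now": {"soon": 100.0, "far": -10.0},
    "focus_elsewhere": {"soon": -200.0, "far": 20.0},
}


def expected_payoff(policy: str, p_soon: float) -> float:
    """Expected payoff of committing to `policy` without knowing when AI arrives."""
    row = PAYOFFS[policy]
    return p_soon * row["soon"] + (1 - p_soon) * row["far"]


def value_of_perfect_information(p_soon: float) -> float:
    """Gain from learning the true world before choosing a policy."""
    best_without_info = max(expected_payoff(p, p_soon) for p in PAYOFFS)
    # With perfect information we pick the best policy separately in each world.
    best_if_soon = max(row["soon"] for row in PAYOFFS.values())
    best_if_far = max(row["far"] for row in PAYOFFS.values())
    best_with_info = p_soon * best_if_soon + (1 - p_soon) * best_if_far
    return best_with_info - best_without_info


if __name__ == "__main__":
    for policy in PAYOFFS:
        print(f"{policy}: {expected_payoff(policy, P_SOON):.1f}")
    print(f"value of perfect information: {value_of_perfect_information(P_SOON):.1f}")
```

With these made-up numbers, knowing the answer in advance is worth about 21 units, because it lets us skip preparation costs in the world where AI is far away. The value falls to zero when the prior is 0 or 1, since then we already know which policy to pick.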
Let’s take a moment to review the forecasting work that has been done, and see what conclusions we might draw about when AI will likely be created.
The challenge of forecasting AI
Maybe we can ask the experts? Astronomers are pretty good at predicting eclipses, even decades or centuries in advance. Technological development tends to be messier than astronomy, but maybe the experts can still give us a range of years during which we can expect AI to be built? This method is called expert elicitation.
Several people have surveyed experts working in AI or computer science about their AI timelines. Unfortunately, most of these surveys suffer from rather strong sampling bias, and thus aren’t very helpful for our purposes.1
Should we expect experts to be good at predicting AI, anyway? As Armstrong & Sotala (2012) point out, decades of research on expert performance2 suggest that predicting the first creation of AI is precisely the kind of task on which we should expect experts to show poor performance — e.g. because feedback is unavailable and the input stimuli are dynamic rather than static. Muehlhauser & Salamon (2013) add, “If you have a gut feeling about when AI will be created, it is probably wrong.”
That said, the experts surveyed in Michie (1973) — a more representative sample than in other surveys3 — did pretty well. When asked to estimate a timeline for “[computers] exhibiting intelligence at adult human level,” the most common response was “More than 50 years.” Assuming (as most people do) that AI will not arrive by 2023, these experts will have been correct.
Unfortunately, “more than 50 years” is a broad time frame that includes both “several decades from now” and “thousands of years from now.” So we don’t yet have any evidence that a representative survey of experts can predict AI within a few decades, and we have general reasons to suspect experts may not be capable of doing this kind of forecasting very well — although various aids (e.g. computational models; see below) may help them to improve their performance.
How else might we forecast when AI will be created?
Many have tried to forecast the first creation of AI by extrapolating various trends. Like Kevin Drum, Vinge (1993) based his own predictions about AI on hardware trends (e.g. Moore’s Law). But in a 2003 reprint of his article, Vinge noted the insufficiency of this reasoning: even if we acquire hardware sufficient for AI, we may not have the software problem solved.4 As Robin Hanson reminds us, “AI takes software, not just hardware.”
Perhaps instead we could extrapolate trends in software progress?5 Some people estimate the time until AI by asking what proportion of human abilities today’s software can match, and how quickly machines are “catching up.”6 Unfortunately, it’s not clear how to divide up the space of “human abilities,” nor how much each ability matters. Moreover, software progress seems to come in fits and starts.7 With the possible exception of computer chess progress, I’m not aware of any trend in software progress as robust across multiple decades as Moore’s Law is in computing hardware.
On the other hand, Tetlock (2005) points out that, at least in his large longitudinal database of pundits’ predictions about politics, simple trend extrapolation is tough to beat. Consider one example from the field of AI: when David Levy asked 1989 World Computer Chess Championship participants when a chess program would defeat the human World Champion, their estimates tended to be inaccurately pessimistic,8 despite the fact that computer chess had shown regular and predictable progress for two decades by that time. Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).
Hence, it may be worth searching for a measure for which (a) progress is predictable enough to extrapolate, and (b) a given level of performance robustly implies the arrival of Strong AI. But to my knowledge, no one has yet identified such a measure, and it’s not clear that trend extrapolation can tell us much about AI timelines until someone makes the case for one, and makes it well.
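Naive trend extrapolation of the kind described above can be sketched in a few lines: fit a straight line to the logarithm of a performance measure, then solve for the year at which the fitted line crosses a target level. The data and threshold below are hypothetical (a quantity that doubles every two years); the sketch says nothing about which measure, if any, satisfies conditions (a) and (b).

```python
import math


def fit_log_linear(years, values):
    """Least-squares fit of ln(value) = a + b * year; returns (a, b)."""
    n = len(years)
    logs = [math.log(v) for v in values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def year_reaching(threshold, a, b):
    """Year at which the fitted exponential trend reaches `threshold`."""
    if b <= 0:
        raise ValueError("the fitted trend is not growing, so it never reaches the threshold")
    return (math.log(threshold) - a) / b


# Hypothetical measurements: a quantity that doubles every two years.
years = [2000, 2002, 2004, 2006, 2008]
values = [1, 2, 4, 8, 16]

a, b = fit_log_linear(years, values)
print(f"doubling time: {math.log(2) / b:.1f} years")
print(f"year reaching 1024: {year_reaching(1024, a, b):.1f}")
```

The arithmetic is the easy part. An extrapolation like this is only as good as the argument that the trend will continue and that the chosen threshold actually signals Strong AI.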
Worse, several events could significantly accelerate or decelerate our progress toward AI, and we don’t know which of these events will occur, nor in what order. For example:
- An end to Moore’s Law. The “serial speed” version of Moore’s Law broke down in 2004, requiring a leap to parallel processors, which raises substantial new difficulties for software developers (Fuller & Millett 2011). The most economically relevant formulation of Moore’s Law, computations per dollar, has been maintained thus far,9 but it remains unclear whether this will continue much longer (Mack 2011; Esmaeilzadeh et al. 2012).
- Depletion of low-hanging fruit. Progress depends not only on effort but also on how difficult the next advances are. Some fields see a pattern of increasing difficulty with each successive discovery (Arbesman 2011). AI may prove to be a field in which new progress requires far more effort than earlier progress. That is clearly the case for many parts of AI already, for example natural language processing (Davis 2012).
- Societal collapse. Political, economic, technological, or natural disasters may cause a societal collapse during which progress in AI would be essentially stalled (Posner 2004; Bostrom and Ćirković 2008).
- Disinclination. Chalmers (2010) and Hutter (2012a) think the most likely “speed bump” in our progress toward AI will be disinclination. As AI technologies become more powerful, humans may question whether it is wise to create machines more powerful than themselves.
- A breakthrough in cognitive neuroscience. It is difficult, with today’s tools, to infer the cognitive algorithms behind human intelligence (Trappenberg 2009). New tools and methods, however, might enable cognitive neuroscientists to decode how the human brain achieves its own intelligence, which might allow AI scientists to replicate that approach in silicon.
- Human enhancement. Human enhancement technologies may make scientists more effective via cognitive enhancement pharmaceuticals (Bostrom and Sandberg 2009), brain-computer interfaces (Groß 2009), and genetic selection or engineering for cognitive enhancement.10
- Quantum computing. Quantum computing has overcome some of its early hurdles (Rieffel and Polak 2011), but it remains difficult to predict whether quantum computing will contribute significantly to the development of machine intelligence. Progress in quantum computing depends on particularly unpredictable breakthroughs. Furthermore, it seems likely that even if built, a quantum computer would provide dramatic speedups only for specific applications (e.g. searching unsorted databases).
- A tipping point in development incentives. The launch of Sputnik in 1957 demonstrated the possibility of space flight to the public. This event triggered a space race between the United States and the Soviet Union, and led to long-term funding for space projects from both governments. If there is a “Sputnik moment” for AI that makes it clear to the public and to governments that smarter-than-human AI is inevitable, a race to Strong AI may ensue, especially since the winner of the AI race might reap extraordinary economic, technological and geopolitical advantage.11
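To illustrate the “specific applications” point using the example from the quantum computing item above: to find one marked item in an unsorted collection of N items, Grover’s algorithm needs about (π/4)√N queries, versus up to N - 1 for a classical search that checks items one at a time. The sketch below only counts queries using those textbook formulas; it does not simulate a quantum computer.

```python
import math


def classical_queries_worst_case(n: int) -> int:
    """Checking items one by one: after n - 1 misses, the last item must be the target."""
    return n - 1


def grover_queries(n: int) -> int:
    """Approximate number of Grover iterations for a single marked item among n."""
    return math.floor(math.pi / 4 * math.sqrt(n))


for n in (10**6, 10**9, 10**12):
    c, q = classical_queries_worst_case(n), grover_queries(n)
    print(f"N = {n:.0e}: classical {c:,} vs. Grover ~{q:,} (ratio ~{c / q:,.0f})")
```

The gap widens as N grows, but only by a factor of roughly √N, and it applies to this kind of unstructured search rather than to computation in general.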
Given these considerations, I think the most appropriate stance on the question “When will AI be created?” is something like this:
We can’t be confident AI will come in the next 30 years, and we can’t be confident it’ll take more than 100 years, and anyone who is confident of either claim is pretending to know too much.
How confident is “confident”? Let’s say 70%. That is, I think it is unreasonable to be 70% confident that AI is fewer than 30 years away, and I also think it’s unreasonable to be 70% confident that AI is more than 100 years away.
This statement admits my inability to predict AI, but it also constrains my probability distribution over “years of AI creation” quite a lot.
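The two 70% constraints can be checked mechanically against any candidate distribution. In the sketch below, a belief about “years until AI” is modeled as a lognormal distribution purely for illustration (the text does not commit to any particular shape), and the three example beliefs are hypothetical.

```python
import math
from dataclasses import dataclass

# "Confident" threshold from the text: 70%.
CONFIDENCE_LIMIT = 0.7


@dataclass
class LogNormalTimeline:
    """Hypothetical belief about T = years until AI, with ln(T) ~ Normal(mu, sigma)."""
    mu: float
    sigma: float

    def cdf(self, years: float) -> float:
        """P(T <= years)."""
        z = (math.log(years) - self.mu) / (self.sigma * math.sqrt(2))
        return 0.5 * (1 + math.erf(z))


def satisfies_constraints(belief: LogNormalTimeline) -> bool:
    """True if the belief stays below 70% on both 'within 30 years' and 'beyond 100 years'."""
    p_within_30 = belief.cdf(30)
    p_beyond_100 = 1 - belief.cdf(100)
    return p_within_30 < CONFIDENCE_LIMIT and p_beyond_100 < CONFIDENCE_LIMIT


candidates = {
    "median 50 years, wide": LogNormalTimeline(math.log(50), 1.0),
    "median 15 years, narrow": LogNormalTimeline(math.log(15), 0.3),
    "median 1,000 years": LogNormalTimeline(math.log(1000), 1.0),
}
for name, belief in candidates.items():
    print(f"{name}: {satisfies_constraints(belief)}")
```

Equivalently, the constraints require P(AI takes at least 30 years) > 0.3 and P(AI within 100 years) > 0.3. They rule out sharply early and sharply late distributions while leaving many shapes in between.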
I think the considerations above justify these constraints on my probability distribution, but I haven’t spelled out my reasoning in great detail. That would require more analysis than I can present here. But I hope I’ve at least summarized the basic considerations on this topic, and those with different probability distributions than mine can now build on my work here to try to justify them.
How to reduce our ignorance
But let us not be satisfied with a declaration of ignorance. Admitting our ignorance is an important step, but it is only the first step. Our next step should be to reduce our ignorance if we can, especially for high-value questions that have large strategic implications concerning the fate of our entire species.
How might we improve our long-term forecasts? Researchers who study forecasting offer several recommendations:12
- Explicit quantification: “The best way to become a better-calibrated appraiser of long-term futures is to get in the habit of making quantitative probability estimates that can be objectively scored for accuracy over long stretches of time. Explicit quantification enables explicit accuracy feedback, which enables learning.”
- Signposting the future: Thinking through specific scenarios can be useful if those scenarios “come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another… Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.”13
- Leveraging aggregation: “the average forecast is often more accurate than the vast majority of the individual forecasts that went into computing the average…. [Forecasters] should also get into the habit that some of the better forecasters in [an IARPA forecasting tournament called ACE] have gotten into: comparing their predictions to group averages, weighted-averaging algorithms, prediction markets, and financial markets.” See Ungar et al. (2012) for some aggregation-leveraging results from the ACE tournament.
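Explicit quantification and aggregation fit together naturally. The sketch below scores probability forecasts with the Brier score (a standard accuracy measure for probabilistic predictions, chosen here as an assumption rather than taken from the quoted source) and compares individual forecasters against their simple unweighted average. All forecasts and outcomes are hypothetical, and the weighted-averaging algorithms and markets mentioned above are not modeled.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes; lower is better."""
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


def average_forecast(forecasters):
    """Unweighted, question-by-question average of several forecasters' probabilities."""
    return [mean(ps) for ps in zip(*forecasters)]


# Hypothetical resolved questions (1 = happened, 0 = did not) and three forecasters.
outcomes = [1, 0, 0, 1]
forecasters = [
    [0.9, 0.6, 0.2, 0.5],
    [0.5, 0.1, 0.5, 0.9],
    [0.7, 0.5, 0.1, 0.4],
]

individual = [brier_score(f, outcomes) for f in forecasters]
crowd = brier_score(average_forecast(forecasters), outcomes)
print("individual Brier scores:", [round(s, 4) for s in individual])
print("averaged forecast:", round(crowd, 4))
```

Because squared error is convex, the averaged forecast’s Brier score can never exceed the average of the individual scores. In this made-up example it also beats every individual forecaster, which matches the pattern described in the quotation above but is not guaranteed in general.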
Many forecasting experts add that when making highly uncertain predictions, it usually helps to decompose the phenomena into many parts and make predictions about each of the parts.14 As Raiffa (1968) succinctly put it, our strategy should be to “decompose a complex problem into simpler problems, get one’s thinking straight [on] these simpler problems, paste these analyses together with a logical glue, and come out with a program for action for the complex problem” (p. 271). MIRI’s The Uncertain Future is a simple toy model of this kind, but more sophisticated computational models — like those successfully used in climate change modeling (Allen et al. 2013) — could be produced, and integrated with other prediction techniques.
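Decomposition can be illustrated with a Monte Carlo sketch in the spirit of (but not reproducing) The Uncertain Future: forecast a few simpler quantities separately, then combine them with explicit “logical glue.” Every distribution and parameter below is a hypothetical placeholder, not an estimate; the point is the structure, in which AI is assumed to require both adequate hardware and key software insights, and a possible disruption can add a delay.

```python
import random


def sample_ai_year(rng: random.Random) -> float:
    """Draw one AI arrival year by combining hypothetical sub-forecasts."""
    # Sub-forecast 1 (hypothetical): year sufficient hardware becomes affordable.
    hardware_year = rng.uniform(2020, 2060)
    # Sub-forecast 2 (hypothetical): year key software insights arrive; right-skewed.
    software_year = 2020 + rng.lognormvariate(3.5, 0.8)
    # Sub-forecast 3 (hypothetical): 10% chance of a disruption that costs 50 years.
    delay = 50 if rng.random() < 0.10 else 0
    # Logical glue: both ingredients are needed, then any delay is added.
    return max(hardware_year, software_year) + delay


def simulate(n: int = 100_000, seed: int = 0) -> list:
    """Sorted sample of simulated arrival years (seeded for reproducibility)."""
    rng = random.Random(seed)
    return sorted(sample_ai_year(rng) for _ in range(n))


def quantile(sorted_samples: list, q: float) -> float:
    """Simple floor-index quantile of an already sorted sample."""
    return sorted_samples[int(q * (len(sorted_samples) - 1))]


samples = simulate()
for q in (0.1, 0.5, 0.9):
    print(f"{int(q * 100)}th percentile: {quantile(samples, q):.0f}")
```

The payoff of this structure is that each sub-forecast can be argued about, and improved, on its own terms, as Raiffa recommends.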
We should expect AI forecasting to be difficult, but we need not be as ignorant about AI timelines as we are today.
My thanks to Carl Shulman, Ernest Davis, Louie Helm, Scott Aaronson, and Jonah Sinick for their helpful feedback on this post.
- First, Sandberg & Bostrom (2011) gathered the AI timeline predictions of 35 participants at a 2011 academic conference on human-level machine intelligence. Participants were asked by what year they thought there was a 10%, 50%, and 90% chance that AI would have been built, assuming that “no global catastrophe halts progress.” Five of the 35 respondents expressed varying degrees of confidence that human-level AI would never be achieved. The median figures, calculated from the views of the other 30 respondents, were: 2028 for “10% chance,” 2050 for “50% chance,” and 2150 for “90% chance.” Second, Baum et al. (2011) surveyed 21 participants at a 2009 academic conference on machine intelligence, and found estimates similar to those in Sandberg & Bostrom (2011). Third, Kruel (2012) has, as of May 7th, 2013, interviewed 34 people about AI timelines and risks via email, 33 of whom could be considered “experts” of one kind or another in AI or computer science (Richard Carrier is a historian). Of those 33 experts, 19 provided full, quantitative answers to Kruel’s question about AI timelines: “Assuming beneficial political and economic development and that no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of artificial intelligence that is roughly as good as humans (or better, perhaps unevenly) at science, mathematics, engineering and programming?” For those 19 experts, the median estimates for 10%, 50%, and 90% were 2025, 2035, and 2070, respectively. Fourth, Bainbridge (2005), surveying participants at three conferences on “Nano-Bio-Info-Cogno” technological convergence, found a median estimate of 2085 for “the computing power and scientific knowledge will exist to build machines that are functionally equivalent to the human brain.” However, the participants in these four surveys were disproportionately enthusiasts of human-level AI (HLAI), which introduces a significant sampling bias.
The database of AI forecasts discussed in Armstrong & Sotala (2012) probably suffers from a similar problem: individuals who thought AI was imminent rather than distant were more likely to make public predictions of AI.
- Shanteau (1992); Kahneman and Klein (2009).
- Another survey was taken at the AI@50 conference in 2006. When participants were asked “When will computers be able to simulate every aspect of human intelligence?”, 41% said “More than 50 years” and another 41% said “Never.” Unfortunately, many of the survey participants were not AI experts but instead college students who were attending the conference. Moreover, the phrasing of the question may have introduced a bias. The “Never” answer may have been given as often as it was because some participants took “every aspect of human intelligence” to include consciousness, and many people have philosophical objections to the idea that machines could be conscious. Had they instead been asked “When will AIs replace humans in almost all jobs?”, I suspect the “Never” answer would have been far less common. As for myself, I don’t accept any of the in-principle objections to the possibility of AI. For replies to the most common of these objections, see Chalmers (1996), ch. 9, and Chalmers (2012).
- That said, Muehlhauser & Salamon (2013) point out that “Hardware extrapolation may be a more useful method in a context where the intelligence software is already written: whole brain emulation [WBE]. Because WBE seems to rely mostly on scaling up existing technologies like microscopy and large-scale cortical simulation, WBE may be largely an ‘engineering’ problem, and thus the time of its arrival may be more predictable than is the case for other kinds of AI.” However, it is especially difficult to forecast WBE while we do not even have a proof of concept via a simple organism like C. elegans (David Dalrymple is working on this). Moreover, much progress in neuroscience will be required (Sandberg & Bostrom 2011), and such progress is probably less predictable than progress in hardware.
- I’m not sure what a general measure of software progress would look like, though we can certainly identify local examples of software progress. For example, Holdren et al. (2010) note: “in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed… [For example] Martin Grötschel…, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later – in 2003 – this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008.” Muehlhauser & Salamon (2013) give another example: “For example, IBM’s Deep Blue played chess at the level of world champion Garry Kasparov in 1997 using about 1.5 trillion instructions per second (TIPS), but a program called Deep Junior did it in 2003 using only 0.015 TIPS. Thus, the computational efficiency of the chess algorithms increased by a factor of 100 in only six years (Richards and Shaw 2004).” A third example is Setty et al. (2012), which improved the efficiency of a probabilistically checkable proof method by 20 orders of magnitude with a single breakthrough. On the other hand, one can easily find examples of very slow progress, too (Davis 2012).
- For example, see Good (1970).
- As I wrote earlier: “Increases in computing power are pretty predictable, but for AI you probably need fundamental mathematical insights, and it’s damn hard to predict those. In 1900, David Hilbert posed 23 unsolved problems in mathematics. Imagine trying to predict when those would be solved.” Some of these problems were solved quickly, some of them required several decades to solve, and many of them remain unsolved. Even the order in which Hilbert’s problems would be solved was hard to predict. According to Erdős & Graham (1980), p. 7, “Hilbert lectured in the early 1920’s on problems in mathematics and said something like this: probably all of us will see the proof of the Riemann hypothesis, some of us… will see the proof of Fermat’s last theorem, but none of us will see the proof that √2^√2 is transcendental.” In fact, these results came in the reverse order: the last was proved by Kusmin a few years later, Fermat’s last theorem was proved by Wiles in 1994, and the Riemann hypothesis still has not been proved or disproved.
- According to Levy & Newborn (1991), one participant guessed the correct year (1997), thirteen participants guessed years from 1992-1995, twenty-eight participants guessed years from 1998-2056, and one participant guessed “Never.” Of the twenty-eight who guessed years from 1998-2056, eleven guessed year 2010 or later.
- As Fuller & Millett (2011, p. 81) note, “When we talk about scaling computing performance, we implicitly mean to increase the computing performance that we can buy for each dollar we spend.” Most of us don’t really care whether our new computer has more transistors or some other structure; we just want it to do more stuff, more cheaply. Kurzweil (2012), ch. 10, footnote 10 shows “calculations per second per $1,000” growing exponentially from 1900 through 2010, including several data points after the serial speed version of Moore’s Law broke down in 2004. The continuation of this trend is confirmed by “instructions per second per dollar” data for 2006-2011, gathered from Intel and other sources by Chris Hallquist. Thus it seems that the computations per dollar form of Moore’s Law has continued unabated, at least for now.
- One possible breakthrough here may be iterated embryo selection. See Miller (2012, ch. 9) for more details.
- It is interesting, however, that the United States did not pursue extraordinary economic, technological and geopolitical advantage in the period during which it was the sole possessor of nuclear weapons. Also, it is worth noting that violence and aggression have steadily declined throughout human history (Pinker 2012).
- Tetlock (2010) adds another recommendation: “adversarial collaboration” (Mellers et al. 2001). Tetlock explains: “The core idea is simple: rival epistemic and political camps would nominate experts to come together to reach agreements on how they disagree on North Korea or deficit reduction or global warming — and then would figure out how to resolve at least a subset of their factual disputes. The disputants would need to specify, ex ante, how much belief change each side would ‘owe’ the other if various agreed-upon empirical tests were to work out one way or the other. When adversarial collaboration works as intended, it shifts the epistemic incentives from favoring cognitive hubris (generating as many reasons why one’s own side is right and the other is wrong) and toward modesty (taking seriously the possibility that some of the other side’s objections might have some validity). This is so because there is nothing like the prospect of imminent falsification to motivate pundits to start scaling back their more grandiose generalizations: ‘I am not predicting that North Korea will become conciliatory in this time frame if we did x — I merely meant that they might become less confrontational in this wider time frame if we did x, y and z — and if there are no unexpected endogenous developments and no nasty exogenous shocks.'” Tetlock (2012), inspired by Gawande (2009), also tentatively recommends the use of checklists in forecasting: “The intelligence community has begun developing performance-appraisal checklists for analysts that nudge them in the direction of thinking more systematically about how they think. But it has yet — to our knowledge — taken the critical next step of checking the usefulness of the checklists against independent real-world performance criteria, such as the accuracy of current assessments and future projections. 
Our experience in the [ACE] IARPA forecasting tournament makes us cautiously optimistic that this next step is both feasible and desirable.”
- But let us not fool ourselves about the difficulty of this task. Good (1976) asserted that human-level performance in computer chess was a good signpost for human-level AI, writing that “a computer program of Grandmaster strength would bring us within an ace of [machine ultra-intelligence].” But of course this was not so. We may chuckle at this prediction today, but how obviously wrong was Good’s prediction in 1976?
- E.g. Armstrong & Sotala (2012); MacGregor (2001); Lawrence et al. (2006).
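Several figures in the notes above are simple products and ratios, and they can be cross-checked with a few lines of arithmetic. This sketch assumes a 365.25-day year; the inputs are the figures quoted in the notes, not new data.

```python
import math

MINUTES_PER_YEAR = 365.25 * 24 * 60

# Grötschel's production-planning benchmark, as quoted from Holdren et al. (2010):
# 82 years in 1988 versus roughly 1 minute in 2003.
total_speedup = 82 * MINUTES_PER_YEAR / 1.0
hardware_factor = 1_000
algorithm_factor = 43_000
# The two quoted factors should multiply to roughly the overall speedup.
assert math.isclose(total_speedup, hardware_factor * algorithm_factor, rel_tol=0.01)

# Deep Blue (1997) vs. Deep Junior (2003), in trillions of instructions per second.
chess_efficiency_gain = 1.5 / 0.015

# Levy's 1989 survey, as reported by Levy & Newborn (1991).
exact, too_early, too_late, never = 1, 13, 28, 1
respondents = exact + too_early + too_late + never

print(f"LP speedup: ~{total_speedup / 1e6:.1f} million "
      f"(1,000 x 43,000 = {hardware_factor * algorithm_factor:,})")
print(f"chess efficiency gain: ~{chess_efficiency_gain:.0f}x")
print(f"{too_late + never} of {respondents} chess respondents were too pessimistic; "
      f"{too_early} were too optimistic")
```

The last line makes the main text’s claim about the 1989 chess survey concrete: most respondents who missed 1997 guessed too late, not too early.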