One of the most common objections we hear when talking about artificial general intelligence (AGI) is that “AGI is ill-defined, so you can’t really say much about it.”
In an earlier post, I pointed out that we often don’t have precise definitions for things while doing useful work on them, as was the case with the concepts of “number” and “self-driving car.”
Still, we must have some idea of what we’re talking about. Earlier I gave a rough working definition for “intelligence.” In this post, I explain the concept of AGI and also provide several possible operational definitions for the idea.
The idea of AGI
As discussed earlier, the concept of “general intelligence” refers to the capacity for efficient cross-domain optimization, or, as Ben Goertzel likes to say, “the ability to achieve complex goals in complex environments using limited computational resources.” Another idea often associated with general intelligence is the ability to transfer learning from one domain to other domains.
To illustrate this idea, let’s consider something that would not count as a general intelligence.
Computers show vastly superhuman performance at some tasks, roughly human-level performance at other tasks, and subhuman performance at still other tasks. If a team of researchers were able to combine many of the top-performing “narrow AI” algorithms into one system, as Google may be trying to do,1 they’d have a massive “Kludge AI” that was terrible at most tasks, mediocre at some tasks, and superhuman at a few tasks.
Like the Kludge AI, particular humans are terrible or mediocre at most tasks, and far better than average at just a few tasks.2 Another similarity is that the Kludge AI would probably show measured correlations between many different narrow cognitive abilities, just as humans do (hence the concepts of g and IQ3): if we gave the Kludge AI lots more hardware, it could use that hardware to improve its performance in many different narrow domains simultaneously.4
On the other hand, the Kludge AI would not (yet) have general intelligence, because it wouldn’t necessarily have the capacity to solve somewhat-arbitrary problems in somewhat-arbitrary environments, wouldn’t necessarily be able to transfer learning in one domain to another, and so on.
Operational definitions of AGI
Can we be more specific? This idea of general intelligence is difficult to operationalize. Below I consider four operational definitions for AGI, in (apparent) increasing order of difficulty.
The Turing test ($100,000 Loebner prize interpretation)
One specific interpretation of the Turing test is provided by the conditions for winning the $100,000 Loebner Prize. Since 1990, Hugh Loebner has offered $100,000 to the first AI program to pass this test at the annual Loebner Prize competition. Smaller prizes are given to the best-performing AI program each year, but no program has performed well enough to win the $100,000 prize.
The exact conditions for winning the $100,000 prize will not be defined until a program wins the $25,000 “silver” prize, which has not yet been done. However, we do know the conditions will look something like this: A program will win the $100,000 prize if it can fool half the judges into thinking it is human while interacting with them in a freeform conversation for 30 minutes and interpreting audio-visual input.
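As a rough illustration of the pass criterion just described, here is a minimal Python sketch. The record type `JudgeVerdict`, the function `passes_turing_test`, and their fields and parameters are hypothetical names chosen for this example; they are not part of the Loebner Prize rules, which (as noted) have not been finalized. The sketch also assumes that “half the judges” means at least half, and it ignores the audio-visual requirement.

```python
from dataclasses import dataclass


@dataclass
class JudgeVerdict:
    """One judge's verdict after a conversation (hypothetical record type)."""
    judge_id: str
    minutes_conversed: float
    judged_human: bool  # True if the judge believed the program was human


def passes_turing_test(verdicts, required_minutes=30.0, required_fraction=0.5):
    """Return True if at least `required_fraction` of the judges were fooled.

    Only conversations lasting at least `required_minutes` are counted.
    Both thresholds are assumptions taken from the rough description above.
    """
    valid = [v for v in verdicts if v.minutes_conversed >= required_minutes]
    if not valid:
        return False  # no qualifying conversations, so no pass
    fooled = sum(v.judged_human for v in valid)
    return fooled / len(valid) >= required_fraction


# Made-up example data: four judges, two of whom were fooled.
example = [
    JudgeVerdict("a", 30, True),
    JudgeVerdict("b", 30, True),
    JudgeVerdict("c", 30, False),
    JudgeVerdict("d", 30, False),
]
print(passes_turing_test(example))
```

The point of writing it out is that even this “simple” criterion leaves choices open, such as how to treat short conversations and whether exactly half counts as passing.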
The coffee test
Goertzel et al. (2012) suggest a (probably) more difficult test — the “coffee test” — as a potential operational definition for AGI:
go into an average American house and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee in the cabinet, etc.
If a robot could do that, perhaps we should consider it to have general intelligence.5
The robot college student test
Goertzel (2012) suggests a (probably) more challenging operational definition, the “robot college student test”:
when a robot can enrol in a human university and take classes in the same way as humans, and get its degree, then I’ll [say] we’ve created [an]… artificial general intelligence.
The employment test
Nils Nilsson suggests another operational definition, the “employment test”:
Machines exhibiting true human-level intelligence should be able to do many of the things humans are able to do. Among these activities are the tasks or “jobs” at which people are employed. I suggest we replace the Turing test by something I will call the “employment test.” To pass the employment test, AI programs must… [have] at least the potential [to completely automate] economically important jobs.6
To develop this operational definition more completely, one could provide a canonical list of “economically important jobs,” produce a special vocational exam for each job (e.g. both the written and driving exams required for a U.S. commercial driver’s license), and measure machines’ performance on those vocational exams.
This is a bit “unfair” because I doubt that any single human could pass such vocational exams for any long list of economically important jobs. On the other hand, it’s quite possible that many unusually skilled humans would be able to pass all or nearly all such vocational exams if they spent an entire lifetime training each skill, and an AGI — having near-perfect memory, faster thinking speed, no need for sleep, etc. — would presumably be able to train itself in all required skills much more quickly, if it possessed the kind of general intelligence we’re trying to operationally define.
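The scoring scheme suggested above (a canonical list of jobs, one or more vocational exams per job, and a measure of performance on those exams) could be sketched as follows. This is only an illustration under stated assumptions: the function name `employment_test_score`, the 0-to-1 score scale, the `pass_mark` threshold, and the example jobs and scores are all made up for this sketch and do not come from Nilsson.

```python
def employment_test_score(exam_results, pass_mark=0.7):
    """Return the fraction of jobs for which every vocational exam was passed.

    `exam_results` maps a job name to a dict of {exam name: score in [0, 1]}.
    A job counts as passed only if it has at least one exam and all of its
    exam scores are at or above `pass_mark` (an assumed threshold).
    """
    if not exam_results:
        raise ValueError("need at least one job in the canonical list")
    passed = [
        job
        for job, exams in exam_results.items()
        if exams and all(score >= pass_mark for score in exams.values())
    ]
    return len(passed) / len(exam_results)


# Hypothetical results for a machine on two jobs.
results = {
    "commercial driver": {"written": 0.9, "driving": 0.8},
    "bookkeeper": {"written": 0.6},
}
print(employment_test_score(results))  # 0.5: one of the two jobs passed
```

A real version would have to settle harder questions than the arithmetic, such as which jobs belong on the list and what fraction of them a system must pass to count as an AGI.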
The future is foggy
One or more of these operational definitions for AGI might seem compelling, but a look at history should teach us some humility.
Decades ago, several leading AI scientists seemed to think that human-level performance at chess could represent an achievement of AGI proportions. Here are Newell et al. (1958):
Chess is the intellectual game par excellence… If one could devise a successful chess machine, one would seem to have penetrated to the core of human intellectual endeavor.7
As late as 1976, I.J. Good asserted that human-level performance in computer chess was a good signpost for AGI, writing that “a computer program of Grandmaster strength would bring us within an ace of [machine ultra-intelligence].”
But machines surpassed the best human chess players about 15 years ago, and we still seem to be several decades away from AGI.
The surprising success of self-driving cars may offer another lesson in humility. Had I been an AI scientist in the 1960s, I might well have thought that a self-driving car as capable as Google’s driverless car would indicate the arrival of AGI. After all, a self-driving car must act with high autonomy, at high speeds, in an extremely complex, dynamic, and uncertain environment: namely, the real world. It must also (on rare occasions) face genuine moral dilemmas such as the philosopher’s trolley problem. Instead, Google built its driverless car with a series of “cheats” I might not have conceived of in the 1960s — for example by mapping with high precision almost every road, freeway on-ramp, and parking lot in the country before it built its driverless car.
So, what’s a good operational definition for AGI? I personally lean toward Nilsson’s employment test, but you might have something else in mind when you talk about AGI.
I expect to pick a new working definition sometime in the next 20 years, as AGI draws nearer, but Nilsson’s operationalization will do for now.
My thanks to Carl Shulman, Ben Goertzel, and Eliezer Yudkowsky for their feedback on this post.
- In an interview with The Register, Google head of research Alfred Spector said, “We have the knowledge graph, [the] ability to parse natural language, neural network tech [and] enormous opportunities to gain feedback from users… If we combine all these things together with humans in the loop continually providing feedback our systems become … intelligent.” Spector calls this the “combination hypothesis.” ↩
- There are probably many disadvantaged humans for whom this is not true, though, because they do not show far-above-average performance on any tasks. ↩
- Psychologists now generally agree that there is a general intelligence factor in addition to more specific mental abilities. For an introduction to the modern synthesis, see Gottfredson (2011). For more detail, see the first few chapters of Sternberg & Kaufman (2011). If you’ve read Cosma Shalizi’s popular article “g, a Statistical Myth,” please also read its refutation here and here. ↩
- In psychology, the factor analysis is done between humans. Here, I’m suggesting that a similar factor analysis could hypothetically be done between different Kludge AIs, with different Kludge AIs running basically the same software but having access to different amounts of computation. The analogy should not be taken too far, however. For example, it isn’t the case that higher-IQ humans have much larger brains than other humans. ↩
- The coffee test was inspired by Steve Wozniak’s prediction that we would never “build a robot that could walk into an unfamiliar house and make a cup of coffee” (Adams et al. 2011). Wozniak’s original prediction was made in a PC World piece from July 19, 2007 called Three Minutes with Steve Wozniak. ↩
- First, Nilsson proposes that to pass the employment test, “AI programs must be able to perform the jobs ordinarily performed by humans.” But later, he modifies this specification: “For the purposes of the employment test, we can finesse the matter of whether or not human jobs are actually automated. Instead, I suggest, we can test whether or not we have the capability to automate them.” In part, he suggests this modification because “many of today’s jobs will likely disappear — just as manufacturing buggy whips did.” ↩
- A bit later, they add a note of caution: “Now there might [be] a trick… something that [is] as the wheel to the human leg: a device quite different from humans in its methods, but supremely effective in its way, and perhaps very simple. Such a device might play excellent chess, but… fail to further our understanding of human intellectual processes. Such a prize, of course, would be worthy of discovery in its own right, but there appears to be nothing of this sort in sight.” ↩