Co-authored with Jonah Sinick.
How big is the field of AI, and how big was it in the past?
This question is relevant to several issues in AGI safety strategy. To name just two examples:
- AI forecasting. Some people forecast AI progress by looking at how much has been accomplished for each calendar year of research. But as inputs to AI progress, (1) AI funding, (2) quality-adjusted researcher years (QARYs), and (3) computing power are more relevant than calendar years.1 To use these metrics to predict future AI progress, we need to know how many dollars and QARYs and computing cycles at various times in the past have been required to produce the observed progress in AI thus far.
- Leverage points. If most AI research funding comes from relatively few funders, or if most research is produced by relatively few research groups, then these may represent high-value leverage points through which one might influence the field as a whole, e.g. to be more concerned with the long-term social consequences of AI.
For these reasons and more, MIRI recently investigated the current size and past growth of the AI field. This blog post summarizes our initial findings, which are meant to provide a “quick and dirty” launchpad for future, more thorough research into the topic.
To begin, we tried to quantify the size and past growth of the field using metrics such as
- Number of researchers
- Number of journals
- Publication counts
- Number of conferences
- Number of organizations
- Famous prizes awarded for AI research
- Amount of funding
It’s difficult to interpret these figures, and they may be significantly less informative than an object-level study of the research would be, but the figures still have some relevance:
- For the purpose of investigating growth, one can look at year-to-year percentage growth in the statistics, combining this with other measures of the amount of progress that has occurred in AI, in order to estimate the amount of AI research that will occur in the medium term future.
- For the purpose of investigating the current size of the AI field, one can look at the quantitative metrics relative to the corresponding metrics for computer science (CS),2 and use these in conjunction with a holistic sense of the current size of the CS field to inform one’s holistic sense of the amount of progress that there’s been in AI.
The data that we were able to collect provide a decent picture of the size of the AI field relative to the size of the CS field, but they are insufficient to support a robust conclusion, and more investigation is warranted. Unless otherwise specified, see the spreadsheet “Current size & past growth of AI field” for the raw data on which this blog post is based.
The size of the AI field
According to a variety of metrics, the amount of AI research being done appears to be about 10% of the amount of computer science (CS) research being done. The metrics used, however, mostly capture research quantity rather than research quality, and thus may be a weak proxy for measuring how many QARYs have been invested. That said, the fact that roughly 10% of CS research prizes are awarded for AI work may indicate that research quality is similar in CS and AI.
We obtained many of the relevant figures from Microsoft Academic Search (MAS). MAS allows one to search under the headings:
- Computer science
- Artificial intelligence
- Natural language and speech
- Machine learning and pattern recognition
- Computer vision
One gets different figures depending on whether one counts the latter three subjects (hereafter referred to as “cognate disciplines”) as AI. Below, we give figures both for items that fall under the “artificial intelligence” heading alone, and for items that fall under the heading “artificial intelligence” or under the heading of one of the cognate disciplines.
Number of researchers
MAS gives the number of authors in CS, AI, and the cognate disciplines of AI, but these figures capture the amount of research done less well than publication count figures do.3
Some other relevant figures (which don’t paint a cohesive picture):
- According to the Bureau of Labor Statistics, there are 26,700 computer and information science researchers in the US.
- ACM’s Special Interest Group on Artificial Intelligence (SIGAI) has “more than 1,000 members.”
- The International Neural Network Society (INNS) has “more than 2,000 members.”
Number of journals
MAS lists 1360 CS journals, with 106 in AI, and 172 in either AI or one of AI’s cognate disciplines, so 8% and 13% respectively.4
Publication counts
Between 2005 and 2010, of those publications listed under MAS’s “CS” heading, about 10% were listed under “AI” and about 20% were listed under “AI” or one of its cognate disciplines.5 One sees roughly the same percentages if one looks at publications between 1990 and 1995, between 1995 and 2000, and between 2000 and 2005.6
Searching Google Scholar for “computer science” and “artificial intelligence,” one finds that the number of hits for the latter search is about 30% of the number of hits for the former search.7 This could mean that the amount of AI research is significantly more than 10% of the amount of CS research. However, some papers that contain the phrase “artificial intelligence” are not artificial intelligence research, and some computer science papers may not contain the phrase “computer science.”
Number of conferences
MAS lists 3,519 “top conferences” in CS and 361 “top conferences” in AI, so the latter number is about 10% of the former. There are 561 “top conferences” in AI or cognate disciplines, so 16% of the number of CS conferences.8
Number of organizations
Microsoft Academic Search lists 11,338 organizations for CS and 7,125 organizations for AI, so 63%. If one counts cognate disciplines as AI, the number of AI organizations is 21,802, so 192% that of CS organizations.9 Taken in isolation, this would suggest that the amount of AI research is much greater than 10% of the amount of CS research.
“Number of organizations” seems likely to be a weaker metric of amount of research than “number of publications,” etc., so this should be discounted. Nevertheless, the fact that the ratio of AI organizations to CS organizations is so much higher than the other ratios that we looked at is a puzzle. Perhaps the difference comes from the CS community and the AI community having different cultural norms. Or, perhaps MAS is less consistent about how it counts organizations than how it counts publications.
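The percentages quoted for journals, conferences, and organizations can be recomputed directly from the MAS counts given in the text. The following is a minimal sketch; the dictionary layout and the function names `share` and `ai_shares` are illustrative choices, not part of the original analysis or spreadsheet.

```python
# Counts as quoted in the text from Microsoft Academic Search (MAS).
# Tuple layout (an illustrative choice): (CS total, AI only, AI plus cognates).
MAS_COUNTS = {
    "journals": (1360, 106, 172),
    "top conferences": (3519, 361, 561),
    "organizations": (11338, 7125, 21802),
}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def ai_shares(counts):
    """Map each metric to (AI-only share, AI-plus-cognates share), in percent."""
    return {
        metric: (share(ai, cs), share(ai_plus, cs))
        for metric, (cs, ai, ai_plus) in counts.items()
    }


if __name__ == "__main__":
    for metric, (narrow, broad) in ai_shares(MAS_COUNTS).items():
        print(f"{metric:16s} AI only: {narrow:6.1f}%   AI + cognates: {broad:6.1f}%")
```

Laying the metrics side by side makes the organizations figure stand out as the outlier against the journal and conference shares.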
Famous prizes awarded for AI research vs. CS research
Amount of funding
In 2011, the National Science Foundation (NSF) received $636 million for funding CS research (through CISE). Of this, $169 million went to Information and Intelligent Systems (IIS). IIS has three programs: Cyber-Human Systems (CHS), Information Integration and Informatics (III) and Robust Intelligence (RI). If roughly 1/3 of the funding went to each of these, then $56 million went to Robust Intelligence, so 9% of the total CS funding. (Some CISE funding may have gone to AI work outside of IIS — that is, via ACI, CCF, or CNS — but at a glance, non-IIS AI funding through CISE looks negligible.)
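The Robust Intelligence estimate depends heavily on the assumed equal split of IIS funding among its three programs. The sketch below reproduces the arithmetic and shows how the result moves under other splits. The alternative fractions (one quarter and one half) are hypothetical values chosen only to illustrate sensitivity; they are not figures from NSF or from the text.

```python
CISE_TOTAL_MUSD = 636.0  # NSF CS research funding through CISE in 2011 (from the text)
IIS_TOTAL_MUSD = 169.0   # portion that went to IIS (from the text)


def ri_estimate(ri_fraction_of_iis):
    """Return (RI funding in $ millions, RI share of CISE funding in percent)."""
    ri_musd = IIS_TOTAL_MUSD * ri_fraction_of_iis
    return ri_musd, 100.0 * ri_musd / CISE_TOTAL_MUSD


if __name__ == "__main__":
    # 1/3 is the equal-split assumption used in the text;
    # 0.25 and 0.5 are hypothetical alternatives for comparison only.
    for fraction in (0.25, 1.0 / 3.0, 0.5):
        ri_musd, pct = ri_estimate(fraction)
        print(f"RI = {fraction:.2f} of IIS -> ${ri_musd:.0f}M, {pct:.1f}% of CISE")
```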
Other major U.S. funding sources for CS research include ONR, DARPA, and several companies (Microsoft, Google, IBM, etc.) but we have not investigated these funding sources yet. We also did not investigate non-U.S. funding sources.
The growth of the AI field
We did not investigate the growth rate of the number of AI researchers in sufficient depth to make meaningful estimates. However, the growth rate of the number of scientists and engineers in all fields might serve as a very weak proxy measure for the growth rates of AI or CS.
For example, the annual growth rate of science and engineering researchers in OECD countries, between 1995 and 2005, appears to be about 3.3%, corresponding to a doubling time of 23 years.12 This needs to be viewed in juxtaposition with indications that average researcher productivity (as measured by patents per researcher, amount of time spent training per researcher, the number of coauthors per paper, and the number of papers cited) has been decreasing.13
The NSF budget for Information and Intelligent Systems (IIS) has generally increased between 4% and 20% per year since 1996, with a one-time percentage boost of 60% in 2003, for a total increase of 530% over the 15-year period between 1996 and 2011.14 “Robust Intelligence” is one of three program areas covered by this budget.
According to MAS, the number of publications in AI grew by more than 100% every 5 years between 1965 and 1995, but between 1995 and 2010 it has been growing by about 50% every 5 years. One sees a similar trend in machine learning and pattern recognition.15
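The growth figures above are stated over different time spans (per year, per 5 years, per 15 years), which makes them hard to compare. A minimal sketch of how one might put them on a common annual footing, assuming constant compound growth within each period, follows. The function names are illustrative, and the code assumes that a “total increase of 530%” means the final budget is 6.3 times the initial one.

```python
import math


def annualized_rate(total_growth, years):
    """Constant annual growth rate that compounds to total_growth over `years`.

    total_growth is a fraction: 5.3 means a 530% increase (a 6.3x multiple).
    """
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def doubling_time(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    cases = {
        "IIS budget, +530% over 15 years": annualized_rate(5.3, 15),
        "AI publications, +100% per 5 years": annualized_rate(1.0, 5),
        "AI publications, +50% per 5 years": annualized_rate(0.5, 5),
        "OECD researchers, 3.3% per year": 0.033,
    }
    for label, rate in cases.items():
        print(f"{label}: {100 * rate:.1f}%/yr, doubling in {doubling_time(rate):.1f} years")
```

Under exact annual compounding, 3.3% growth gives a doubling time of roughly 21 years, slightly shorter than the 23 years quoted from the NSF report, which may reflect rounding or a different convention in the source.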
Notes on further research
Future research on this topic could dig much deeper, and come to more robust conclusions. Our purpose here is to lay some groundwork for future research. With that in mind, here are some miscellaneous notes to future researchers investigating the current size and past growth of the AI field:
- If the papers being cited are newer, that could indicate more rapid progress. On the other hand, it could also indicate faddishness, and one would somehow need to differentiate between the two things.
- Some citation databases that could be useful for analyzing citation patterns are Scopus, Web of Science, Microsoft Academic Search, and Science Citation Index (SCI).16
- Some sources of noise in citation counts are: (a) journal editors asking authors of submitted papers to add citations to other papers in the same journal in order to boost the journal’s impact factor, and (b) authors citing their own papers excessively in order to increase their citation counts.17
Our thanks to Sebastian Nickel for data-gathering, and to Carl Shulman for his feedback.
- Another important input metric is theoretical progress imported from other fields, e.g. methods from statistics. ↩
- It’s also worth noting the following point. Suppose that a source S can be used to generate an estimate E1 for a quantity Q1 having to do with AI and an estimate E2 for a quantity Q2 having to do with CS. Then E1 and E2 may overstate or understate Q1 and Q2 (respectively). Let the factors by which E1 and E2 differ from Q1 and Q2 be F1 and F2, so that E1 = (F1)*(Q1) and E2 = (F2)*(Q2). We don’t have good estimates for F1 and F2, but if we compute the ratio (E1)/(E2) we get [(Q1)/(Q2)]*[(F1)/(F2)]. The quantity (F1)/(F2) will be closer to 1 than F1 is to 1, because some of the factors that lead E1 to deviate from Q1 to given degrees will also lead E2 to deviate from Q2 to similar degrees. So (E1)/(E2) is closer to (Q1)/(Q2) (in relative terms) than E1 is to Q1 (in relative terms). ↩
- MAS shows 1.6 million authors in CS and 0.26 million authors in AI, so 16%. If one adds up the listed number of authors in AI and cognate disciplines, the figure rises to 39%. However, some authors publish in multiple disciplines (for example, an author might publish in both artificial intelligence and machine learning). ↩
- Cells B96 through B100 of the spreadsheet. ↩
- Some papers may be listed under multiple categories, making it unclear whether the 10% figure or the 20% figure is more representative. ↩
- Table with upper left hand corner A2 in the spreadsheet. ↩
- Google Scholar results:
Search term “computer science” (in quotes) yields 2,650,000 results
“artificial intelligence” -> 1,710,000
“machine intelligence” -> 655,000
Search term “computer science” (in quotes) yields 99,600 results
“artificial intelligence” -> 32,300
“machine intelligence” -> 11,600
Search term “computer science” (in quotes) yields 163,000 results
“artificial intelligence” -> 52,500
“machine intelligence” -> 22,600
Search term “computer science” (in quotes) yields 247,000 results
“artificial intelligence” -> 66,100
“machine intelligence” -> 23,000 ↩
- Cell B27 through Cell B31 of the spreadsheet. ↩
- Cell B119 through Cell B123 of the spreadsheet. ↩
- The annual Turing Award was first awarded in 1966 (last prize 2012), so 46 prizes so far. Of those, 6 were for achievements in AI-related research, namely:
• 1969 Marvin Minsky
• 1971 John McCarthy
• 1975 Newell & Simon
• 1991 Robin Milner (machine assisted proof construction)
• 1994 Edward Feigenbaum & Raj Reddy
• 2010 Leslie Valiant (Probably Approximately Correct Learning)
• 2011 Judea Pearl
8 of the 46 prizes were awarded to 2 people, and another 2 were awarded to 3 people, so the total number of recipients is 58, out of which 8 received the prize for AI-related achievements. ↩
- The Nevanlinna Prize has been awarded every 4 years since 1982; 8 times so far. ↩
- From U.S. National Science Foundation (NSF). Science and Engineering Indicators: 2010, Chapter 3. Science and Engineering Labor Force:
In the early 1960s, a prominent historian of science, Derek J. de Solla Price, examined the growth of science and the number of scientists over very long periods in history and summarized his findings in a book entitled Science Since Babylon (1961). Using a number of empirical measures (most over at least 300 years), Price found that science, and the number of scientists, tended to double about every 15 years, with measures of higher quality science and scientists tending to grow slower (doubling every 20 years) and measures of lower quality science and scientists tending to grow faster (every 10 years). According to Price (1961), one implication of this long-term exponential growth is that “80 to 90% of all the scientists that ever lived are alive today.” This insight follows from the likelihood that most of the scientists from the past 45 years (a period of three doublings) would still be alive. Price was interested in many implications of these growth patterns, but in particular, he was interested in the idea that this growth could not continue indefinitely and the number of scientists would reach “saturation.” Price was concerned in 1961 that saturation had already begun.
How different are the growth rates in the number of scientists and engineers in recent periods from what Price estimated for past centuries? Table 3-A shows growth rates for some measurements of the S&E labor force in the United States and elsewhere in the world for a period of available data. Of these measures, the number of S&E doctorate holders in the United States labor force showed the lowest average annual growth of 2.4% (doubling in 31 years if this growth rate were to continue). The number of doctorate holders employed in S&E occupations in the United States showed a faster average annual growth of 3.8% (doubling in 20 years if continued). There are no global counts of individuals in S&E, but counts of “researchers” in member countries of the Organisation for Economic Co-operation and Development (OECD) grew at an average annual rate of 3.3% (doubling in 23 years if continued). Data on the population of scientists and engineers in most developing countries are very limited, but OECD data for researchers in China show a 10.8% average annual growth rate (doubling in 8 years if continued). All these numbers are broadly consistent with a continuation of growth in S&E labor exceeding the rate of growth in the general labor force. ↩
- Below are some references on declining productivity per researcher. Our thanks to Gwern for compiling many of these in the article Scientific Stagnation:
• Machlup, Fritz. The Production and Distribution of Knowledge in the United States, Princeton, NJ: Princeton University Press, 1962, 170-176
• Segerstrom, Paul. Endogenous Growth Without Scale Effects, American Economic Review, December 1998, 88, 1290-1310
• Terman, F.E. A Brief History of Electrical Engineering Education, Proceedings of the IEEE, August 1998, 86 (8), 1792-1800
• Adams, James D., Black, Grant C., Clemmons, J.R., and Stephan, Paula E. Scientific Teams and Institutional Collaborations: Evidence from U.S. Universities, 1981-1999, NBER Working Paper #10640, July 2004
• Jones (2006), Age and Great Invention
• Jones, Benjamin F. The Burden of Knowledge and the Death of the Renaissance Man: Is Innovation Getting Harder? NBER Working Paper #11360, 2005
• National Research Council, On Time to the Doctorate: A Study of the Lengthening Time to Completion for Doctorates in Science and Engineering, Washington, DC: National Academy Press, 1990
• Tilghman, Shirley (chair) et al. Trends in the Early Careers of Life Sciences, Washington, DC: National Academy Press, 1998
• Zuckerman, Harriet and Merton, Robert. Age, Aging, and Age Structure in Science, in Merton, Robert, The Sociology of Science, Chicago, IL: University of Chicago Press, 1973, 497-559
• Cronin et al, 2004 Visible, Less Visible, and Invisible Work: Patterns of Collaboration in 20th Century Chemistry, Journal of the American Society for Information Science and Technology, 2004, 55(2), 160-168
• Grossman, Jerry. The Evolution of the Mathematical Research Collaboration Graph, Congressus Numerantium, 2002, 158, 202-212
• Cronin, Blaise, Shaw, Debora, and La Barre, Kathryn. A Cast of Thousands: Coauthorship and Subauthorship Collaboration in the 20th Century as Manifested in the Scholarly Journal Literature of Psychology and Philosophy, Journal of the American Society for Information Science and Technology, 2003, 54(9), 855-871
• McDowell, John, and Melvin, Michael. The Determinants of Coauthorship: An Analysis of the Economics Literature, Review of Economics and Statistics, February 1983, 65, 155-160
• Hudson, John. Trends in Multi-Authored Papers in Economics, Journal of Economic Perspectives, Summer 1996, 10, 153-158
• Laband, David and Tollison, Robert. Intellectual Collaboration, Journal of Political Economy, June 2000, 108, 632-662
• Jones 2010. As Science Evolves, How Can Science Policy?
• The Collapse of the Soviet Union and the Productivity of American Mathematicians, by George J. Borjas and Kirk B. Doran, NBER Working Paper No. 17800, February 2012
- See table with upper left-hand corner A367 in the spreadsheet. ↩
- Table with upper left hand corner A2 in the spreadsheet. ↩
- A 2008 study compared PubMed, Scopus, Web of Science, and Google Scholar and concluded: “PubMed and Google Scholar are accessed for free […] Scopus offers about 20% more coverage than Web of Science, whereas Google Scholar offers results of inconsistent accuracy. PubMed remains an optimal tool in biomedical electronic research. Scopus covers a wider journal range […] but it is currently limited to recent articles (published after 1995) compared with Web of Science. Google Scholar, as for the Web in general, can help in the retrieval of even the most obscure information but its use is marred by inadequate, less often updated, citation information.” Larsen & von Ins (2010) claim that the coverage of SCI has been declining. ↩
- Here are some caveats about citations as a measure of quality: Wilhite and Fong (2012): “…impact factors continue to be a primary means by which academics “quantify the quality of science”. One side effect of impact factors is the incentive they create for editors to coerce authors to add citations to their journal. Coercive self-citation does not refer to the normal citation directions, given during a peer-review process, meant to improve a paper. Coercive self-citation refers to requests that (i) give no indication that the manuscript was lacking in attribution; (ii) make no suggestion as to specific articles, authors, or a body of work requiring review; and (iii) only guide authors to add citations from the editor’s journal.” And Storbeck (2012): “The [extent] of manipulation is amazing. For example, according to figures published by the Managing Editor of the ‘Review of Finance’, the impact factor of the ‘Journal of Banking and Finance’ – the fourth worst offender according to the study by Wilhite and Fong – dwindles if self-citations are excluded. While the raw impact factor of the journal is 2.731, the one without self-citations is just 0.748.” ↩