Last week, Nate Soares outlined his case for prioritizing long-term AI safety work:
1. Humans have a fairly general ability to make scientific and technological progress. The evolved cognitive faculties that make us good at organic chemistry overlap heavily with the evolved cognitive faculties that make us good at economics, which overlap heavily with the faculties that make us good at software engineering, etc.
2. AI systems will eventually strongly outperform humans in the relevant science/technology skills. To the extent these faculties are also directly or indirectly useful for social reasoning, long-term planning, introspection, etc., sufficiently powerful and general scientific reasoners should be able to strongly outperform humans in arbitrary cognitive tasks.
3. AI systems that are much better than humans at science, technology, and related cognitive abilities would have much more power and influence than humans. If such systems are created, their decisions and goals will have a decisive impact on the future.
4. By default, smarter-than-human AI technology will be harmful rather than beneficial. Specifically, it will be harmful if we exclusively work on improving the scientific capability of AI agents and neglect technical work that is specifically focused on safety requirements.
To which I would add:
- Intelligent, autonomous, and adaptive systems are already challenging to verify and validate; smarter-than-human scientific reasoners present us with extreme versions of the same challenges.
- Smarter-than-human systems would also introduce qualitatively new risks that can’t be readily understood in terms of our models of human agents or narrowly intelligent programs.
None of this, however, tells us when smarter-than-human AI will be developed. Soares has argued that we are likely to be able to make early progress on AI safety questions; but the earlier we start, the greater the risk that we misdirect our efforts. Why not wait until human-equivalent decision-making machines are closer at hand before focusing our efforts on safety research?
One reason to start early is that the costs of starting too late are much worse than the costs of starting too early. Early work can also help attract more researchers to this area, and give us better models of alternative approaches. Here, however, I want to focus on a different reason to start work early: the concern that a number of factors may accelerate the development of smarter-than-human AI.
AI speedup thesis. AI systems that can match humans in scientific and technological ability will probably be the cause and/or effect of a period of unusually rapid improvement in AI capabilities.
If general scientific reasoners are invented at all, this probably won’t be an isolated event. Instead, it is likely to directly feed into the development of more advanced AI. Similar considerations suggest that such systems may be the result of a speedup in intelligence growth rates, as measured in the cognitive and technological output of humans and machines.
If AI capabilities work is likely to pick up speed more than AI safety work does, then putting off safety work is both riskier (because we may be failing to account for future speedup effects that leave us less time than is apparent) and less useful (because we have a shorter window of time between 'we have improved AI algorithms we can use to inform our safety work' and 'our safety work needs to be ready for implementation').
I’ll note four broad reasons to expect speedups:
1. Overlap between accelerators of AI progress and enablers/results of AI progress. In particular, progress in automating science and engineering work can include progress in automating AI work.
2. Overall difficulty of AI progress. If smarter-than-human AI is sufficiently difficult, its invention may require auxiliary technologies that effect a speedup. Alternatively, even if such technologies aren’t strictly necessary for AI, they may appear before AI if they are easier to develop.
3. Discontinuity of AI progress. Plausibly, AI development won’t advance at a uniform pace. There will sometimes be very large steps forward, such as new theoretical insights that resolve a number of problems in rapid succession. If a software bottleneck occurs while hardware progress continues, we can expect a larger speedup when a breakthrough occurs: Shulman and Sandberg argue that the availability of cheap computing resources in this scenario would make it much easier to quickly copy and improve on advanced AI software.
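The hardware-overhang intuition behind this third point can be made concrete with a toy model (my illustration, not taken from Shulman and Sandberg): if compute per dollar keeps improving on a fixed schedule while software is bottlenecked, a later software breakthrough lands on a larger stock of cheap hardware, so the resulting jump in effective capability is larger and more abrupt. The doubling time and the assumption that a breakthrough lets existing compute be fully exploited are illustrative simplifications.

```python
# Toy illustration of the "hardware overhang" intuition: the later a
# software breakthrough arrives, the more accumulated cheap compute it
# can immediately exploit. All parameters here are hypothetical.

def effective_capability(breakthrough_year, hardware_doubling_years=2.0):
    """Relative capability unlocked by a software breakthrough in a given
    year, assuming compute per dollar doubles every
    `hardware_doubling_years` and the breakthrough lets all of that
    accumulated compute be put to use at once."""
    return 2.0 ** (breakthrough_year / hardware_doubling_years)

early = effective_capability(breakthrough_year=10)  # breakthrough after 10 years
late = effective_capability(breakthrough_year=20)   # same breakthrough, 10 years later

# The later breakthrough benefits from 32x more compute per dollar,
# so the speedup it triggers is correspondingly larger.
print(f"early: {early:.0f}x, late: {late:.0f}x, ratio: {late / early:.0f}x")
```

Nothing hinges on the specific numbers; the point is only that a longer software bottleneck, combined with continuing hardware progress, magnifies the discontinuity when the bottleneck breaks.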
4. Increased interest in AI. As AI software increases in capability, we can expect increased investment in the field, especially if a race dynamic develops.
Intelligence explosion is an example of a speedup of the first type. In an intelligence explosion scenario, the ability of AI systems to innovate within the field of AI leads to a positive feedback loop of accelerating progress resulting in superintelligence.
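The structure of that feedback loop can be sketched with a minimal growth model (my illustration, with made-up parameters): when AI contributes nothing to AI research, capability grows at a constant rate; once progress feeds back into the rate of progress, growth compounds.

```python
# Toy model of the intelligence-explosion feedback loop: capability C
# grows as dC/dt = base_rate * C**feedback. feedback=0 means progress is
# independent of current capability (human-only research); feedback=1
# means AI capability directly accelerates AI research. Parameters are
# hypothetical and chosen only to show the qualitative difference.

def simulate(feedback, capability=1.0, base_rate=0.05, steps=100, dt=1.0):
    """Integrate dC/dt = base_rate * C**feedback with simple Euler steps."""
    for _ in range(steps):
        capability += base_rate * capability**feedback * dt
    return capability

no_feedback = simulate(feedback=0.0)    # linear growth: 1 + 0.05 * 100 = 6
with_feedback = simulate(feedback=1.0)  # compounding growth: 1.05**100, ~131

print(f"no feedback: {no_feedback:.1f}, with feedback: {with_feedback:.1f}")
```

The same base rate of progress yields roughly a 20-fold difference in this toy run; the feedback term, not the rate itself, is what produces the explosion-like trajectory.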
Intelligence explosion and other forms of speedup are often conflated with the hypothesis that smarter-than-human AI is imminent; but some reasons to expect speedups (e.g., ‘overall difficulty of AI progress’ and ‘discontinuity of AI progress’) can equally imply that smarter-than-human AI systems are further off than many researchers expect.
Are there any factors that could help speed up safety work relative to capabilities work? Some have suggested that interest in safety is likely to increase as smarter-than-human AI draws nearer. However, this might coincide with a compensatory increase in AI capabilities investment. Since systems approaching superintelligence will have incentives to appear safe, it is also possible that safety work will erroneously appear less necessary when AI systems approach humans in intelligence, as in Nick Bostrom’s treacherous turn scenario.
We could also imagine outsourcing AI safety work to sufficiently advanced AI systems, just as we might outsource AI capabilities work. However, it is likely to take a special effort to reach the point where we can (safely) delegate a variety of safety tasks before we can delegate a comparable amount of capabilities work.
On the whole, capabilities speedup effects make it more difficult to make robust predictions about AI timelines. If rates of progress are discontinuous, highly capable AI systems may continue to appear about equally far off until shortly before their invention. This suggests that it would be unwise to wait until advanced AI appears near before investing in basic AI safety research.