Improvements in AI are resulting in the automation of increasingly complex and creative human behaviors. Given enough time, we should expect artificial reasoners to begin to rival humans in arbitrary domains, culminating in artificial general intelligence (AGI).
A machine would qualify as an ‘AGI’, in the intended sense, if it could adapt to a very wide range of situations to consistently achieve some goal or goals. Such a machine would behave intelligently when supplied with arbitrary physical and computational environments, in the same sense that Deep Blue behaves intelligently when supplied with arbitrary chess board configurations — consistently hitting its victory condition within that narrower domain.
Since generally intelligent software could help automate the process of thinking up and testing hypotheses in the sciences, AGI would be uniquely valuable for speeding technological growth. However, this wide-ranging productivity also makes AGI a unique challenge from a safety perspective. Knowing very little about the architecture of future AGIs, we can nonetheless make a few safety-relevant generalizations:
- Because AGIs are intelligent, they will tend to be complex, adaptive, and capable of autonomous action, and they will have a large impact wherever they are employed.
- Because AGIs are general, their users will have incentives to employ them in an increasingly wide range of environments. This makes it hard to construct valid sandbox tests and requirements specifications.
- Because AGIs are artificial, they will differ from human agents in ways that violate many of our natural intuitions and expectations about intelligent behavior.
Today’s AI software is already tough to verify and validate, thanks to its complexity and its uncertain behavior in the face of state space explosions. Menzies & Pecheur (2005) give a good overview of AI verification and validation (V&V) methods, noting that AI, and especially adaptive AI, will often yield undesired and unexpected behaviors.
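To make the state space explosion point concrete, here is a minimal sketch (in Python, with entirely hypothetical environment variables) of how the number of distinct scenarios an exhaustive test suite would need to cover grows multiplicatively with each additional input an adaptive system responds to:

```python
from itertools import product

# Hypothetical discretized environment variables an adaptive agent might
# respond to. Each added variable multiplies the number of distinct
# scenarios an exhaustive test suite would have to cover.
environment_variables = {
    "terrain":        ["flat", "rocky", "sloped", "sandy"],
    "lighting":       ["day", "dusk", "night"],
    "battery_level":  ["low", "medium", "high"],
    "sensor_noise":   ["none", "moderate", "severe"],
    "obstacle_count": list(range(0, 10)),
}

# Total number of joint configurations: the product of the domain sizes.
total_states = 1
for values in environment_variables.values():
    total_states *= len(values)
print(f"Distinct test scenarios: {total_states}")  # 4 * 3 * 3 * 3 * 10 = 1080

# Exhaustive enumeration is still feasible at this toy scale, but every new
# variable, finer discretization, or piece of internal adaptive state
# multiplies the count, which is why full-coverage testing stops scaling.
for scenario in product(*environment_variables.values()):
    pass  # a real test harness would run the system on each scenario here
```

Even in this toy example, adding a single ten-valued variable multiplies the test burden by ten; real autonomous systems respond to far more inputs, many of them continuous.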
An adaptive AI that acts autonomously, like a Mars rover that can’t be directly piloted from Earth, raises the difficulty further still. Autonomous safety-critical AI agents need to make irreversible decisions in dynamic environments while maintaining very low failure rates. The state of the art in safety research for autonomous systems is improving, but it continues to lag behind work on system capabilities. Hinchman et al. (2012) write:
As autonomous systems become more complex, the notion that systems can be fully tested and all problems will be found is becoming an impossible task. This is especially true in unmanned/autonomous systems. Full test is becoming increasingly challenging on complex system. As these systems react to more environmental [stimuli] and have larger decision spaces, testing all possible states and all ranges of the inputs to the system is becoming impossible. […] As systems become more complex, safety is really risk hazard analysis, i.e. given x amount of testing, the system appears to be safe. A fundamental change is needed. This change was highlighted in the 2010 Air Force Technology Horizon report, “It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use.” […]
The move towards more autonomous systems has lifted this need [for advanced verification and validation techniques and methodologies] to a national level.
AI acting autonomously in arbitrary domains, then, looks particularly difficult to verify. If AI methods continue to see rapid gains in efficiency and versatility, and especially if these gains further increase the opacity of AI algorithms to human inspection, AI safety engineering will become much more difficult in the future. In the absence of any reason to expect a development in the lead-up to AGI that would make high-assurance AGI easy (or AGI itself unlikely), we should be worried about the safety challenges of AGI, and that worry should inform our research priorities today.
Below, I’ll give reasons to doubt that AGI safety challenges are just an extension of narrow-AI safety challenges, and I’ll list some research avenues people at MIRI expect to be fruitful.