Transparency in Safety-Critical Systems


In this post, I aim to summarize one common view on AI transparency and AI reliability. It’s difficult to identify the field’s “consensus” on AI transparency and reliability, so instead I will present a common view so that I can use it to introduce a number of complications and open questions that (I think) warrant further investigation.

Here’s a short version of the common view I summarize below:

Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

The value of transparency in system design

Nusser (2009) writes:

…in the field of safety-related applications it is essential to provide transparent solutions that can be validated by domain experts. “Black box” approaches, like artificial neural networks, are regarded with suspicion – even if they show a very high accuracy on the available data – because it is not feasible to prove that they will show a good performance on all possible input combinations.

Unfortunately, there is often a tension between AI capability and AI transparency. Many of AI’s most powerful methods are also among its least transparent:

Methods that are known to achieve a high predictive performance — e.g. support vector machines (SVMs) or artificial neural networks (ANNs) — are usually hard to interpret. On the other hand, methods that are known to be well-interpretable — for example (fuzzy) rule systems, decision trees, or linear models — are usually limited with respect to their predictive performance.1

But for safety-critical systems — and especially for AGI — it is important to prioritize system reliability over capability. Again, here is Nusser (2009):

strict requirements [for system transparency] are necessary because a safety-related system is a system whose malfunction or failure can lead to serious consequences — for example environmental harm, loss or severe damage of equipment, harm or serious injury of people, or even death. Often, it is impossible to rectify a wrong decision within this domain.

The special need for transparency in AI has also been stressed by many others,2 including Boden (1977):

Members of the artificial intelligence community bear an ominous resemblance to… the Sorcerer’s Apprentice. The apprentice learnt just enough magic… to save himself the trouble of performing an onerous task, but not quite enough to stop the spellbound buckets and brooms from flooding the castle…

[One question I shall ask is] whether there are any ways of writing programs that would tend to keep control in human hands… [For one thing,] programs should be intelligible and explicit, so that “what is going on” is not buried in the code or implicitly embodied in procedures whose aim and effect are obscure.

A spectrum from black box to transparent

Non-transparent systems are sometimes called “black boxes”:

a black box is a device, system or object which can be viewed in terms of its input, output and transfer characteristics without any knowledge of its internal workings. Its implementation is “opaque” (black). Almost anything might be referred to as a black box: a transistor, an algorithm, or the human mind.

…[And] in practice some [technically transparent] systems are so complex that [they] might as well be [black boxes].3

The human brain is mostly a black box. We can observe its inputs (light, sound, etc.), its outputs (behavior), and some of its transfer characteristics (swinging a bat at someone’s eyes often results in ducking or blocking behavior), but we don’t know very much about how the brain works. We’ve begun to develop an algorithmic understanding of some of its functions (especially vision), but only barely.4

Many contemporary AI methods are effectively black box methods. As Whitby (1996) explains, the safety issues that arise in “GOFAI” (e.g. search-based problem solvers and knowledge-based systems) “are as nothing compared to the [safety] problems which must be faced by newer approaches to AI… Software that uses some sort of neural net or genetic algorithm must face the further problem that it seems, often almost by definition, to be ‘inscrutable’. By this, I mean that… we can know that it works and test it over a number of cases but we will not in the typical case ever be able to know exactly how.”

Other methods, however, are relatively transparent, as we shall see below.

This post cannot survey the transparency of all AI methods; there are too many. Instead, I will focus on three major “families” of AI methods.

Examining the transparency of three families of AI methods

Machine learning

Machine learning is perhaps the largest and most active subfield in AI, encompassing a wide variety of methods by which machines learn from data. For an overview of the field, see Flach (2012). For a quick video intro, see here.

Unfortunately, machine learning methods tend not to be among the most transparent methods.5 Nusser (2009) explains:

machine learning approaches are regarded with suspicion by domain experts in safety-related application fields because it is often infeasible to sufficiently interpret and validate the learned solutions.

For now, let’s consider one popular machine learning method in particular: artificial neural networks (ANNs). (For a concise video introduction, see here.) As Rodvold (1999) explains, ANNs are typically black boxes:

the intelligence of neural networks is contained in a collection of numeric synaptic weights, connections, transfer functions, and other network definition parameters. In general, inspection of these quantities yields little explicit information to enlighten developers as to why a certain result is being produced.

Also, Kurd (2005):

it is common for typical ANNs to be treated as black-boxes… because ANN behaviour is scattered across its weights and links with little meaning to an observer. As a result of this unstructured and unorganised representation of behaviour, it is often not feasible to completely understand and predict their function and operation… The interpretation problems associated with ANNs impede their use in safety critical contexts…6

Deep learning, which typically relies on neural networks with many layers, is another popular machine learning technique. It, too, tends to be non-transparent. Like ANNs in general, deep learning methods were inspired by how parts of the brain work, in particular the visual system.7

Some machine learning methods are more transparent than others. Bostrom & Yudkowsky (2013) explain:

If [a] machine learning algorithm is based on a complicated neural network… then it may prove nearly impossible to understand why, or even how, the algorithm [made its judgments]. On the other hand, a machine learner based on decision trees or Bayesian networks is much more transparent to programmer inspection (Hastie et al. 2001), which may enable an auditor to discover [why the algorithm made the judgments it did].8
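To make this contrast concrete, here is a minimal Python sketch. Both models are hypothetical and hand-written for illustration: the feature names, thresholds, and weights are assumptions, not taken from any of the works cited here. The decision tree can be printed as rules that an auditor can read directly, whereas the small network is defined only by numbers that carry no obvious meaning.

```python
import math

# Hypothetical hand-built decision tree for a loan decision.
# Internal node: (feature, threshold, subtree_if_below, subtree_if_at_or_above).
# Leaf: a string label.
TREE = ("income", 30000,
        "reject",
        ("debt_ratio", 0.4, "approve", "reject"))

def tree_predict(tree, x):
    """Walk the tree until a leaf label is reached."""
    while not isinstance(tree, str):
        feature, threshold, below, at_or_above = tree
        tree = below if x[feature] < threshold else at_or_above
    return tree

def tree_rules(tree, path=()):
    """Return every root-to-leaf path as a human-readable rule."""
    if isinstance(tree, str):
        condition = " and ".join(path) if path else "always"
        return [f"if {condition} then {tree}"]
    feature, threshold, below, at_or_above = tree
    return (tree_rules(below, path + (f"{feature} < {threshold}",)) +
            tree_rules(at_or_above, path + (f"{feature} >= {threshold}",)))

# Hypothetical tiny neural network: 2 inputs, 2 hidden units, 1 output.
# The weights are made up; nothing about them explains the decision.
W_HIDDEN = [[0.83, -1.92], [-0.47, 2.31]]
B_HIDDEN = [-0.15, 0.62]
W_OUT = [1.74, -2.08]
B_OUT = 0.09

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def net_predict(inputs):
    """Forward pass: weighted sums passed through sigmoids."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)

for rule in tree_rules(TREE):
    print(rule)
print("network parameters:", W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

Inspecting the tree means reading three rules. Inspecting the network means staring at ten numbers, and the gap widens quickly as both models grow.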

Moreover, recent work has attempted to make some machine learning methods more transparent, and thus perhaps more suitable for safety-critical applications. For example, Taylor (2005) suggests methods for extracting rules (which refer to human-intuitive concepts) from neural networks, so that researchers can perform formal safety analyses of the extracted rules. These methods are still fairly primitive, and are not yet widely applicable or widely used, but further research could make these methods more useful and popular.9
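As a rough illustration of the general idea of rule extraction (not Taylor's method, and far simpler than the techniques in the literature), the Python sketch below treats a model as a black box over a handful of boolean inputs, queries it on every input combination, and records the cases in which it fires. A safety property can then be checked against the extracted rules rather than against the weights. The feature names, the weights, and the `black_box` function are all hypothetical, and exhaustive querying of this kind is only feasible for very small, discrete input spaces.

```python
from itertools import product

FEATURES = ("pressure_high", "valve_open", "alarm_on")  # hypothetical names

def black_box(pressure_high, valve_open, alarm_on):
    """Stand-in for a trained network: a thresholded weighted sum with made-up weights."""
    score = 1.5 * pressure_high - 2.0 * valve_open + 0.2 * alarm_on - 0.6
    return score > 0

def extract_rules(model, features):
    """Query the model on every boolean input and list the cases where it fires."""
    rules = []
    for values in product([False, True], repeat=len(features)):
        if model(*values):
            rules.append(dict(zip(features, values)))
    return rules

def format_rule(rule):
    """Render one extracted case as a readable conjunction."""
    return " and ".join(name if value else f"not {name}"
                        for name, value in rule.items())

rules = extract_rules(black_box, FEATURES)
for rule in rules:
    print("shutdown if", format_rule(rule))

# Example safety property, checked on the extracted rules:
# the model should never fire while the valve is open.
never_fires_with_valve_open = all(not rule["valve_open"] for rule in rules)
print("never fires with valve open:", never_fires_with_valve_open)
```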

Evolutionary algorithms

Evolutionary algorithms (EAs) are often categorized as a machine learning method, but here they will be considered separately. EAs use methods inspired by evolution to produce candidate solutions to problems. For example, watch this video of software robots evolving to “walk” quickly.

Because evolutionary algorithms use a process of semi-random mutation and recombination to produce candidate solutions, complex candidate solutions tend not to be transparent, just like the evolutionarily produced brain. Mitchell (1998, p. 40) writes:

Understanding the results of [genetic algorithm] evolution is a general problem — typically the [genetic algorithm] is asked to find [candidate solutions] that achieve high fitness but is not told how that high fitness is to be attained. One could say that this is analogous to the difficulty biologists have in understanding the products of natural evolution (e.g., us)… In many cases… it is difficult to understand exactly how an evolved high−fitness [candidate solution] works. In genetic programming, for example, the evolved programs are often very long and complicated, with many irrelevant components attached to the core program performing the desired computation. It is usually a lot of work — and sometimes almost impossible — to figure out by hand what that core program is.
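Mitchell's point that the algorithm is told what to achieve but not how can be seen even in a toy genetic algorithm. The Python sketch below is an illustration under simple assumptions (bit-string genomes, a "count the 1-bits" fitness function, one-point crossover, and elitism), and all parameter values are arbitrary. On this toy problem the evolved solution is trivially readable; the opacity Mitchell describes arises when the genomes encode programs or controllers instead.

```python
import random

def fitness(genome):
    """The only guidance the algorithm receives: more 1-bits is better."""
    return sum(genome)

def mutate(genome, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def crossover(a, b, rng):
    """One-point crossover: a prefix of one parent joined to a suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(genome_length=20, population_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[:population_size // 2]  # keep the fitter half as parents
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(population_size - 1)]
        population = [population[0]] + children  # elitism: best genome survives unchanged
    best = max(population, key=fitness)
    return best, best_per_generation

best, history = evolve()
print("best genome:", best)
print("best fitness per generation:", history)
```

Nothing in `evolve` says which bits to set or in what order; the result emerges from random variation filtered by `fitness`.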

Fleming & Purshouse (2002) add:

Mission-critical and safety-critical applications do not appear, initially, to be favourable towards EA usage due to the stochastic nature of the evolutionary algorithm. No guarantee is provided that the results will be of sufficient quality for use on-line.

Logical methods

Logical methods in AI are widely used in safety-critical applications (e.g. medicine), but overall they see far less use than machine learning methods.

In a logic-based AI, the AI’s knowledge and its systems for reasoning are written out in logical statements. These statements are typically hand-coded, and each statement has a precise meaning determined by the axioms of the logical system being used (e.g. first-order logic). Russell & Norvig (2009), the leading AI textbook, describes the logical approach to AI in chapter 7. It describes a popular application of logical AI, called “classical planning,” in chapter 10. Also see Thomason (2012) and Minker (2000).

Galliers (1988, p. 88-89) explains the transparency advantages of logic-based methods in AI:

A theory expressed as a set of logical axioms is evident; it is open to examination. This assists the process of determining whether any parts of the theory are inconsistent, or do not behave as had been anticipated when they were expressed in English… Logics are languages with precise semantics [and therefore] there can be no ambiguities of interpretation… By expressing the properties of agents… as logical axioms and theorems… the theory is transparent; properties, interrelationships and inferences are open to examination… This contrasts with the use of computer code [where] it is frequently the case that computer systems concerned with… problem-solving are in fact designed such that properties of the interacting agents are implicit properties of the entire system, and it is impossible to investigate the role or effects of any individual aspect.10

Another transparency advantage of logical methods in AI comes from logic languages’ capacity to represent different kinds of machines, including machines that can reflect on themselves and the reasons for their beliefs. For example, they can pass assumptions around with each datum. E.g. see the “domino” agent in Fox & Das (2000).
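The idea of carrying the reasons for a belief along with the belief itself can be sketched in a few lines of Python. This is a toy forward-chaining rule engine, not the “domino” agent of Fox & Das, and the rule names and facts are invented for illustration. Each derived fact records which rule produced it and from which premises, so the system can explain any conclusion on request.

```python
# Hypothetical hand-coded rules: (name, premises, conclusion).
RULES = [
    ("r1", ("fever", "cough"), "suspect_flu"),
    ("r2", ("suspect_flu", "high_risk_patient"), "recommend_antiviral"),
]

def forward_chain(initial_facts, rules):
    """Derive every fact the rules allow, recording a justification for each."""
    justification = {fact: ("given", ()) for fact in initial_facts}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in justification and all(p in justification for p in premises):
                justification[conclusion] = (name, premises)
                changed = True
    return justification

def explain(fact, justification, depth=0):
    """Return the derivation of a fact as indented lines."""
    source, premises = justification[fact]
    lines = ["  " * depth + f"{fact} [{source}]"]
    for premise in premises:
        lines.extend(explain(premise, justification, depth + 1))
    return lines

facts = forward_chain({"fever", "cough", "high_risk_patient"}, RULES)
print("\n".join(explain("recommend_antiviral", facts)))
```

Every conclusion can be traced back to named rules and given facts, which is the kind of inspection that is hard to perform on a set of learned weights.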

Moreover, some logic-based approaches are amenable to formal methods, for example formal verification: mathematically proving that a system will perform correctly with respect to a formal specification.11 Formal methods complement empirical testing of software, e.g. by identifying “corner bugs” that are difficult to find when using empirical methods only — see Mitra 2008.
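To give a flavor of what checking a system against a formal specification means, here is a minimal Python sketch of explicit-state model checking: it explores every reachable state of a small finite model and tests a safety invariant in each one. The traffic-light controllers and the invariant are hypothetical, and real verification tools, such as those used in the projects cited in this post, are far more sophisticated. The sketch also shows how exhaustive exploration can surface a “corner bug” that is only reachable through a particular sequence of steps.

```python
from collections import deque

# Hypothetical controller for a two-way intersection.
# A state is (north_south_light, east_west_light).
CYCLE = {
    ("green", "red"): ("yellow", "red"),
    ("yellow", "red"): ("red", "green"),
    ("red", "green"): ("red", "yellow"),
    ("red", "yellow"): ("green", "red"),
}

def safe_controller(state):
    """Successor states of the intended design: just follow the cycle."""
    return [CYCLE[state]]

def buggy_controller(state):
    """Same cycle, plus an 'override' that turns north-south green without checking east-west."""
    _, east_west = state
    nexts = [CYCLE[state]] if state in CYCLE else []
    return nexts + [("green", east_west)]

def never_both_green(state):
    """The safety specification: the two directions are never green together."""
    return state != ("green", "green")

def check_invariant(initial, successors, invariant):
    """Breadth-first search over all reachable states.

    Returns None if the invariant holds in every reachable state,
    otherwise a shortest trace from the initial state to a violation.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

start = ("green", "red")
print("safe design:", check_invariant(start, safe_controller, never_both_green))
print("buggy design:", check_invariant(start, buggy_controller, never_both_green))
```

For the intended design the check returns `None`, meaning no reachable state violates the invariant. For the buggy design it returns a concrete counterexample trace, which is exactly the kind of evidence that random testing might miss.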

Formal verification is perhaps best known for its use in verifying hardware components (especially since the FDIV bug that cost Intel $500 million), but it is also used to verify a variety of software programs (in part or in whole), including flight control systems (Miller et al. 2005), rail control systems (Platzer & Quesel 2009), pacemakers (Tuan et al. 2010), compilers (Leroy 2009), operating system kernels (Andronick 2011), multi-agent systems (Raimondi 2006), outdoor robots (Proetzsch et al. 2007), and swarm robotics (Dixon et al. 2012).

Unfortunately, formal methods face severe limitations. Fox (1993) explains:

there are severe limitations on the capability of formal design techniques to completely prevent hazardous situations from arising. Current formal design methods are difficult to use and time-consuming, and may only be practical for relatively modest applications. Even if we reserve formal techniques for the safety-critical elements of the system we have seen that the soundness guaranteed by the techniques can only be as good as the specifier’s ability to anticipate the conditions and possible hazards that can hold at the time of use… These problems are difficult enough for ‘closed systems’ in which the designer can be confident, in principle, of knowing all the parameters which can affect system performance… Unfortunately all systems are to a greater or lesser extent ‘open’; they operate in an environment which cannot be exhaustively monitored and in which unpredictable events will occur. Furthermore, reliance on specification and verification methods assumes that the operational environment will not compromise the correct execution of software. In fact of course software errors can be caused by transient faults causing data loss or corruption; user errors; interfacing problems with external systems (such as databases and instruments); incompatibilities between software versions; and so on.12

Bowen & Hinchey (1995) concur:

There are many… areas where, although possible, formalization is just not practical from a resource, time, or financial aspect. Most successful formal methods projects involve the application of formal methods to critical portions of system development. Only rarely are formal methods, and formal methods alone, applied to all aspects of system development. Even within the CICS project, which is often cited as a major application of formal methods… only about a tenth of the entire system was actually subjected to formal techniques…

[We suggest] the following maxim: System development should be as formal as possible, but not more formal.

For more on the use of formal methods for AI safety, see Rushby & Whitehurst (1989); Bowen & Stavridou (1993); Harper (2000); Spears (2006); Fischer et al. (2013).13

Some complications and open questions

The common view of transparency and AI safety articulated above suggests an opportunity for differential technological development. To increase the odds that future AI systems are safe and reliable, we can invest disproportionately in transparent AI methods, and also in techniques for increasing the transparency of typically opaque AI methods.

But this common view comes with some serious caveats, and some difficult open questions. For example:

  1. How does the transparency of a method change with scale? A 200-rule logical AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.
  2. Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
  3. How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?14
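
The remark in question 1 about asking a Bayes net “what it believes about X” can be made concrete with a small Python sketch. The three-node network below (rain, sprinkler, wet grass) and all of its probabilities are invented for illustration, and inference is done by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

VARIABLES = ("rain", "sprinkler", "wet_grass")

# Hypothetical parameters: rain and sprinkler are independent causes of wet grass.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET_GIVEN = {  # P(wet_grass = True | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet_grass):
    """Probability of one complete assignment, using the network's factorization."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return p * (p_wet if wet_grass else 1 - p_wet)

def query(target, evidence=None):
    """P(target = True | evidence), by summing the joint over all consistent assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator

print("P(rain):", query("rain"))
print("P(rain | wet grass):", query("rain", {"wet_grass": True}))
```

The same `query` function answers any “what does the network believe about X, given Y” question, regardless of how many nodes the network has, although exact inference becomes expensive at scale and the answer is only as meaningful as the variables themselves.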


My thanks to John Fox, Jacob Steinhardt, Paul Christiano, Carl Shulman, Eliezer Yudkowsky, and others for their helpful feedback.

  1. Quote from Nusser (2009). Emphasis added. The original text contains many citations which have been removed in this post for readability. Also see Schultz & Cronin (2003), which makes this point by graphing four AI methods along two axes: robustness and transparency. Their graph is available here. In their terminology, a method is “robust” to the degree that it is flexible and useful on a wide variety of problems and data sets. On the graph, GA means “genetic algorithms,” NN means “neural networks,” PCA means “principal components analysis,” PLS means “partial least squares,” and MLR means “multiple linear regression.” In this sample of AI methods, the trend is clear: the most robust methods tend to be the least transparent. Schultz & Cronin graphed only a tiny sample of AI methods, but the trend holds more broadly. 
  2. I will share some additional quotes on the importance of transparency in intelligent systems. Kröske et al. (2009) write that, to trust a machine intelligence, “human operators need to be able to understand [its] reasoning process and the factors that precipitate certain actions.” Similarly, Fox (1993) writes: “Many branches of engineering have moved beyond purely empirical testing [for safety]… because they have established strong design theories… The consequence is that designers can confidently predict failure modes, performance boundary conditions and so forth before the systems are implemented… A promising approach to [getting these benefits in AI] may be to use well-defined specification languages and verification procedures. Van Harmelen & Balder (1992) [list some] advantages of using formal languages… [including] the removal of ambiguity… [and] the ability to derive properties of the design in the absence of an implementation.” In their preface, Fox & Das (2000) write: “Our first obligation is to try to ensure that the designs of our systems are sound. We need to ask not only ‘do they work?’ but also ‘do they work for good reasons?’ Unfortunately, conventional software design is frequently ad hoc, and AI software design is little better and possibly worse… Consequently, we place great emphasis on clear design principles, strong mathematical foundations for these principles, and effective development tools that support and verify the integrity of the system… We are creating a powerful technology [AI], possibly more quickly than we think, that has unprecedented potential to create havoc as well as benefit. We urge the community to embark on a vigorous discussion of the issues and the creation of an explicit ‘safety culture’ in the field.” 
  3. Emphasis added. The first paragraph is from Wikipedia’s black box page; the second paragraph is from its white box page. The term “grey box” is sometimes used to refer to methods that are intermediate in transparency between “fully black box” and “fully transparent” methods: see e.g. Sohlberg (2003).
  4. Thus, if we could build a whole brain emulation today, it would also be mostly a black box system, even though all its bits of information would be stored in a computer and accessible to database search tools and so on. But we’ll probably make lots of progress in cognitive neuroscience before WBE is actually built, and a working WBE would also probably enable quick advances in cognitive neuroscience, and therefore the human brain would rapidly become more transparent to us. 
  5. For more discussion of how machine learning can be used for relatively “transparent” ends, for example to learn the structure of a Bayesian network, see Murphy (2012), ch. 26. 
  6. Li & Peng (2006) make the same point: “conventional neural networks… lack transparency, as their activation functions (AFs) and their associated neural parameters bear very little physical meaning.” See also Woodman et al. (2012)’s comments on this issue in the context of personal robotics: “Among the requirements of autonomous robots… is a certain degree of robustness. This means being able to handle errors and to continue operation during abnormal conditions… in a dynamic environment, the robot will frequently find itself in a wide range of previously unseen situations. To date, the majority of research in this area has addressed this issue by using learning algorithms, often implemented as artificial neural networks (ANNs)… However, as Nehmzow et al. (2004) identify, these implementations, although seemingly effective, are difficult to analyse due to the inherent opacity of connection-based algorithms. This means that it is difficult to produce an intelligible model of the system structure that could be used in safety analysis.” 
  7. Murphy (2012), p. 995, writes that “when we look at the brain, we see many levels of processing. It is believed that each level is learning features or representations at increasing levels of abstraction. For example, the standard model of the visual cortex… suggests that (roughly speaking) the brain first extracts edges, then patches, then surfaces, then objects, etc… This observation has inspired a recent trend in machine learning known as deep learning… which attempts to replicate this kind of architecture in a computer.” 
  8. It is generally accepted that Bayesian networks are more transparent than ANNs, but this is only true up to a point. A Bayesian network with hundreds of nodes that are not associated with human-intuitive concepts is not necessarily any more transparent than a large ANN. 
  9. For an overview of this work, see Nusser (2009), section 2.2.3. Also see Pulina & Tacchella (2011). Finally, Ng (2011), sec. 4, notes that we can get a sense of what function an ANN has learned by asking which inputs would maximize the activation of particular nodes. In his example, Ng uses this technique to visualize which visual features have been learned by a sparse autoencoder trained on image data. 
  10. Wooldridge (2003) concurs, writing that “Transparency is another advantage [of logical approaches].” 
  11. For recent overviews of formal methods in general, see Bozzano & Villafiorita (2010); Woodcock et al. (2009); Gogolla (2006); Bowen & Hinchey (2006). For more on the general application of safety engineering theory to AI, see Fox (1993); Yampolskiy & Fox (2013); Yampolskiy (2013).
  12. Another good point Fox makes is that normal AI safety engineering techniques rely on the design team’s ability to predict all circumstances that might hold in the future: “…one might conclude that using a basket of safety methods (hazard analysis, formal specification and verification, rigorous empirical testing, fault tolerant design) will significantly decrease the likelihood of hazards and disasters. However, there is at least one weakness common to all these methods. They rely on the design team being able to make long-range predictions about all the… circumstances that may hold when the system is in use. This is unrealistic, if only because of the countless interactions that can occur… [and] the scope for unforseeable interactions is vast.” 
  13. See also this program at Bristol University. 
  14. As an aside, I’ll briefly remark that user interface confusion has contributed to many computer-related failures in the past. For example, Neumann (1994) reports on the case of Iran Air Flight 655, which was shot down by U.S. forces due (partly) to the unclear user interface of the USS Vincennes’ Aegis missile system. Changes to the interface were subsequently recommended. For other UI-related disasters, see Neumann’s extensive page on Illustrative Risks to the Public in the Use of Computer Systems and Related Technology.