Transparency in Safety-Critical Systems

 |   |  Analysis

In this post, I aim to summarize one common view on AI transparency and AI reliability. It’s difficult to identify the field’s “consensus” on AI transparency and reliability, so instead I will present a common view so that I can use it to introduce a number of complications and open questions that (I think) warrant further investigation.

Here’s a short version of the common view I summarize below:

Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

The value of transparency in system design

Nusser (2009) writes:

…in the field of safety-related applications it is essential to provide transparent solutions that can be validated by domain experts. “Black box” approaches, like artificial neural networks, are regarded with suspicion – even if they show a very high accuracy on the available data – because it is not feasible to prove that they will show a good performance on all possible input combinations.

Unfortunately, there is often a tension between AI capability and AI transparency. Many of AI’s most powerful methods are also among its least transparent:

Methods that are known to achieve a high predictive performance — e.g. support vector machines (SVMs) or artificial neural networks (ANNs) — are usually hard to interpret. On the other hand, methods that are known to be well-interpretable — for example (fuzzy) rule systems, decision trees, or linear models — are usually limited with respect to their predictive performance.1
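To make the contrast concrete, here is a minimal sketch of why a transparent model supports stronger verification than black-box testing. The model, its weights, and the input names are all hypothetical and are not taken from Nusser (2009). For a linear model over bounded inputs, the exact output range over every possible input can be read off the weights, whereas random sampling only ever reports the range it happened to see.

```python
import random

# Hypothetical transparent model: a linear score over three bounded inputs.
# The names, weights, and bounds are illustrative assumptions.
WEIGHTS = {"temperature": 0.8, "pressure": -0.5, "flow": 0.3}
BIAS = 0.1
BOUNDS = {"temperature": (0.0, 1.0), "pressure": (0.0, 1.0), "flow": (0.0, 1.0)}


def linear_score(inputs):
    """Evaluate the linear model on one input point."""
    return BIAS + sum(WEIGHTS[name] * inputs[name] for name in WEIGHTS)


def exact_output_range(weights, bias, bounds):
    """Output range over the whole input box, derived by inspecting the weights.

    A positive weight is minimized at the lower bound and maximized at the
    upper bound; a negative weight is the other way around.
    """
    low = bias
    high = bias
    for name, weight in weights.items():
        lower, upper = bounds[name]
        low += weight * (lower if weight >= 0 else upper)
        high += weight * (upper if weight >= 0 else lower)
    return low, high


def sampled_output_range(model, bounds, n_samples, seed=0):
    """Black-box estimate: the range observed on randomly sampled inputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        point = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        outputs.append(model(point))
    return min(outputs), max(outputs)


if __name__ == "__main__":
    print("verified range:", exact_output_range(WEIGHTS, BIAS, BOUNDS))
    print("sampled range: ", sampled_output_range(linear_score, BOUNDS, 1000))
```

The sampled range always lies inside the verified one, but only the verified range is a guarantee about all input combinations. No comparably simple inspection is available for a model whose parameters have no direct interpretation.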

But for safety-critical systems — and especially for AGI — it is important to prioritize system reliability over capability. Again, here is Nusser (2009):

strict requirements [for system transparency] are necessary because a safety-related system is a system whose malfunction or failure can lead to serious consequences — for example environmental harm, loss or severe damage of equipment, harm or serious injury of people, or even death. Often, it is impossible to rectify a wrong decision within this domain.



  1. Quote from Nusser (2009). Emphasis added. The original text contains many citations which have been removed in this post for readability. Also see Schultz & Cronin (2003), which makes this point by graphing four AI methods along two axes: robustness and transparency. Their graph is available here. In their terminology, a method is “robust” to the degree that it is flexible and useful on a wide variety of problems and data sets. On the graph, GA means “genetic algorithms,” NN means “neural networks,” PCA means “principal components analysis,” PLS means “partial least squares,” and MLR means “multiple linear regression.” In this sample of AI methods, the trend is clear: the most robust methods tend to be the least transparent. Schultz & Cronin graphed only a tiny sample of AI methods, but the trend holds more broadly. 

Holden Karnofsky on Transparent Research Analyses

 |   |  Conversations

Holden Karnofsky is the co-founder of GiveWell, which finds outstanding giving opportunities and publishes the full details of its analysis to help donors decide where to give. GiveWell tracked ~$9.6 million in donations made on the basis of its recommendations in 2012. It has historically sought proven, cost-effective, scalable giving opportunities, but its new initiative, GiveWell Labs, is more broadly researching the question of how to give as well as possible.

Luke Muehlhauser: GiveWell has gained respect for its high-quality analyses of some difficult-to-quantify phenomena: the impacts of particular philanthropic interventions. You’ve written about your methods for facing this challenge in several blog posts, for example (1) Futility of standardized metrics: an example, (2) In defense of the streetlight effect, (3) Why we can’t take expected value estimates literally, (4) What it takes to evaluate impact, (5) Some considerations against more investment in cost-effectiveness estimates, (6) Maximizing cost-effectiveness via critical inquiry, (7) Some history behind our shifting approach to research, (8) Our principles for assessing research, (9) Surveying the research on a topic, (10) How we evaluate a study, and (11) Passive vs. rational vs. quantified.

In my first question I’d like to ask about one particular thing you’ve done to solve one particular problem with analyses of difficult-to-quantify phenomena. The problem I have in mind is that it’s often difficult for readers to know how much they should trust a given analysis of a difficult-to-quantify phenomenon. In mathematics research it’s often pretty straightforward for other mathematicians to tell what’s good and what’s not. But what about analyses that combine intuitions, expert opinion, multiple somewhat-conflicting scientific studies, general research in a variety of “soft” sciences, and so on? In such cases it can be difficult for readers to distinguish high-quality and low-quality analyses, and it can be hard for readers to tell whether the analysis is biased in particular ways.


2013 Summer Matching Challenge Completed!

 |   |  News

Thanks to the generosity of dozens of donors, on August 15th we successfully completed the largest fundraiser in MIRI’s history. All told, we raised $400,000, which will fund our research going forward.

This fundraiser came “right down to the wire.” At 8:45pm Pacific time, with only a few hours left before the deadline, we announced on our Facebook page that we had only $555 more to raise to meet our goal. At 8:53pm, Benjamin Hoffman donated exactly $555, finishing the drive.

Our deepest thanks to all our supporters!

Luke at Quixey on Tuesday (Aug. 20th)

 |   |  News


This coming Tuesday, MIRI’s Executive Director Luke Muehlhauser will give a talk at Quixey titled Effective Altruism and the End of the World. If you’re in or near the South Bay, you should come! Snacks will be provided.

Time: Tuesday, August 20th. Doors open at 7:30pm. Talk starts at 8pm. Q&A starts at 8:30pm.

Place: Quixey Headquarters, 278 Castro St., Mountain View, CA. (Google Maps)

Entrance: You cannot enter Quixey from Castro St. Instead, please enter through the back door, from the parking lot at the corner of Dana & Bryant.

August Newsletter: New Research and Expert Interviews

 |   |  Newsletters



Greetings from the Executive Director

Dear friends,

My personal thanks to everyone who has contributed to our ongoing fundraiser. We are 74% of the way to our goal!

I’ve been glad to hear from many of you that you’re thrilled with the progress we’ve made in the past two years — progress both as an organization and as a research institute. I’m thrilled, too! And to see a snapshot of where MIRI is headed, take a look at the participant lineup for our upcoming December workshop. Some top-notch folks there, including John Baez.

We’re also preparing for the anticipated media interest in James Barrat’s forthcoming book, Our Final Invention: Artificial Intelligence and the End of the Human Era. The book reads like a detective novel, and discusses our research extensively. Our Final Invention will be released on October 1st by a division of St. Martin’s Press, one of the largest publishers in the world.

If you’re happy with the direction we’re headed in, and you haven’t contributed to our fundraiser yet, please donate now to show your support. Even small donations can make a difference. This newsletter is ~9,860 subscribers strong, and ~200 of you have contributed during the current fundraiser. If just 21% of the other 9,660 subscribers give $25 as soon as they finish reading this sentence, then we’ll meet our goal with those funds alone!

Thank you,

Luke Muehlhauser

Executive Director


What is AGI?

 |   |  Analysis

One of the most common objections we hear when talking about artificial general intelligence (AGI) is that “AGI is ill-defined, so you can’t really say much about it.”

In an earlier post, I pointed out that we often don’t have precise definitions for things while doing useful work on them, as was the case with the concepts of “number” and “self-driving car.”

Still, we must have some idea of what we’re talking about. Earlier I gave a rough working definition for “intelligence.” In this post, I explain the concept of AGI and also provide several possible operational definitions for the idea.

The idea of AGI

As discussed earlier, the concept of “general intelligence” refers to the capacity for efficient cross-domain optimization. Or as Ben Goertzel likes to say, “the ability to achieve complex goals in complex environments using limited computational resources.” Another idea often associated with general intelligence is the ability to transfer learning from one domain to other domains.

To illustrate this idea, let’s consider something that would not count as a general intelligence.

Computers show vastly superhuman performance at some tasks, roughly human-level performance at other tasks, and subhuman performance at still other tasks. If a team of researchers were able to combine many of the top-performing “narrow AI” algorithms into one system, as Google may be trying to do,[1] they’d have a massive “Kludge AI” that was terrible at most tasks, mediocre at some tasks, and superhuman at a few tasks.

Like the Kludge AI, particular humans are terrible or mediocre at most tasks, and far better than average at just a few tasks.[2] Another similarity is that the Kludge AI would probably show measured correlations between many different narrow cognitive abilities, just as humans do (hence the concepts of g and IQ[3]): if we gave the Kludge AI lots more hardware, it could use that hardware to improve its performance in many different narrow domains simultaneously.[4]
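The correlation claim can be illustrated with a toy simulation. Everything here is an assumption made for illustration: a population of hypothetical Kludge AIs that run the same software with different hardware budgets, and task scores that each rise with the shared budget plus independent noise. Under those assumptions, scores on different narrow tasks come out positively correlated across the population even though no task shares anything with another except hardware.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_kludge_population(n_systems=200, n_tasks=4, seed=0):
    """Return per-task score lists for a population of hypothetical Kludge AIs.

    Assumption: each system's score on every task is its (log) hardware
    budget plus independent task-specific noise.
    """
    rng = random.Random(seed)
    scores = [[] for _ in range(n_tasks)]
    for _ in range(n_systems):
        log_compute = rng.uniform(0.0, 10.0)  # shared budget, arbitrary units
        for task in range(n_tasks):
            noise = rng.gauss(0.0, 1.0)  # task-specific variation
            scores[task].append(log_compute + noise)
    return scores


if __name__ == "__main__":
    task_scores = simulate_kludge_population()
    for i in range(len(task_scores)):
        for j in range(i + 1, len(task_scores)):
            r = pearson(task_scores[i], task_scores[j])
            print(f"task {i} vs task {j}: r = {r:.2f}")
```

A shared factor of this kind shows up as correlation between abilities, but it says nothing about whether the system can transfer learning between domains.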

On the other hand, the Kludge AI would not (yet) have general intelligence, because it wouldn’t necessarily have the capacity to solve somewhat-arbitrary problems in somewhat-arbitrary environments, wouldn’t necessarily be able to transfer learning in one domain to another, and so on.



  1. In an interview with The Register, Google head of research Alfred Spector said, “We have the knowledge graph, [the] ability to parse natural language, neural network tech [and] enormous opportunities to gain feedback from users… If we combine all these things together with humans in the loop continually providing feedback our systems become … intelligent.” Spector calls this the “combination hypothesis.” 
  2. Though there are probably many disadvantaged humans for whom this is not true, because they do not show far-above-average performance on any tasks. 
  3. Psychologists now generally agree that there is a general intelligence factor in addition to more specific mental abilities. For an introduction to the modern synthesis, see Gottfredson (2011). For more detail, see the first few chapters of Sternberg & Kaufman (2011). If you’ve read Cosma Shalizi’s popular article “g, a Statistical Myth,” please also read its refutation here and here.
  4. In psychology, the factor analysis is done between humans. Here, I’m suggesting that a similar factor analysis could hypothetically be done between different Kludge AIs, with different Kludge AIs running basically the same software but having access to different amounts of computation. The analogy should not be taken too far, however. For example, it isn’t the case that higher-IQ humans have much larger brains than other humans. 

Benja Fallenstein on the Löbian Obstacle to Self-Modifying Systems

 |   |  Conversations

Benja Fallenstein researches mathematical models of human and animal behavior at Bristol University, as part of the MAD research group and the decision-making research group.

Before that, she graduated from the University of Vienna with a BSc in Mathematics. In her spare time, Benja studies questions relevant to AI impacts and Friendly AI, including AI forecasting, intelligence explosion microeconomics, reflection in logic, and decision algorithms.

Benja has attended two of MIRI’s research workshops, and is scheduled to attend another in December.

Luke Muehlhauser: Since you’ve attended two MIRI research workshops on “Friendly AI math,” I’m hoping you can explain to our audience what that work is all about. To provide a concrete example, I’d like to talk about the Löbian obstacle to self-modifying artificial intelligence, which is one of the topics that MIRI’s recent workshops have focused on. To start with, could you explain to our readers what this problem is and why you think it is important?
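For readers new to the term, the obstacle takes its name from Löb's theorem. The statement below is the standard textbook form, not a quotation from the interview. The notation is an assumption: $T$ is a consistent theory extending Peano Arithmetic, and $\Box_T P$ abbreviates "$P$ is provable in $T$."

```latex
% Löb's theorem (standard statement; notation assumed, not from the interview)
\[
  \text{If } T \vdash \Box_T P \rightarrow P, \text{ then } T \vdash P .
\]
% Formalized inside T itself:
\[
  T \vdash \Box_T(\Box_T P \rightarrow P) \rightarrow \Box_T P .
\]
```

Roughly, $T$ can prove "if $P$ is provable in $T$, then $P$ is true" only for sentences $P$ that it already proves. A system reasoning in $T$ therefore cannot establish in general that whatever a successor proves in $T$ is true, which is the difficulty the question refers to.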


“Algorithmic Progress in Six Domains” Released

 |   |  Papers

Today we released a new technical report by visiting researcher Katja Grace called “Algorithmic Progress in Six Domains.” The report summarizes data on algorithmic progress – that is, better performance per fixed amount of computing hardware – in six domains:

  • SAT solvers,
  • Chess and Go programs,
  • Physics simulations,
  • Factoring,
  • Mixed integer programming, and
  • Some forms of machine learning.

Our purpose for collecting these data was to shed light on the question of intelligence explosion microeconomics, though we suspect the report will be of broad interest within the software industry and computer science academia.

One finding from the report was previously discussed by Robin Hanson here. (Robin saw an early draft on the intelligence explosion microeconomics mailing list.)

The preferred page for discussing the report in general is here.

Summary:

In recent boolean satisfiability (SAT) competitions, SAT solver performance has increased 5–15% per year, depending on the type of problem. However, these gains have been driven by widely varying improvements on particular problems. Retrospective surveys of SAT performance (on problems chosen after the fact) display significantly faster progress.

Chess programs have improved by around 50 Elo points per year over the last four decades. Estimates for the significance of hardware improvements are very noisy, but are consistent with hardware improvements being responsible for approximately half of progress. Progress has been smooth on the scale of years since the 1960s, except for the past five. Go programs have improved about one stone per year for the last three decades. Hardware doublings produce diminishing Elo gains, on a scale consistent with accounting for around half of progress.

Improvements in a variety of physics simulations (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.

The largest number factored to date has grown by about 5.5 digits per year for the last two decades; computing power increased 10,000-fold over this period, and it is unclear how much of the increase is due to hardware progress.

Some mixed integer programming (MIP) algorithms, run on modern MIP instances with modern hardware, have roughly doubled in speed each year. MIP is an important optimization problem, but one which has been called to attention after the fact due to performance improvements. Other optimization problems have had more inconsistent (and harder to determine) improvements.

Various forms of machine learning have had steeply diminishing progress in percentage accuracy over recent decades. Some vision tasks have recently seen faster progress.
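To get a feel for the rates quoted in the summary, the sketch below turns them into cumulative figures. Treating the yearly rates as constant, compounding the percentage gains, and using a ten-year horizon for the SAT and MIP figures are assumptions made here for illustration; the report itself should be consulted for how each rate was measured. The Elo formula is the standard expected-score formula.

```python
def cumulative_linear(rate_per_year, years):
    """Total gain from a constant additive rate (e.g. Elo points per year)."""
    return rate_per_year * years


def compound_factor(annual_rate, years):
    """Total multiplicative gain from a constant yearly percentage gain."""
    return (1.0 + annual_rate) ** years


def elo_expected_score(rating_gap):
    """Standard Elo expected score for a player rated `rating_gap` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Rates taken from the summary above; the horizons marked "assumed" are not.
chess_elo_gain = cumulative_linear(50, 40)      # 50 Elo/year over four decades
factoring_digits = cumulative_linear(5.5, 20)   # 5.5 digits/year over two decades
sat_low = compound_factor(0.05, 10)             # 5%/year, assumed 10-year horizon
sat_high = compound_factor(0.15, 10)            # 15%/year, assumed 10-year horizon
mip_speedup = compound_factor(1.0, 10)          # doubling yearly, assumed 10 years
one_year_edge = elo_expected_score(50)          # this year's program vs last year's

if __name__ == "__main__":
    print(f"chess: +{chess_elo_gain} Elo over 40 years")
    print(f"factoring: +{factoring_digits:.0f} digits over 20 years")
    print(f"SAT: {sat_low:.2f}x to {sat_high:.2f}x over 10 years")
    print(f"MIP: {mip_speedup:.0f}x over 10 years")
    print(f"expected score with a 50-point Elo edge: {one_year_edge:.3f}")
```

The spread is wide: the same ten years yield well under a tenfold gain at the SAT competition rates but roughly a thousandfold gain at the MIP rate, which is one reason the report treats each domain separately.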