AI Risk and the Security Mindset

 

 |   |  Analysis

In 2008, security expert Bruce Schneier wrote about the security mindset:

Security requires a particular mindset. Security professionals… see the world differently. They can’t walk into a store without noticing how they might shoplift. They can’t use a computer without wondering about the security vulnerabilities. They can’t vote without trying to figure out how to vote twice…

SmartWater is a liquid with a unique identifier linked to a particular owner. “The idea is for me to paint this stuff on my valuables as proof of ownership,” I wrote when I first learned about the idea. “I think a better idea would be for me to paint it on your valuables, and then call the police.”

…This kind of thinking is not natural for most people. It’s not natural for engineers. Good engineering involves thinking about how things can be made to work; the security mindset involves thinking about how things can be made to fail. It involves thinking like an attacker, an adversary or a criminal. You don’t have to exploit the vulnerabilities you find, but if you don’t see the world that way, you’ll never notice most security problems.

with folded handsA recurring problem in much of the literature on “machine ethics” or “AGI ethics” or “AGI safety” is that researchers and commenters often appear to be asking the question “How will this solution work?” rather than “How will this solution fail?”

Here’s an example of the security mindset at work when thinking about AI risk. When presented with the suggestion that an AI would be safe if it “merely” (1) was very good at prediction and (2) gave humans text-only answers that it predicted would result in each stated goal being achieved, Viliam Bur pointed out a possible failure mode (which was later simplified):

Example question: “How should I get rid of my disease most cheaply?” Example answer: “You won’t. You will die soon, unavoidably. This report is 99.999% reliable”. Predicted human reaction: Decides to kill self and get it over with. Success rate: 100%, the disease is gone. Costs of cure: zero. Mission completed.

This security mindset is one of the traits we look for in researchers we might hire or collaborate with. Such researchers show a tendency to ask “How will this fail?” and “Why might this formalism not quite capture what we really care about?” and “Can I find a way to break this result?”

That said, there’s no sense in being infinitely skeptical of results that may help with AI security, safety, reliability, or “friendliness.” As always, we must think with probabilities.

Also see:

  • roystgnr

    To be fair to engineers, trying to predict failure modes is a big deal there too. The most obvious example is that engineers don’t use the layman redefinition of “failsafe”, “it won’t fail”, because that’s impossible; the engineering definition of “failsafe” is “when it does fail, the failure mode won’t cause harm”.

    There definitely is a different attitude between “there is always failure risk because imperfectly intelligent engineers make mistakes and unexpected forces of nature exceed predictions” and “there is always failure risk because unexpected intelligent enemies exploit mistakes and use excessive force”, though.

    • M_1

      “Failure mode” is exactly the right answer here. We don’t know how AI works yet, and you can’t really ask many concretely useful questions about how something fails without having some idea about how it works. “How does it fail” isn’t the opposite of “how does it work,” it’s an undesirable subset of how something works.

  • jring281

    More than failures must be considered. The better question is “What is the envelope or locus of the dynamic and integrity limits of this design or system?”

As featured in:     Business Insider   The Chronicle of Higher Education   Technology Review   Reason   TIME