Security Mindset and the Logistic Success Curve

 |   |  Analysis

Follow-up to:   Security Mindset and Ordinary Paranoia


(Two days later, Amber returns with another question.)


AMBER:  Uh, say, Coral. How important is security mindset when you’re building a whole new kind of system—say, one subject to potentially adverse optimization pressures, where you want it to have some sort of robustness property?

CORAL:  How novel is the system?

AMBER:  Very novel.

CORAL:  Novel enough that you’d have to invent your own new best practices instead of looking them up?

AMBER:  Right.

CORAL:  That’s serious business. If you’re building a very simple Internet-connected system, maybe a smart ordinary paranoid could look up how we usually guard against adversaries, use as much off-the-shelf software as possible that was checked over by real security professionals, and not do too horribly. But if you’re doing something qualitatively new and complicated that has to be robust against adverse optimization, well… mostly I’d think you were operating in almost impossibly dangerous territory, and I’d advise you to figure out what to do after your first try failed. But if you wanted to actually succeed, ordinary paranoia absolutely would not do it.

AMBER:  In other words, projects to build novel mission-critical systems ought to have advisors with the full security mindset, so that the advisor can say what the system builders really need to do to ensure security.

CORAL:  (laughs sadly)  No.



Security Mindset and Ordinary Paranoia

 |   |  Analysis

The following is a fictional dialogue building off of AI Alignment: Why It’s Hard, and Where to Start.


(AMBER, a philanthropist interested in a more reliable Internet, and CORAL, a computer security professional, are at a conference hotel together discussing what Coral insists is a difficult and important issue: the difficulty of building “secure” software.)


AMBER:  So, Coral, I understand that you believe it is very important, when creating software, to make that software be what you call “secure”.

CORAL:  Especially if it’s connected to the Internet, or if it controls money or other valuables. But yes, that’s right.

AMBER:  I find it hard to believe that this needs to be a separate topic in computer science. In general, programmers need to figure out how to make computers do what they want. The people building operating systems surely won’t want them to give access to unauthorized users, just like they won’t want those computers to crash. Why is one problem so much more difficult than the other?

CORAL:  That’s a deep question, but to give a partial deep answer: When you expose a device to the Internet, you’re potentially exposing it to intelligent adversaries who can find special, weird interactions with the system that make the pieces behave in weird ways that the programmers did not think of. When you’re dealing with that kind of problem, you’ll use a different set of methods and tools.

AMBER:  Any system that crashes is behaving in a way the programmer didn’t expect, and programmers already need to stop that from happening. How is this case different?

CORAL:  Okay, so… imagine that your system is going to take in one kilobyte of input per session. (Although that itself is the sort of assumption we’d question and ask what happens if it gets a megabyte of input instead—but never mind.) If the input is one kilobyte, then there are 2^8,000 possible inputs, or about 10^2,400 or so. Again, for the sake of extending the simple visualization, imagine that a computer gets a billion inputs per second. Suppose that only a googol, 10^100, out of the 10^2,400 possible inputs, cause the system to behave a certain way the original designer didn’t intend.

If the system is getting inputs in a way that’s uncorrelated with whether the input is a misbehaving one, it won’t hit on a misbehaving state before the end of the universe. If there’s an intelligent adversary who understands the system, on the other hand, they may be able to find one of the very rare inputs that makes the system misbehave. So a piece of the system that would literally never in a million years misbehave on random inputs, may break when an intelligent adversary tries deliberately to break it.
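
(Coral's arithmetic can be checked with a short sketch. The figures for input size, input rate, and number of misbehaving inputs come from the dialogue; the variable names and the choice of 1,000 bytes per kilobyte are assumptions made here for illustration, and the calculation works in base-10 logarithms only so that the huge numbers do not overflow.)

```python
import math

# Figures taken from the dialogue.
BITS_PER_INPUT = 8_000        # one kilobyte, assumed here to mean 1,000 bytes of 8 bits
TOTAL_INPUTS_LOG10 = 2_400    # Coral's rounded count of possible inputs, 10^2,400
BAD_INPUTS_LOG10 = 100        # a googol, 10^100, of misbehaving inputs
INPUTS_PER_SECOND = 10**9     # a billion inputs per second

# Assumption for unit conversion: about 365.25 days per year.
SECONDS_PER_YEAR = 3.156e7

# Unrounded size of the input space: 2^8,000 expressed as a power of ten.
exact_log10 = BITS_PER_INPUT * math.log10(2)

# Chance that one uniformly random input is a misbehaving one (as a log10).
log10_p_bad = BAD_INPUTS_LOG10 - TOTAL_INPUTS_LOG10

# Expected waiting time before random inputs hit a misbehaving one.
log10_expected_seconds = -log10_p_bad - math.log10(INPUTS_PER_SECOND)
log10_expected_years = log10_expected_seconds - math.log10(SECONDS_PER_YEAR)

print(f"2^8,000 is about 10^{exact_log10:.0f}")
print(f"P(random input misbehaves) = 10^{log10_p_bad}")
print(f"Expected wait: about 10^{log10_expected_years:.0f} years")
```

Under these assumptions the expected wait is on the order of 10^2,283 years, against a universe roughly 10^10 years old, which is why uncorrelated inputs never find the flaw while a search aimed at it can.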

AMBER:  So you’re saying that it’s more difficult because the programmer is pitting their wits against an adversary who may be more intelligent than themselves.

CORAL:  That’s an almost-right way of putting it. What matters isn’t so much the “adversary” part as the optimization part. There are systematic, nonrandom forces strongly selecting for particular outcomes, causing pieces of the system to go down weird execution paths and occupy unexpected states. If your system literally has no misbehavior modes at all, it doesn’t matter if you have IQ 140 and the enemy has IQ 160—it’s not an arm-wrestling contest. It’s just very much harder to build a system that doesn’t enter weird states when the weird states are being selected-for in a correlated way, rather than happening only by accident. The weirdness-selecting forces can search through parts of the larger state space that you yourself failed to imagine. Beating that does indeed require new skills and a different mode of thinking, what Bruce Schneier called “security mindset”.

AMBER:  Ah, and what is this security mindset?

CORAL:  I can say one or two things about it, but keep in mind we are dealing with a quality of thinking that is not entirely effable. If I could give you a handful of platitudes about security mindset, and that would actually cause you to be able to design secure software, the Internet would look very different from how it presently does. That said, it seems to me that what has been called “security mindset” can be divided into two components, one of which is much less difficult than the other. And this can fool people into overestimating their own safety, because they can get the easier half of security mindset and overlook the other half. The less difficult component, I will call by the term “ordinary paranoia”.

AMBER:  Ordinary paranoia?

CORAL:  Lots of programmers have the ability to imagine adversaries trying to threaten them. They imagine how likely it is that the adversaries are able to attack them a particular way, and then they try to block off the adversaries from threatening that way. Imagining attacks, including weird or clever attacks, and parrying them with measures you imagine will stop the attack; that is ordinary paranoia.

AMBER:  Isn’t that what security is all about? What do you claim is the other half?

CORAL:  To put it as a platitude, I might say… defending against mistakes in your own assumptions rather than against external adversaries.

Announcing “Inadequate Equilibria”

 |   |  News

MIRI Senior Research Fellow Eliezer Yudkowsky has a new book out today: Inadequate Equilibria: Where and How Civilizations Get Stuck, a discussion of societal dysfunction, exploitability, and self-evaluation. From the preface:

Inadequate Equilibria is a book about a generalized notion of efficient markets, and how we can use this notion to guess where society will or won’t be effective at pursuing some widely desired goal.

An efficient market is one where smart individuals should generally doubt that they can spot overpriced or underpriced assets. We can ask an analogous question, however, about the “efficiency” of other human endeavors.

Suppose, for example, that someone thinks they can easily build a much better and more profitable social network than Facebook, or easily come up with a new treatment for a widespread medical condition. Should they question whatever clever reasoning led them to that conclusion, in the same way that most smart individuals should question any clever reasoning that causes them to think AAPL stock is underpriced? Should they question whether they can “beat the market” in these areas, or whether they can even spot major in-principle improvements to the status quo? How “efficient,” or adequate, should we expect civilization to be at various tasks?

There will be, as always, good ways and bad ways to reason about these questions; this book is about both.

The book is available from Amazon (in print and Kindle), on iBooks, as a pay-what-you-want digital download, and as a web book. The book has also been posted to Less Wrong 2.0.

The book’s contents are:

1.  Inadequacy and Modesty

A comparison of two “wildly different, nearly cognitively nonoverlapping” approaches to thinking about outperformance: modest epistemology, and inadequacy analysis.

2.  An Equilibrium of No Free Energy

How, in principle, can society end up neglecting obvious low-hanging fruit?

3.  Moloch’s Toolbox

Why does our civilization actually end up neglecting low-hanging fruit?

4.  Living in an Inadequate World

How can we best take into account civilizational inadequacy in our decision-making?

5.  Blind Empiricism

Three examples of modesty in practical settings.

6.  Against Modest Epistemology

An argument against the “epistemological core” of modesty: that we shouldn’t take our own reasoning and meta-reasoning at face value in the face of disagreements or novelties.

7.  Status Regulation and Anxious Underconfidence

On causal accounts of modesty.

Although Inadequate Equilibria isn’t about AI, I consider it one of MIRI’s most important nontechnical publications to date, as it helps explain some of the most basic tools and background models we use when we evaluate how promising a potential project, research program, or general strategy is.

A major grant from the Open Philanthropy Project

 |   |  News

I’m thrilled to announce that the Open Philanthropy Project has awarded MIRI a three-year $3.75 million general support grant ($1.25 million per year). This grant is, by far, the largest contribution MIRI has received to date, and will have a major effect on our plans going forward.

This grant follows a $500,000 grant we received from the Open Philanthropy Project in 2016. The Open Philanthropy Project’s announcement for the new grant notes that they are “now aiming to support about half of MIRI’s annual budget”.[1] The annual $1.25 million represents 50% of a conservative estimate we provided to the Open Philanthropy Project of the amount of funds we expect to be able to usefully spend in 2018–2020.

This expansion in support was also conditional on our ability to raise the other 50% from other supporters. For that reason, I sincerely thank all of the past and current supporters who have helped us get to this point.

The Open Philanthropy Project has expressed openness to potentially increasing their support if MIRI is in a position to usefully spend more than our conservative estimate, if they believe that this increase in spending is sufficiently high-value, and if we are able to secure additional outside support to ensure that the Open Philanthropy Project isn’t providing more than half of our total funding.

We’ll be going into more details on our future organizational plans in a follow-up post December 1, where we’ll also discuss our end-of-the-year fundraising goals.

In their write-up, the Open Philanthropy Project notes that they have updated favorably about our technical output since 2016, following our logical induction paper:

We received a very positive review of MIRI’s work on “logical induction” by a machine learning researcher who (i) is interested in AI safety, (ii) is rated as an outstanding researcher by at least one of our close advisors, and (iii) is generally regarded as outstanding by the ML community. As mentioned above, we previously had difficulty evaluating the technical quality of MIRI’s research, and we previously could find no one meeting criteria (i) – (iii) to a comparable extent who was comparably excited about MIRI’s technical research. While we would not generally offer a comparable grant to any lab on the basis of this consideration alone, we consider this a significant update in the context of the original case for the [2016] grant (especially MIRI’s thoughtfulness on this set of issues, value alignment with us, distinctive perspectives, and history of work in this area). While the balance of our technical advisors’ opinions and arguments still leaves us skeptical of the value of MIRI’s research, the case for the statement “MIRI’s research has a nontrivial chance of turning out to be extremely valuable (when taking into account how different it is from other research on AI safety)” appears much more robust than it did before we received this review.

The announcement also states, “In the time since our initial grant to MIRI, we have made several more grants within this focus area, and are therefore less concerned that a larger grant will signal an outsized endorsement of MIRI’s approach.”

We’re enormously grateful for the Open Philanthropy Project’s support, and for their deep engagement with the AI safety field as a whole. To learn more about our discussions with the Open Philanthropy Project and their active work in this space, see the group’s previous AI safety grants, our conversation with Daniel Dewey on the Effective Altruism Forum, and the research problems outlined in the Open Philanthropy Project’s recent AI fellows program description.

  1. The Open Philanthropy Project usually prefers not to provide more than half of an organization’s funding, to facilitate funder coordination and ensure that organizations it supports maintain their independence. From a March blog post: “We typically avoid situations in which we provide >50% of an organization’s funding, so as to avoid creating a situation in which an organization’s total funding is ‘fragile’ as a result of being overly dependent on us.”

November 2017 Newsletter

 |   |  Newsletters

Eliezer Yudkowsky has written a new book on civilizational dysfunction and outperformance: Inadequate Equilibria: Where and How Civilizations Get Stuck. The full book will be available in print and electronic formats November 16. To preorder the ebook or sign up for updates, visit

We’re posting the full contents online in stages over the next two weeks. The first two chapters are:

  1. Inadequacy and Modesty (discussion: LessWrong, EA Forum, Hacker News)
  2. An Equilibrium of No Free Energy (discussion: LessWrong, EA Forum)


Research updates

General updates

News and links

New paper: “Functional Decision Theory”

 |   |  Papers


MIRI senior researcher Eliezer Yudkowsky and executive director Nate Soares have a new introductory paper out on decision theory: “Functional decision theory: A new theory of instrumental rationality.”


This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory.

Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?” Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem.

In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
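
The Newcomb's problem comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the payoffs ($1,000,000 in the opaque box, $1,000 in the transparent box), the 99% predictor accuracy, and the function name `expected_payoff` are all assumptions chosen here for illustration. The sketch only computes the average winnings of an agent who reliably one-boxes (the choice FDT recommends) against one who reliably two-boxes (the choice CDT recommends).

```python
def expected_payoff(one_box: bool, accuracy: float,
                    big: float = 1_000_000, small: float = 1_000) -> float:
    """Average winnings in Newcomb's problem for an agent with a fixed policy.

    `accuracy` is the probability that the predictor correctly foresees
    the agent's choice. All default values are illustrative assumptions.
    """
    if one_box:
        # The opaque box is full whenever the predictor foresaw one-boxing.
        return accuracy * big
    # A two-boxer finds the opaque box full only when the predictor erred,
    # and always collects the small transparent box.
    return (1 - accuracy) * big + small


ACCURACY = 0.99  # hypothetical predictor accuracy

one_boxer = expected_payoff(one_box=True, accuracy=ACCURACY)
two_boxer = expected_payoff(one_box=False, accuracy=ACCURACY)

print(f"one-boxing policy: ${one_boxer:,.0f} on average")
print(f"two-boxing policy: ${two_boxer:,.0f} on average")
```

With these assumed payoffs, the one-boxing policy comes out ahead whenever the predictor's accuracy exceeds 0.5005, since that is where `accuracy * big` overtakes `(1 - accuracy) * big + small`.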

Our previous introductory paper on FDT, “Cheating Death in Damascus,” focused on comparing FDT’s performance to that of CDT and EDT in fairly high-level terms. Yudkowsky and Soares’ new paper puts a much larger focus on FDT’s mechanics and motivations, making “Functional Decision Theory” the most complete stand-alone introduction to the theory.[1]


  1. “Functional Decision Theory” was originally drafted prior to “Cheating Death in Damascus,” and was significantly longer before we received various rounds of feedback from the philosophical community. “Cheating Death in Damascus” was produced from material that was cut from early drafts; other cut material included a discussion of proof-based decision theory, and some Death in Damascus variants left on the cutting room floor for being needlessly cruel to CDT.

AlphaGo Zero and the Foom Debate

 |   |  Analysis

AlphaGo Zero uses 4 TPUs, is built entirely out of neural nets with no handcrafted features, doesn’t pretrain against expert games or anything else human, reaches a superhuman level after 3 days of self-play, and is the strongest version of AlphaGo yet.

The architecture has been simplified. Previous AlphaGo had a policy net that predicted good plays, and a value net that evaluated positions, both feeding into lookahead using MCTS (random probability-weighted plays out to the end of a game). AlphaGo Zero has one neural net that selects moves and this net is trained by Paul-Christiano-style capability amplification, playing out games against itself to learn new probabilities for winning moves.

As others have also remarked, this seems to me to be an element of evidence that favors the Yudkowskian position over the Hansonian position in my and Robin Hanson’s AI-foom debate.

As I recall and as I understood:

  • Hanson doubted that what he calls “architecture” is much of a big deal, compared to (Hanson said) elements like cumulative domain knowledge, or special-purpose components built by specialized companies in what he expects to be an ecology of companies serving an AI economy.
  • When I remarked upon how it sure looked to me like humans had an architectural improvement over chimpanzees that counted for a lot, Hanson replied that this seemed to him like a one-time gain from allowing the cultural accumulation of knowledge.

I emphasize how all the mighty human edifice of Go knowledge, the joseki and tactics developed over centuries of play, the experts teaching children from an early age, was entirely discarded by AlphaGo Zero with a subsequent performance improvement. These mighty edifices of human knowledge, as I understand the Hansonian thesis, are supposed to be the bulwark against rapid gains in AI capability across multiple domains at once. I said, “Human intelligence is crap and our accumulated skills are crap,” and this appears to have been borne out.

Similarly, single research labs like DeepMind are not supposed to pull far ahead of the general ecology, because adapting AI to any particular domain is supposed to require lots of components developed all over the place by a market ecology that makes those components available to other companies. AlphaGo Zero is much simpler than that. To the extent that nobody else can run out and build AlphaGo Zero, it’s either because Google has Tensor Processing Units that aren’t generally available, or because DeepMind has a silo of expertise for being able to actually make use of existing ideas like ResNets, or both.

Sheer speed of capability gain should also be highlighted here. Most of my argument for FOOM in the Yudkowsky-Hanson debate was about self-improvement and what happens when an optimization loop is folded in on itself. Though it wasn’t necessary to my argument, the fact that Go play went from “nobody has come close to winning against a professional” to “so strongly superhuman they’re not really bothering any more” over two years just because that’s what happens when you improve and simplify the architecture, says you don’t even need self-improvement to get things that look like FOOM.

Yes, Go is a closed system allowing for self-play. It still took humans centuries to learn how to play it. Perhaps the new Hansonian bulwark against rapid capability gain can be that the environment has lots of empirical bits that are supposed to be very hard to learn, even in the limit of AI thoughts fast enough to blow past centuries of human-style learning in 3 days; and that humans have learned these vital bits over centuries of cultural accumulation of knowledge, even though we know that humans take centuries to do 3 days of AI learning when humans have all the empirical bits they need; and that AIs cannot absorb this knowledge very quickly using “architecture”, even though humans learn it from each other using architecture. If so, then let’s write down this new world-wrecking assumption (that is, the world ends if the assumption is false) and be on the lookout for further evidence that this assumption might perhaps be wrong.

AlphaGo clearly isn’t a general AI. There’s obviously stuff humans do that makes us much more general than AlphaGo, and AlphaGo obviously doesn’t do that. However, if even with the human special sauce we’re to expect AGI capabilities to be slow, domain-specific, and requiring feed-in from a big market ecology, then the situation we see without human-equivalent generality special sauce should not look like this.

To put it another way, I put a lot of emphasis in my debate on recursive self-improvement and the remarkable jump in generality across the change from primate intelligence to human intelligence. It doesn’t mean we can’t get info about speed of capability gains without self-improvement. It doesn’t mean we can’t get info about the importance and generality of algorithms without the general intelligence trick. The debate can start to settle for fast capability gains before we even get to what I saw as the good parts; I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum.

(Reply from Robin Hanson.)

October 2017 Newsletter

 |   |  Newsletters

“So far as I can presently estimate, now that we’ve had AlphaGo and a couple of other maybe/maybe-not shots across the bow, and seen a huge explosion of effort invested into machine learning and an enormous flood of papers, we are probably going to occupy our present epistemic state until very near the end.

“[…I]t’s hard to guess how many further insights are needed for AGI, or how long it will take to reach those insights. After the next breakthrough, we still won’t know how many more breakthroughs are needed, leaving us in pretty much the same epistemic state as before. […] You can either act despite that, or not act. Not act until it’s too late to help much, in the best case; not act at all until after it’s essentially over, in the average case.”

Read more in a new blog post by Eliezer Yudkowsky: “There’s No Fire Alarm for Artificial General Intelligence.” (Discussion on LessWrong 2.0, Hacker News.)

Research updates

General updates

News and links