A new MIRI research program with a machine learning focus

 |   |  MIRI Strategy

I’m happy to announce that MIRI is beginning work on a new research agenda, “value alignment for advanced machine learning systems.” Half of MIRI’s team — Patrick LaVictoire, Andrew Critch, and I — will be spending the bulk of our time on this project over at least the next year. The rest of our time will be spent on our pre-existing research agenda.

MIRI’s research in general can be viewed as a response to Stuart Russell’s question for artificial intelligence researchers: “What if we succeed?” There appear to be a number of theoretical prerequisites for designing advanced AI systems that are robust and reliable, and our research aims to develop them early.

Our general research agenda is agnostic about when AI systems are likely to match and exceed humans in general reasoning ability, and about whether or not such systems will resemble present-day machine learning (ML) systems. Recent years’ impressive progress in deep learning suggests that relatively simple neural-network-inspired approaches can be very powerful and general. For that reason, we are making an initial inquiry into a more specific subquestion: “What if techniques similar in character to present-day work in ML succeed in creating AGI?”

Much of this work will be aimed at improving our high-level theoretical understanding of task-directed AI. Unlike what Nick Bostrom calls “sovereign AI,” which attempts to optimize the world in long-term and large-scale ways, task AI is limited to performing instructed tasks of limited scope, satisficing but not maximizing. Our hope is that investigating task AI from an ML perspective will shed light on both the feasibility of task AI and the tractability of early safety work on advanced supervised, unsupervised, and reinforcement learning systems.

To this end, we will begin by investigating eight relevant technical problems:

Read more »

New papers dividing logical uncertainty into two subproblems

 |   |  Papers

I’m happy to announce two new technical results related to the problem of logical uncertainty, perhaps our most significant results from the past year. In brief, these results split the problem of logical uncertainty into two distinct subproblems, each of which we can now solve in isolation. The remaining problem, in light of these results, is to find a unified set of methods that solve both at once.

The solutions for each subproblem are available in two new papers, based on work spearheaded by Scott Garrabrant: “Inductive coherence”[1] and “Asymptotic convergence in online learning with unbounded delays.”[2]

To give some background on the problem: Modern probability theory models reasoners’ empirical uncertainty, their uncertainty about the state of a physical environment, e.g., “What’s behind this door?” However, it can’t represent reasoners’ logical uncertainty, their uncertainty about statements like “this Turing machine halts” or “the twin prime conjecture has a proof that is less than a gigabyte long.”[3]

Roughly speaking, if you give a classical probability distribution variables for statements that could be deduced in principle, then the axioms of probability theory force you to put probability either 0 or 1 on those statements, because you’re not allowed to assign positive probability to contradictions. In other words, modern probability theory assumes that all reasoners know all the consequences of all the things they know, even if deducing those consequences is intractable.
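The step from “no positive probability on contradictions” to “probability 0 or 1” can be written out explicitly. The following is only a sketch: the symbols T (the reasoner’s background knowledge), φ (a statement that T settles), and P (the probability assignment) are introduced here for illustration and do not appear in the papers’ notation.

```latex
% Sketch only: T, \varphi, and P are illustrative symbols, not the papers' notation.
\begin{align*}
T \vdash \varphi
  &\;\Longrightarrow\; T \cup \{\neg\varphi\} \text{ is contradictory}
  \;\Longrightarrow\; P(\neg\varphi) = 0, \\
P(\varphi) + P(\neg\varphi) = 1
  &\;\Longrightarrow\; P(\varphi) = 1 .
\end{align*}
```

By the same argument, if T refutes φ then P(φ) = 0. Any statement that could be settled in principle therefore leaves no room for an intermediate probability, however expensive the deduction is.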

We want a generalization of probability theory that allows us to model reasoners that have uncertainty about statements that they have not yet evaluated. Furthermore, we want to understand how to assign “reasonable” probabilities to claims that are too expensive to evaluate.

Imagine an agent considering whether to use quicksort or mergesort to sort a particular dataset. They might know that quicksort typically runs faster than mergesort, but that doesn’t necessarily apply to the current dataset. They could in principle figure out which one uses fewer resources on this dataset, by running both of them and comparing, but that would defeat the purpose. Intuitively, they have a fair bit of knowledge that bears on the claim “quicksort runs faster than mergesort on this dataset,” but modern probability theory can’t tell us which information they should use and how.[4]
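To make the example concrete, here is a minimal Python sketch of the “run both and compare” option. It rests on assumptions that are not in the text above: comparison counts stand in for running time, quicksort uses a naive first-element pivot, and the function names are hypothetical. The point is only that the claim about a specific dataset is a definite, computable fact that general knowledge about the algorithms does not settle.

```python
import random

def quicksort(items, counter):
    """Quicksort with a first-element pivot (an assumption of this sketch)."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    for x in rest:
        counter[0] += 1  # one comparison against the pivot
        (smaller if x < pivot else larger).append(x)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)

def mergesort(items, counter):
    """Top-down mergesort that counts element comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = mergesort(items[:mid], counter)
    right = mergesort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison between the two heads
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def comparisons(sort_fn, data):
    """Run sort_fn on data and return how many comparisons it made."""
    counter = [0]
    result = sort_fn(data, counter)
    assert result == sorted(data)
    return counter[0]

rng = random.Random(0)
shuffled = [rng.random() for _ in range(200)]
already_sorted = sorted(shuffled)

for name, data in [("shuffled", shuffled), ("already sorted", already_sorted)]:
    print(name, comparisons(quicksort, data), comparisons(mergesort, data))
```

On the already-sorted input this naive quicksort degrades to 200·199/2 = 19900 comparisons, far more than mergesort needs, even though nothing about the algorithms’ typical behavior has changed. A logically uncertain reasoner has to estimate that answer without paying for both runs.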

What does it mean for a reasoner to assign “reasonable probabilities” to claims that they haven’t computed, but could compute in principle? Without probability theory to guide us, we’re reduced to using intuition to identify properties that seem desirable, and then investigating which ones are possible. Intuitively, there are at least two properties we would want logically non-omniscient reasoners to exhibit:

1. They should be able to notice patterns in what is provable about claims, even before they can prove or disprove the claims themselves. For example, consider the claims “this Turing machine outputs an odd number” and “this Turing machine outputs an even number.” A good reasoner thinking about those claims should eventually recognize that they are mutually exclusive, and assign them probabilities that sum to at most 1, even before they can run the relevant Turing machine.

2. They should be able to notice patterns in sentence classes that are true with a certain frequency. For example, they should assign roughly 10% probability to “the 10^100th digit of pi is a 7” in the absence of any information about the digit, after observing (but not proving) that digits of pi tend to be uniformly distributed.
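The second property can be illustrated with a small Python sketch. It is not the method from either paper: it simply tallies the digits of pi that are cheap to compute and uses the observed frequency as a stand-in estimate for a digit that is too expensive to compute. The digit generator is a standard unbounded spigot algorithm, and the function names are hypothetical.

```python
from collections import Counter
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Unbounded spigot algorithm using exact integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k) + 2 + r * l) // (t * l),
                l + 2,
            )

def digit_frequencies(num_digits):
    """Observed frequency of each digit 0-9 among the first num_digits digits."""
    counts = Counter(islice(pi_digits(), num_digits))
    return {d: counts[d] / num_digits for d in range(10)}

freqs = digit_frequencies(1000)
# With no other information, use the observed frequency of 7 as the
# estimated probability that some far-off digit of pi is a 7.
print(f"estimated P(distant digit is 7) = {freqs[7]:.3f}")
```

The estimate comes from observation rather than proof, which is the sense of “observing (but not proving)” in the property above.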

MIRI’s work on logical uncertainty this past year can be very briefly summed up as “we figured out how to get these two properties individually, but found that it is difficult to get both at once.”

  1. This work was originally titled “Uniform coherence”. This post has been updated to reflect the new terminology. 
  2. Garrabrant’s IAFF forum posts provide a record of how these results were originally developed, as a response to Ray Solomonoff’s theory of algorithmic probability. “Concrete Failure of the Solomonoff Approach” and “The Entangled Benford Test” lay groundwork for the “Asymptotic convergence…” problem, a limited early version of which was featured in the “Asymptotic logical uncertainty and the Benford test” report. Inductive coherence is defined in “Uniform Coherence 2,” and an example of an inductively coherent predictor is identified in “The Modified Demski Prior is Uniformly Coherent.”
  3. This type of uncertainty is called “logical uncertainty” mainly for historical reasons. I think of it like this: We care about agents’ ability to reason about software systems, e.g., “this program will halt.” Those claims can be expressed in sentences of logic. The question “what probability does the agent assign to this machine halting?” then becomes “what probability does this agent assign to this particular logical sentence?” The truth of these statements could be determined in principle, but the agent may not have the resources to compute the answers in practice. 
  4. For more background on logical uncertainty, see Gaifman’s “Concerning measures in first-order calculi,” Garber’s “Old evidence and logical omniscience in Bayesian confirmation theory,” Hutter, Lloyd, Ng, and Uther’s “Probabilities on sentences in an expressive logic,” and Aaronson’s “Why philosophers should care about computational complexity.” 

April 2016 Newsletter

 |   |  Newsletters

Research updates

General updates

News and links

New paper on bounded Löb and robust cooperation of bounded agents

 |   |  Papers

MIRI Research Fellow Andrew Critch has written a new paper on cooperation between software agents in the Prisoner’s Dilemma, available on arXiv: “Parametric bounded Löb’s theorem and robust cooperation of bounded agents.” The abstract reads:

Löb’s theorem and Gödel’s theorem make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb’s theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to out-perform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner’s Dilemma. Previous cooperative program equilibria studied by Tennenholtz and Fortnow have depended on tests for program equality, a fragile condition, whereas “Löbian” cooperation is much more robust and agnostic of the opponent’s implementation.

Tennenholtz (2004) showed that cooperative equilibria exist in the Prisoner’s Dilemma between agents with transparent source code. This suggested that a number of results in classical game theory, where it is a commonplace that mutual defection is rational, might fail to generalize to settings where agents have strong guarantees about each other’s conditional behavior.

Tennenholtz’s version of program equilibrium, however, only established that rational cooperation was possible between agents with identical source code. Patrick LaVictoire and other researchers at MIRI supplied the additional result that more robust cooperation was possible between non-computable agents, and that it is possible to efficiently determine the outcomes of such games. However, some readers objected to the infinitary nature of the methods (for example, the use of halting oracles) and worried that not all of the results would carry over to finite computations.
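The fragility of cooperation based on program equality can be seen in a toy Python sketch. Everything here is an illustrative assumption rather than a construction from Tennenholtz’s or Critch’s papers: agents are Python source strings that define a function `act(my_source, opponent_source)`, and the helper names `run` and `play` are hypothetical.

```python
# Each agent is a source string; it sees its own text and its opponent's text.
CLIQUE = '''
def act(my_source, opponent_source):
    # Cooperate only with an exact textual copy of myself.
    return "C" if opponent_source == my_source else "D"
'''

# Same behavior as CLIQUE, but the parameter names differ, so the text differs.
CLIQUE_RENAMED = '''
def act(me, other):
    # Cooperate only with an exact textual copy of myself.
    return "C" if other == me else "D"
'''

DEFECT = '''
def act(my_source, opponent_source):
    return "D"
'''

def run(source, opponent_source):
    """Execute an agent's source and ask it for a move against the opponent."""
    namespace = {}
    exec(source, namespace)
    return namespace["act"](source, opponent_source)

def play(a, b):
    """Return the pair of moves when agents a and b read each other's source."""
    return run(a, b), run(b, a)

print(play(CLIQUE, CLIQUE))          # ('C', 'C'): identical source cooperates
print(play(CLIQUE, DEFECT))          # ('D', 'D'): the defector is not exploited
print(play(CLIQUE, CLIQUE_RENAMED))  # ('D', 'D'): same behavior, different text
```

The last line shows the problem: two agents that implement exactly the same policy fail to cooperate because their source text is not character-for-character identical. Proof-based cooperation aims to condition on what the opponent provably does rather than on how it is written.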

Critch’s report demonstrates that robust cooperative equilibria exist for bounded agents. In the process, Critch proves a new generalization of Löb’s theorem, and therefore of Gödel’s second incompleteness theorem. This parametric version of Löb’s theorem holds for proofs that can be written out in n or fewer characters, where the parameter n can be set to any number. For more background on the result’s significance, see LaVictoire’s “Introduction to Löb’s theorem in MIRI research.”
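For reference, the classical (unbounded) statements that the parametric result generalizes can be written as follows. This sketch uses Peano Arithmetic (PA) as the base theory and □P for “P is provable in PA,” both assumptions made here for concreteness. The precise form of the bounded version, with its length parameter n, is given in the paper and is not reproduced here.

```latex
% Classical statements only; the bounded, parametric version is in the paper.
\begin{align*}
\text{L\"ob's theorem:}\quad
  & \text{if } \mathrm{PA} \vdash \Box P \rightarrow P,
    \text{ then } \mathrm{PA} \vdash P. \\
\text{G\"odel's second theorem } (P = \bot):\quad
  & \text{if } \mathrm{PA} \vdash \neg\Box\bot,
    \text{ then } \mathrm{PA} \vdash \bot.
\end{align*}
```

Taking P to be a contradiction turns Löb’s theorem into Gödel’s second incompleteness theorem, which is why a generalization of the former is also a generalization of the latter.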

The new Löb result shows that bounded agents face obstacles to self-referential reasoning similar to those faced by unbounded agents, and can also reap some of the same benefits. Importantly, this lemma will likely allow us to discuss many other self-referential phenomena going forward using finitary examples rather than infinite ones.




MIRI has a new COO: Malo Bourgon

 |   |  News

I’m happy to announce that Malo Bourgon, formerly a program management analyst at MIRI, has taken on a new leadership role as our chief operating officer.

As MIRI’s second-in-command, Malo will be taking over a lot of the hands-on work of coordinating our day-to-day activities: supervising our ops team, planning events, managing our finances, and overseeing internal systems. He’ll also be assisting me in organizational strategy and outreach work.

Prior to joining MIRI, Malo studied electrical, software, and systems engineering at the University of Guelph in Ontario. His professional interests included climate change mitigation, and during his master’s, he worked on a project to reduce waste through online detection of inefficient electric motors. Malo started working for us shortly after completing his master’s in early 2012, which makes him MIRI’s longest-standing team member next to Eliezer Yudkowsky.


Announcing a new colloquium series and fellows program

 |   |  News

The Machine Intelligence Research Institute is accepting applicants to two summer programs: a three-week AI robustness and reliability colloquium series (co-run with the Oxford Future of Humanity Institute), and a two-week fellows program focused on helping new researchers contribute to MIRI’s technical agenda (co-run with the Center for Applied Rationality).

The Colloquium Series on Robust and Beneficial AI (CSRBAI), running from May 27 to June 17, is a new gathering of top researchers in academia and industry to tackle the kinds of technical questions featured in the Future of Life Institute’s long-term AI research priorities report and project grants, including transparency, error-tolerance, and preference specification in software systems.

The goal of the event is to spark new conversations and collaborations between safety-conscious AI scientists with a variety of backgrounds and research interests. Attendees will be invited to give and attend talks at MIRI’s Berkeley, California offices during Wednesday/Thursday/Friday colloquia, to participate in hands-on Saturday/Sunday workshops, and to drop by for open discussion days.



Scheduled speakers include Stuart Russell (May 27), UC Berkeley Professor of Computer Science and co-author of Artificial Intelligence: A Modern Approach; Tom Dietterich (May 27), AAAI President and OSU Director of Intelligent Systems; and Bart Selman (June 3), Cornell Professor of Computer Science.

Apply here to attend any portion of the event, as well as to propose a talk or discussion topic:


Application Form


The 2016 MIRI Summer Fellows program, running from June 19 to July 3, doubles as a workshop for developing new problem-solving skills and mathematical intuitions, and a crash course on MIRI’s active research projects.

This is a smaller and more focused version of the Summer Fellows program we ran last year, which resulted in multiple new hires for us. As such, the program also functions as a high-intensity research retreat where MIRI staff and potential collaborators can get to know each other and work together on important open problems in AI. Apply here to attend the program:


Application Form


Both programs are free of charge, including free room and board for all MIRI Summer Fellows program participants, free lunches and dinners for CSRBAI participants, and additional partial accommodations and travel assistance for select attendees. For additional information, see the CSRBAI event page and the MIRI Summer Fellows event page.

Seeking Research Fellows in Type Theory and Machine Self-Reference

 |   |  News

The Machine Intelligence Research Institute (MIRI) is accepting applications for a full-time research fellow to develop theorem provers with self-referential capabilities, beginning by implementing a strongly typed language within that very language. The goal of this research project will be to help us understand autonomous systems that can prove theorems about systems with similar deductive capabilities. Applicants should have experience programming in functional programming languages, with a preference for languages with dependent types, such as Agda, Coq, or Lean.

MIRI is a mathematics and computer science research institute specializing in long-term AI safety and robustness work. Our offices are in Berkeley, California, near the UC Berkeley campus.


March 2016 Newsletter

 |   |  Newsletters

Research updates

General updates

  • MIRI and other Future of Life Institute (FLI) grantees participated in a AAAI workshop on AI safety this month.
  • MIRI researcher Eliezer Yudkowsky discusses Ray Kurzweil, the Bayesian brain hypothesis, and an eclectic mix of other topics in a new interview.
  • Alexei Andreev and Yudkowsky are seeking investors for Arbital, a new technology for explaining difficult topics in economics, mathematics, computer science, and other disciplines. As a demo, Yudkowsky has written a new and improved guide to Bayes’s Rule.

News and links