Two new researchers join MIRI

 |   |  News

MIRI’s research team is growing! I’m happy to announce that we’ve hired two new research fellows to contribute to our work on AI alignment: Sam Eisenstat and Marcello Herreshoff, both from Google.

 

Sam EisenstatSam Eisenstat studied pure mathematics at the University of Waterloo, where he carried out research in mathematical logic. His previous work was on the automatic construction of deep learning models at Google.

Sam’s research focus is on questions relating to the foundations of reasoning and agency, and he is especially interested in exploring analogies between current theories of logical uncertainty and Bayesian reasoning. He has also done work on decision theory and counterfactuals. His past work with MIRI includes “Asymptotic Decision Theory,” “A Limit-Computable, Self-Reflective Distribution,” and “A Counterexample to an Informal Conjecture on Proof Length and Logical Counterfactuals.”

 

Marcello Herreshoff Marcello Herreshoff studied at Stanford, receiving a B.S. in Mathematics with Honors and getting two honorable mentions in the Putnam Competition, the world’s most highly regarded university-level math competition. Marcello then spent five years as a software engineer at Google, gaining a background in machine learning.

Marcello is one of MIRI’s earliest research collaborators, and attended our very first research workshop alongside Eliezer Yudkowsky, Paul Christiano, and Mihály Bárász. Marcello has worked with us in the past to help produce results such as “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem,” “Definability of Truth in Probabilistic Logic,” and “Tiling Agents for Self-Modifying AI.” His research interests include logical uncertainty and the design of reflective agents.

 

Sam and Marcello will be starting with us in the first two weeks of April. This marks the beginning of our first wave of new research fellowships since 2015, though we more recently added Ryan Carey to the team on an assistant research fellowship (in mid-2016).

We have additional plans to expand our research team in the coming months, and will soon be hiring for a more diverse set of technical roles at MIRI — details forthcoming!

2016 in review

 |   |  MIRI Strategy

It’s time again for my annual review of MIRI’s activities.1 In this post I’ll provide a summary of what we did in 2016, see how our activities compare to our previously stated goals and predictions, and reflect on how our strategy this past year fits into our mission as an organization. We’ll be following this post up in April with a strategic update for 2017.

After doubling the size of the research team in 2015,2 we slowed our growth in 2016 and focused on integrating the new additions into our team, making research progress, and writing up a backlog of existing results.

2016 was a big year for us on the research front, with our new researchers making some of the most notable contributions. Our biggest news was Scott Garrabrant’s logical inductors framework, which represents by a significant margin our largest progress to date on the problem of logical uncertainty. We additionally released “Alignment for Advanced Machine Learning Systems” (AAMLS), a new technical agenda spearheaded by Jessica Taylor.

We also spent this last year engaging more heavily with the wider AI community, e.g., through the month-long Colloquium Series on Robust and Beneficial Artificial Intelligence we co-ran with the Future of Humanity Institute, and through talks and participation in panels at many events through the year.

Read more »


  1. See our previous reviews: 2015, 2014, 2013
  2. From 2015 in review: “Patrick LaVictoire joined in March, Jessica Taylor in August, Andrew Critch in September, and Scott Garrabrant in December. With Nate transitioning to a non-research role, overall we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.” 

New paper: “Cheating Death in Damascus”

 |   |  Papers

Cheating Death in DamascusMIRI Executive Director Nate Soares and Rutgers/UIUC decision theorist Ben Levinstein have a new paper out introducing functional decision theory (FDT), MIRI’s proposal for a general-purpose decision theory.

The paper, titled “Cheating Death in Damascus,” considers a wide range of decision problems. In every case, Soares and Levinstein show that FDT outperforms all earlier theories in utility gained. The abstract reads:

Evidential and Causal Decision Theory are the leading contenders as theories of rational action, but both face fatal counterexamples. We present some new counterexamples, including one in which the optimal action is causally dominated. We also present a novel decision theory, Functional Decision Theory (FDT), which simultaneously solves both sets of counterexamples.

Instead of considering which physical action of theirs would give rise to the best outcomes, FDT agents consider which output of their decision function would give rise to the best outcome. This theory relies on a notion of subjunctive dependence, where multiple implementations of the same mathematical function are considered (even counterfactually) to have identical results for logical rather than causal reasons. Taking these subjunctive dependencies into account allows FDT agents to outperform CDT and EDT agents in, e.g., the presence of accurate predictors. While not necessary for considering classic decision theory problems, we note that a full specification of FDT will require a non-trivial theory of logical counterfactuals and algorithmic similarity.

“Death in Damascus” is a standard decision-theoretic dilemma. In it, a trustworthy predictor (Death) promises to find you and bring your demise tomorrow, whether you stay in Damascus or flee to Aleppo. Fleeing to Aleppo is costly and provides no benefit, since Death, having predicted your future location, will then simply come for you in Aleppo instead of Damascus.

In spite of this, causal decision theory often recommends fleeing to Aleppo — for much the same reason it recommends defecting in the one-shot twin prisoner’s dilemma and two-boxing in Newcomb’s problem. CDT agents reason that Death has already made its prediction, and that switching cities therefore can’t cause Death to learn your new location. Even though the CDT agent recognizes that Death is inescapable, the CDT agent’s decision rule forbids taking this fact into account in reaching decisions. As a consequence, the CDT agent will happily give up arbitrary amounts of utility in a pointless flight from Death.

Causal decision theory fails in Death in Damascus, Newcomb’s problem, and the twin prisoner’s dilemma — and also in the “random coin,” “Death on Olympus,” “asteroids,” and “murder lesion” dilemmas described in the paper — because its counterfactuals only track its actions’ causal impact on the world, and not the rest of the world’s causal (and logical, etc.) structure.

While evidential decision theory succeeds in these dilemmas, it fails in a new decision problem, “XOR blackmail.”1 FDT consistently outperforms both of these theories, providing an elegant account of normative action for the full gamut of known decision problems.

Read more »


  1. Just as the variants on Death in Damascus in Soares and Levinstein’s paper help clarify CDT’s particular point of failure, XOR blackmail drills down more exactly on EDT’s failure point than past decision problems have. In particular, EDT cannot be modified to avoid XOR blackmail in the ways it can be modified to smoke in the smoking lesion problem. 

March 2017 Newsletter

 |   |  Newsletters

Research updates

General updates

  • Why AI Safety?: A quick summary (originally posted during our fundraiser) of the case for working on AI risk, including notes on distinctive features of our approach and our goals for the field.
  • Nate Soares attended “Envisioning and Addressing Adverse AI Outcomes,” an event pitting red-team attackers against defenders in a variety of AI risk scenarios.
  • We also attended an AI safety strategy retreat run by the Center for Applied Rationality.

News and links

Using machine learning to address AI risk

 |   |  Analysis, Video

At the EA Global 2016 conference, I gave a talk on “Using Machine Learning to Address AI Risk”:

It is plausible that future artificial general intelligence systems will share many qualities in common with present-day machine learning systems. If so, how could we ensure that these systems robustly act as intended? We discuss the technical agenda for a new project at MIRI focused on this question.

A recording of my talk is now up online:

 

 

The talk serves as a quick survey (for a general audience) of the kinds of technical problems we’re working on under the “Alignment for Advanced ML Systems” research agenda. Included below is a version of the talk in blog post form.1

Talk outline:

1. Goal of this research agenda

2. Six potential problems with highly capable AI systems

2.1. Actions are hard to evaluate
2.2. Ambiguous test examples
2.3. Difficulty imitating human behavior
2.4. Difficulty specifying goals about the real world
2.5. Negative side-effects
2.6. Edge cases that still satisfy the goal

3. Technical details on one problem: inductive ambiguity identification

3.1. KWIK learning
3.2. A Bayesian view of the problem

4. Other agendas

Read more »


  1. I also gave a version of this talk at the MIRI/FHI Colloquium on Robust and Beneficial AI. 

February 2017 Newsletter

 |   |  Newsletters

Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.

 

Research updates

 

General updates

  • We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI executive director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
  • MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
  • The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have been a big assist to our work, alongside examples of CFAR’s impact on other groups.
  • The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
  • Andrew Critch spoke at Princeton’s ENVISION conference (video).
  • Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
  • The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

 

News and links

  • An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
  • Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
  • From MIT Technology Review‘s “AI Software Learns to Make AI Software”:
    Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

CHCAI/MIRI research internship in AI safety

 |   |  News

We’re looking for talented, driven, and ambitious technical researchers for a summer research internship with the Center for Human-Compatible AI (CHCAI) and the Machine Intelligence Research Institute (MIRI).

About the research:

CHCAI is a research center based at UC Berkeley with PIs including Stuart Russell, Pieter Abbeel and Anca Dragan. CHCAI describes its goal as “to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems”.

MIRI is an independent research nonprofit located near the UC Berkeley campus with a mission of helping ensure that smarter-than-human AI has a positive impact on the world.

CHCAI’s research focus includes work on inverse reinforcement learning and human-robot cooperation (link), while MIRI’s focus areas include task AI and computational reflection (link). Both groups are also interested in theories of (bounded) rationality that may help us develop a deeper understanding of general-purpose AI agents.

To apply:

1. Fill in the form here: https://goo.gl/forms/bDe6xbbKwj1tgDbo1

2. Send an email to beth.m.barnes@gmail.com with the subject line “AI safety internship application”, attaching your CV, a piece of technical writing on which you were the primary author, and your research proposal.

Read more »

New paper: “Toward negotiable reinforcement learning”

 |   |  Papers

Toward Negotiable Reinforcement LearningMIRI Research Fellow Andrew Critch has developed a new result in the theory of conflict resolution, described in “Toward negotiable reinforcement learning: Shifting priorities in Pareto optimal sequential decision-making.”

Abstract:

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time.

Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.

Read more »