February 2017 Newsletter


Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.

 

Research updates

General updates

  • We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI executive director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
  • MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
  • The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have been a big assist to our work, alongside examples of CFAR’s impact on other groups.
  • The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
  • Andrew Critch spoke at Princeton’s ENVISION conference (video).
  • Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
  • The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

 

News and links

  • An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
  • Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
  • From MIT Technology Review’s “AI Software Learns to Make AI Software”:
    Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

CHCAI/MIRI research internship in AI safety


We’re looking for talented, driven, and ambitious technical researchers for a summer research internship with the Center for Human-Compatible AI (CHCAI) and the Machine Intelligence Research Institute (MIRI).

About the research:

CHCAI is a research center based at UC Berkeley with PIs including Stuart Russell, Pieter Abbeel and Anca Dragan. CHCAI describes its goal as “to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems”.

MIRI is an independent research nonprofit located near the UC Berkeley campus with a mission of helping ensure that smarter-than-human AI has a positive impact on the world.

CHCAI’s research focus includes work on inverse reinforcement learning and human-robot cooperation (link), while MIRI’s focus areas include task AI and computational reflection (link). Both groups are also interested in theories of (bounded) rationality that may help us develop a deeper understanding of general-purpose AI agents.

To apply:

1. Fill in the form here: https://goo.gl/forms/bDe6xbbKwj1tgDbo1

2. Send an email to beth.m.barnes@gmail.com with the subject line “AI safety internship application”, attaching your CV, a piece of technical writing on which you were the primary author, and your research proposal.


New paper: “Toward negotiable reinforcement learning”


MIRI Research Fellow Andrew Critch has developed a new result in the theory of conflict resolution, described in “Toward negotiable reinforcement learning: Shifting priorities in Pareto optimal sequential decision-making.”

Abstract:

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time.

Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
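Observation (2) can be made concrete with a small sketch. The code below is an illustration only, not the recursion derived in the paper: it assumes each player's priority is a single weight, that `likelihoods` holds the probability each player's beliefs assigned to the input the machine actually observed, and that weights are renormalized to sum to 1 for readability. The function name and the numbers are hypothetical.

```python
def update_priorities(weights, likelihoods):
    """Rescale each player's priority weight by how well that player's
    beliefs predicted the latest input, then renormalize.

    weights     -- current priority weight per player (assumed representation)
    likelihoods -- probability each player's beliefs assigned to the
                   input that was actually observed
    """
    if len(weights) != len(likelihoods):
        raise ValueError("need one likelihood per player")
    # Multiplicative shift: better predictors gain relative priority.
    scaled = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("no player assigned positive probability to the input")
    # Renormalizing is a convenience; only the relative weights matter.
    return [s / total for s in scaled]


# Hypothetical example: two players start with equal priority.
weights = [0.5, 0.5]
# Player A gave the observed input probability 0.8, player B gave it 0.2.
weights = update_priorities(weights, [0.8, 0.2])
print(weights)
```

Under these assumptions, a player whose beliefs keep predicting the machine's inputs well sees their weight grow over successive updates. A fixed linear aggregation of utilities would leave the weights unchanged.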


Response to Cegłowski on superintelligence


Web developer Maciej Cegłowski recently gave a talk on AI safety (video, text) arguing that we should be skeptical of the standard assumptions that go into working on this problem, and doubly skeptical of the extreme-sounding claims, attitudes, and policies these premises appear to lead to. I’ll give my reply to each of these points below.

First, a brief outline: this reply mirrors the structure of Cegłowski’s talk. I first put forth my understanding of the talk’s broader implications, then deal in detail with the inside-view arguments as to whether or not the core idea is right, and end by discussing the structure of these discussions.


January 2017 Newsletter


Eliezer Yudkowsky’s new introductory talk on AI safety is out, in text and video forms: “The AI Alignment Problem: Why It’s Hard, and Where to Start.” Other big news includes the release of version 1 of Ethically Aligned Design, an IEEE recommendations document with a section on artificial general intelligence that we helped draft.

Research updates

General updates

News and links

New paper: “Optimal polynomial-time estimators”


MIRI Research Associate Vadim Kosoy has developed a new framework for reasoning under logical uncertainty, “Optimal polynomial-time estimators: A Bayesian notion of approximation algorithm.” Abstract:

The concept of an “approximation algorithm” is usually only applied to optimization problems, since in optimization problems the performance of the algorithm on any given input is a continuous parameter. We introduce a new concept of approximation applicable to decision problems and functions, inspired by Bayesian probability. From the perspective of a Bayesian reasoner with limited computational resources, the answer to a problem that cannot be solved exactly is uncertain and therefore should be described by a random variable. It thus should make sense to talk about the expected value of this random variable, an idea we formalize in the language of average-case complexity theory by introducing the concept of “optimal polynomial-time estimators.” We prove some existence theorems and completeness results, and show that optimal polynomial-time estimators exhibit many parallels with “classical” probability theory.

Kosoy’s optimal estimators framework attempts to model general-purpose reasoning under deductive limitations from a different angle than Scott Garrabrant’s logical inductors framework, putting more focus on computational efficiency and tractability.


AI Alignment: Why It’s Hard, and Where to Start


Back in May, I gave a talk at Stanford University for the Symbolic Systems Distinguished Speaker series, titled “The AI Alignment Problem: Why It’s Hard, And Where To Start.” The video for this talk is now available on YouTube.

We have an approximately complete transcript of the talk and Q&A session here, slides here, and notes and references here. You may also be interested in a shorter version of this talk I gave at NYU in October, “Fundamental Difficulties in Aligning Advanced AI.”

In the talk, I introduce some open technical problems in AI alignment and discuss the bigger picture into which they fit, as well as what it’s like to work in this relatively new field. Below, I’ve provided an abridged transcript of the talk, with some accompanying slides.

Talk outline:

1. Agents and their utility functions

1.1. Coherent decisions imply a utility function
1.2. Filling a cauldron

2. Some AI alignment subproblems

2.1. Low-impact agents
2.2. Agents with suspend buttons
2.3. Stable goals in self-modification

3. Why expect difficulty?

3.1. Why is alignment necessary?
3.2. Why is alignment hard?
3.3. Lessons from NASA and cryptography

4. Where we are now

4.1. Recent topics
4.2. Older work and basics
4.3. Where to start


December 2016 Newsletter


We’re in the final weeks of our push to cover our funding shortfall, and we’re now halfway to our $160,000 goal. For potential donors who are interested in an outside perspective, Future of Humanity Institute (FHI) researcher Owen Cotton-Barratt has written up why he’s donating to MIRI this year. (Donation page.)

Research updates

General updates

  • We teamed up with a number of AI safety researchers to help compile a list of recommended AI safety readings for the Center for Human-Compatible AI. See this page if you would like to get involved with CHCAI’s research.
  • Investment analyst Ben Hoskin reviews MIRI and other organizations involved in AI safety.

News and links

  • “The Off-Switch Game”: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell show that an AI agent’s corrigibility is closely tied to the uncertainty it has about its utility function.
  • Russell and Allan Dafoe critique an inaccurate summary by Oren Etzioni of a new survey of AI experts on superintelligence.
  • Sam Harris interviews Russell on the basics of AI risk (video). See also Russell’s new Q&A on the future of AI.
  • Future of Life Institute co-founder Viktoriya Krakovna and FHI researcher Jan Leike join Google DeepMind’s safety team.
  • GoodAI sponsors a challenge to “accelerate the search for general artificial intelligence”.
  • OpenAI releases Universe, “a software platform for measuring and training an AI’s general intelligence across the world’s supply of games”. Meanwhile, DeepMind has open-sourced their own platform for general AI research, DeepMind Lab.
  • Staff at GiveWell and the Centre for Effective Altruism, along with others in the effective altruism community, explain where they’re donating this year.
  • FHI is seeking AI safety interns, researchers, and admins: jobs page.