August 2018 Newsletter

Newsletters

July 2018 Newsletter

Newsletters

New paper: “Forecasting using incomplete models”

Papers

MIRI Research Associate Vadim Kosoy has a paper out on issues in naturalized induction: “Forecasting using incomplete models”. Abstract:

We consider the task of forecasting an infinite sequence of future observations based on some number of past observations, where the probability measure generating the observations is “suspected” to satisfy one or more of a set of incomplete models, i.e., convex sets in the space of probability measures.

This setting is in some sense intermediate between the realizable setting where the probability measure comes from some known set of probability measures (which can be addressed using e.g. Bayesian inference) and the unrealizable setting where the probability measure is completely arbitrary.

We demonstrate a method of forecasting which guarantees that, whenever the true probability measure satisfies an incomplete model in a given countable set, the forecast converges to the same incomplete model in the (appropriately normalized) Kantorovich-Rubinstein metric. This is analogous to merging of opinions for Bayesian inference, except that convergence in the Kantorovich-Rubinstein metric is weaker than convergence in total variation.
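The last claim of the abstract, that Kantorovich-Rubinstein convergence is weaker than convergence in total variation, can be seen in a toy case. The sketch below is an illustration under simplifying assumptions and is not taken from the paper: it works with finitely supported measures on the real line rather than on sequences of observations, it ignores the paper's normalization, and it computes the Kantorovich-Rubinstein distance as the Wasserstein-1 distance (the two coincide for probability measures on a bounded set). The function names are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two finitely supported measures.

    Measures are dicts mapping a point on the real line to its probability.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def kantorovich_rubinstein_1d(p, q):
    """Wasserstein-1 distance on the real line: the integral of |CDF_p - CDF_q|."""
    points = sorted(set(p) | set(q))
    distance = 0.0
    cdf_gap = 0.0  # running value of CDF_p - CDF_q
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        distance += abs(cdf_gap) * (right - left)
    return distance


# A point mass at 1/n approaches a point mass at 0 as n grows.
target = {0.0: 1.0}
for n in (1, 10, 100, 1000):
    approx = {1.0 / n: 1.0}
    print(n, total_variation(approx, target), kantorovich_rubinstein_1d(approx, target))
```

For every n the two point masses have disjoint supports, so their total variation distance stays at 1, while the Kantorovich-Rubinstein distance equals 1/n and tends to 0. The sequence therefore converges in the second metric but not in the first.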

Kosoy’s work builds on logical inductors to create a cleaner (purely learning-theoretic) formalism for modeling complex environments, showing that the methods developed in “Logical induction” are useful for applications in classical sequence prediction unrelated to logic.

“Forecasting using incomplete models” also shows that the intuitive concept of an “incomplete” or “partial” model has an elegant and useful formalization related to Knightian uncertainty. Additionally, Kosoy shows that using incomplete models to generalize Bayesian inference allows an agent to make predictions about environments that can be as complex as the agent itself, or more complex — as contrasted with classical Bayesian inference.
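To make the idea of an incomplete model as a convex set of probability measures concrete, here is a minimal sketch. It assumes a hypothetical model on length-4 bit strings that says only "the bits at even positions (0 and 2) are always 0" and leaves the odd positions unconstrained. The example and all names are illustrative assumptions, not the paper's construction.

```python
def satisfies_model(measure, tol=1e-12):
    """True if the measure puts no mass on strings with a 1 at an even index."""
    return all(
        prob <= tol
        for bits, prob in measure.items()
        if any(bits[i] == 1 for i in range(0, len(bits), 2))
    )


def mixture(mu, nu, weight):
    """Convex combination weight * mu + (1 - weight) * nu."""
    support = set(mu) | set(nu)
    return {
        x: weight * mu.get(x, 0.0) + (1.0 - weight) * nu.get(x, 0.0)
        for x in support
    }


# Two measures that disagree about the odd bits but both satisfy the model.
mu = {(0, 0, 0, 0): 1.0}                      # odd bits always 0
nu = {(0, 1, 0, 0): 0.5, (0, 1, 0, 1): 0.5}   # bit 1 always 1, bit 3 a fair coin

# A measure outside the model: bit 0 is 1 with certainty.
outside = {(1, 0, 0, 0): 1.0}

print(satisfies_model(mu), satisfies_model(nu))
print(satisfies_model(mixture(mu, nu, 0.3)))
print(satisfies_model(outside))
```

Any mixture of measures satisfying the constraint also satisfies it, which is what makes the set convex. The model commits to the even bits without taking any position on how the odd bits are generated, so the odd bits can be arbitrarily complex without making the model false.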

For more of Kosoy’s research, see “Optimal polynomial-time estimators” and the Intelligent Agent Foundations Forum.


June 2018 Newsletter

Newsletters

May 2018 Newsletter

Newsletters

Challenges to Christiano’s capability amplification proposal

Analysis

The following is a basically unedited summary I wrote up on March 16 of my take on Paul Christiano’s AGI alignment approach (described in “ALBA” and “Iterated Distillation and Amplification”). Where Paul had comments and replies, I’ve included them below.

I see a lot of free variables with respect to what exactly Paul might have in mind. I've sometimes tried presenting Paul with my objections and then he replies in a way that locally answers some of my question but I think would make other difficulties worse. My global objection is thus something like, "I don't see any concrete setup and consistent simultaneous setting of the variables where this whole scheme works." These difficulties are not minor or technical; they appear to me quite severe. I try to walk through the details below.

It should be understood at all times that I do not claim to be able to pass Paul’s ITT for Paul’s view and that this is me criticizing my own, potentially straw misunderstanding of what I imagine Paul might be advocating.

Read more »

April 2018 Newsletter

Newsletters

2018 research plans and predictions

MIRI Strategy

Update Nov. 23: This post was edited to reflect Scott’s terminology change from “naturalized world-models” to “embedded world-models.” For a full introduction to these four research problems, see Scott Garrabrant and Abram Demski’s “Embedded Agency.”

Scott Garrabrant is taking over Nate Soares’ job of making predictions about how much progress we’ll make in different research areas this year. Scott divides MIRI’s alignment research into five categories:

embedded world-models — Problems related to modeling large, complex physical environments that lack a sharp agent/environment boundary. Central examples of problems in this category include logical uncertainty, naturalized induction, multi-level world models, and ontological crises.

Introductory resources: “Formalizing Two Problems of Realistic World-Models,” “Questions of Reasoning Under Logical Uncertainty,” “Logical Induction,” “Reflective Oracles”

Examples of recent work: “Hyperreal Brouwer,” “An Untrollable Mathematician,” “Further Progress on a Bayesian Version of Logical Uncertainty”

decision theory — Problems related to modeling the consequences of different (actual and counterfactual) decision outputs, so that the decision-maker can choose the output with the best consequences. Central problems include counterfactuals, updatelessness, coordination, extortion, and reflective stability.

Introductory resources: “Cheating Death in Damascus,” “Decisions Are For Making Bad Outcomes Inconsistent,” “Functional Decision Theory”

Examples of recent work: “Cooperative Oracles,” “Smoking Lesion Steelman” (1, 2), “The Happy Dance Problem,” “Reflective Oracles as a Solution to the Converse Lawvere Problem”

robust delegation — Problems related to building highly capable agents that can be trusted to carry out some task on one’s behalf. Central problems include corrigibility, value learning, informed oversight, and Vingean reflection.

Introductory resources: “The Value Learning Problem,” “Corrigibility,” “Problem of Fully Updated Deference,” “Vingean Reflection,” “Using Machine Learning to Address AI Risk”

Examples of recent work: “Categorizing Variants of Goodhart’s Law,” “Stable Pointers to Value”

subsystem alignment — Problems related to ensuring that an AI system’s subsystems are not working at cross purposes, and in particular that the system avoids creating internal subprocesses that optimize for unintended goals. Central problems include benign induction.

Introductory resources: “What Does the Universal Prior Actually Look Like?”, “Optimization Daemons,” “Modeling Distant Superintelligences”

Examples of recent work: “Some Problems with Making Induction Benign”

other — Alignment research that doesn’t fall into the above categories. If we make progress on the open problems described in “Alignment for Advanced ML Systems,” and the progress is less connected to our agent foundations work and more ML-oriented, then we’ll likely classify it here.

Read more »