New paper: “Optimal polynomial-time estimators”


MIRI Research Associate Vadim Kosoy has developed a new framework for reasoning under logical uncertainty, “Optimal polynomial-time estimators: A Bayesian notion of approximation algorithm.” Abstract:

The concept of an “approximation algorithm” is usually only applied to optimization problems, since in optimization problems the performance of the algorithm on any given input is a continuous parameter. We introduce a new concept of approximation applicable to decision problems and functions, inspired by Bayesian probability. From the perspective of a Bayesian reasoner with limited computational resources, the answer to a problem that cannot be solved exactly is uncertain and therefore should be described by a random variable. It thus should make sense to talk about the expected value of this random variable, an idea we formalize in the language of average-case complexity theory by introducing the concept of “optimal polynomial-time estimators.” We prove some existence theorems and completeness results, and show that optimal polynomial-time estimators exhibit many parallels with “classical” probability theory.

Kosoy’s optimal estimators framework attempts to model general-purpose reasoning under deductive limitations from a different angle than Scott Garrabrant’s logical inductors framework, putting more focus on computational efficiency and tractability.


AI Alignment: Why It’s Hard, and Where to Start


Back in May, I gave a talk at Stanford University for the Symbolic Systems Distinguished Speaker series, titled “The AI Alignment Problem: Why It’s Hard, And Where To Start.” The video for this talk is now available on YouTube:



We have an approximately complete transcript of the talk and Q&A session here, slides here, and notes and references here. You may also be interested in a shorter version of this talk I gave at NYU in October, “Fundamental Difficulties in Aligning Advanced AI.”

In the talk, I introduce some open technical problems in AI alignment and discuss the bigger picture into which they fit, as well as what it’s like to work in this relatively new field. Below, I’ve provided an abridged transcript of the talk, with some accompanying slides.

Talk outline:

1. Agents and their utility functions

1.1. Coherent decisions imply a utility function
1.2. Filling a cauldron

2. Some AI alignment subproblems

2.1. Low-impact agents
2.2. Agents with suspend buttons
2.3. Stable goals in self-modification

3. Why expect difficulty?

3.1. Why is alignment necessary?
3.2. Why is alignment hard?
3.3. Lessons from NASA and cryptography

4. Where we are now

4.1. Recent topics
4.2. Older work and basics
4.3. Where to start


December 2016 Newsletter


We’re in the final weeks of our push to cover our funding shortfall, and we’re now halfway to our $160,000 goal. For potential donors who are interested in an outside perspective, Future of Humanity Institute (FHI) researcher Owen Cotton-Barratt has written up why he’s donating to MIRI this year. (Donation page.)


General updates

  • We teamed up with a number of AI safety researchers to help compile a list of recommended AI safety readings for the Center for Human-Compatible AI. See this page if you would like to get involved with CHCAI’s research.
  • Investment analyst Ben Hoskin reviews MIRI and other organizations involved in AI safety.

News and links

  • “The Off-Switch Game”: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell show that an AI agent’s corrigibility is closely tied to the uncertainty it has about its utility function.
  • Russell and Allan Dafoe critique an inaccurate summary by Oren Etzioni of a new survey of AI experts on superintelligence.
  • Sam Harris interviews Russell on the basics of AI risk (video). See also Russell’s new Q&A on the future of AI.
  • Future of Life Institute co-founder Viktoriya Krakovna and FHI researcher Jan Leike join Google DeepMind’s safety team.
  • GoodAI sponsors a challenge to “accelerate the search for general artificial intelligence”.
  • OpenAI releases Universe, “a software platform for measuring and training an AI’s general intelligence across the world’s supply of games”. Meanwhile, DeepMind has open-sourced their own platform for general AI research, DeepMind Lab.
  • Staff at GiveWell and the Centre for Effective Altruism, along with others in the effective altruism community, explain where they’re donating this year.
  • FHI is seeking AI safety interns, researchers, and admins: jobs page.

November 2016 Newsletter


Post-fundraiser update: Donors rallied late last month to get us most of the way to our first fundraiser goal, but we ultimately fell short. This means that we’ll need to make up the remaining $160k gap over the next month if we’re going to move forward on our 2017 plans. We’re in a good position to expand our research staff and trial a number of potential hires, but only if we feel confident about our funding prospects over the next few years.

Since we don’t have an official end-of-the-year fundraiser planned this time around, we’ll be relying more on word-of-mouth to reach new donors. To help us with our expansion plans, please donate and spread the word!


Post-fundraiser update


We concluded our 2016 fundraiser eleven days ago. Progress was slow at first, but our donors came together in a big way in the final week, nearly doubling our final total. In the end, donors raised $589,316 over six weeks, making this our second-largest fundraiser to date. I’m heartened by this show of support, and extremely grateful to the 247 distinct donors who contributed.

We made substantial progress toward our immediate funding goals, but ultimately fell short of our $750,000 target by about $160k. We have a number of hypotheses as to why, but our best guess at the moment is that we missed our target because more donors than expected are waiting until the end of the year to decide whether (and how much) to give.
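As a quick check on the arithmetic above, here is a minimal Python sketch that uses only the figures quoted in this post (amount raised, target, and donor count). The variable names are ours, chosen for illustration, and the mean gift is a rough average rather than a description of how donations were actually distributed.

```python
# Figures quoted in this post; the names are illustrative assumptions.
TARGET = 750_000   # fundraiser target in USD
RAISED = 589_316   # total raised over six weeks in USD
DONORS = 247       # distinct donors

shortfall = TARGET - RAISED          # gap left to close
fraction_raised = RAISED / TARGET    # share of the target reached
mean_gift = RAISED / DONORS          # simple average, not a typical gift

print(f"Shortfall: ${shortfall:,}")
print(f"Share of target raised: {fraction_raised:.1%}")
print(f"Mean gift per donor: ${mean_gift:,.0f}")
```

The exact gap comes to $160,684, which is the figure rounded to "about $160k" in the text.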

We were experimenting this year with running just one fundraiser in the fall (replacing the summer and winter fundraisers we’ve run in years past) and spending less time over the year on fundraising. Our fundraiser ended up looking more like recent summer funding drives, however. This suggests that either many donors are waiting to give in November and December, or we’re seeing a significant decline in donor support:

Looking at our donor database, preliminary data weakly suggests that many traditionally-winter donors are holding off, but it’s still hard to say.

This dip in donations so far is offset by the Open Philanthropy Project’s generous $500k grant, which raises our overall 2016 revenue from $1.23M to $1.73M. However, $1.73M would still not be enough to cover our 2016 expenses, much less our expenses for the coming year:

(2016 and 2017 expenses are projected, and our 2016 revenue is as of November 11.)

To a first approximation, this level of support means that we can continue to move forward without scaling back our plans too much, but only if donors come together to fill what’s left of our $160k gap as the year draws to a close:










In practical terms, closing this gap will mean that we can likely trial more researchers over the coming year, spend less senior staff time on raising funds, and take on more ambitious outreach and researcher-pipeline projects. E.g., an additional expected $75k / year would likely cause us to trial one extra researcher over the next 18 months (maxing out at 3-5 trials).

Currently, we’re in a situation where we have a number of potential researchers that we would like to give a 3-month trial, and we lack the funding to trial all of them. If we don’t close the gap this winter, then it’s also likely that we’ll need to move significantly more slowly on hiring and trialing new researchers going forward.

Our main priority in fundraisers is generally to secure stable, long-term flows of funding to pay for researcher salaries — “stable” not necessarily at the level of individual donors, but at least at the level of the donor community at large. If we make up our shortfall in November and December, then this will suggest that we shouldn’t expect big year-to-year fluctuations in support, and therefore we can fairly quickly convert marginal donations into AI safety researchers. If we don’t make up our shortfall soon, then this will suggest that we should be generally more prepared for surprises, which will require building up a bigger runway before growing the team very much.

Although we aren’t officially running a fundraiser, we still have quite a bit of ground to cover, and we’ll need support from a lot of new and old donors alike to get the rest of the way to our $750k target. Please donate toward this goal, and do spread the word to people who may be interested in supporting our work.

You have my gratitude, again, for helping us get this far. It isn’t clear yet whether we’re out of the woods, but we’re now in a position where success in our 2016 fundraising is definitely a realistic option, provided that we put some work into it over the next two months. Thank you.

Update December 22: We have now hit our $750k goal, with help from end-of-the-year donors. Many thanks to everyone who helped pitch in over the last few months! We’re still funding-constrained with respect to how many researchers we’re likely to trial, as described above — but it now seems clear that 2016 overall won’t be an unusually bad year for us funding-wise, and that we can seriously consider (though not take for granted) more optimistic growth possibilities over the next couple of years.

December/January donations will continue to have a substantial effect on our 2017–2018 hiring plans and strategy as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see a suite of recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.

White House submissions and report on AI safety


In May, the White House Office of Science and Technology Policy (OSTP) announced “a new series of workshops and an interagency working group to learn more about the benefits and risks of artificial intelligence.” They hosted a June Workshop on Safety and Control for AI (videos), along with three other workshops, and issued a general request for information on AI (see MIRI’s primary submission here).

The OSTP has now released a report summarizing its conclusions, “Preparing for the Future of Artificial Intelligence,” and the result is very promising. The OSTP acknowledges the ongoing discussion about AI risk, and recommends “investing in research on longer-term capabilities and how their challenges might be managed”:

General AI (sometimes called Artificial General Intelligence, or AGI) refers to a notional future AI system that exhibits apparently intelligent behavior at least as advanced as a person across the full range of cognitive tasks. A broad chasm seems to separate today’s Narrow AI from the much more difficult challenge of General AI. Attempts to reach General AI by expanding Narrow AI solutions have made little headway over many decades of research. The current consensus of the private-sector expert community, with which the NSTC Committee on Technology concurs, is that General AI will not be achieved for at least decades.

People have long speculated on the implications of computers becoming more intelligent than humans. Some predict that a sufficiently intelligent AI could be tasked with developing even better, more intelligent systems, and that these in turn could be used to create systems with yet greater intelligence, and so on, leading in principle to an “intelligence explosion” or “singularity” in which machines quickly race far ahead of humans in intelligence.

In a dystopian vision of this process, these super-intelligent machines would exceed the ability of humanity to understand or control. If computers could exert control over many critical systems, the result could be havoc, with humans no longer in control of their destiny at best and extinct at worst. This scenario has long been the subject of science fiction stories, and recent pronouncements from some influential industry leaders have highlighted these fears.

A more positive view of the future held by many researchers sees instead the development of intelligent systems that work well as helpers, assistants, trainers, and teammates of humans, and are designed to operate safely and ethically.

The NSTC Committee on Technology’s assessment is that long-term concerns about super-intelligent General AI should have little impact on current policy. The policies the Federal Government should adopt in the near-to-medium term if these fears are justified are almost exactly the same policies the Federal Government should adopt if they are not justified. The best way to build capacity for addressing the longer-term speculative risks is to attack the less extreme risks already seen today, such as current security, privacy, and safety risks, while investing in research on longer-term capabilities and how their challenges might be managed. Additionally, as research and applications in the field continue to mature, practitioners of AI in government and business should approach advances with appropriate consideration of the long-term societal and ethical questions – in additional to just the technical questions – that such advances portend. Although prudence dictates some attention to the possibility that harmful superintelligence might someday become possible, these concerns should not be the main driver of public policy for AI.

Later, the report discusses “methods for monitoring and forecasting AI developments”:

One potentially useful line of research is to survey expert judgments over time. As one example, a survey of AI researchers found that 80 percent of respondents believed that human-level General AI will eventually be achieved, and half believed it is at least 50 percent likely to be achieved by the year 2040. Most respondents also believed that General AI will eventually surpass humans in general intelligence. While these particular predictions are highly uncertain, as discussed above, such surveys of expert judgment are useful, especially when they are repeated frequently enough to measure changes in judgment over time. One way to elicit frequent judgments is to run “forecasting tournaments” such as prediction markets, in which participants have financial incentives to make accurate predictions. Other research has found that technology developments can often be accurately predicted by analyzing trends in publication and patent data. […]

When asked during the outreach workshops and meetings how government could recognize milestones of progress in the field, especially those that indicate the arrival of General AI may be approaching, researchers tended to give three distinct but related types of answers:

1. Success at broader, less structured tasks: In this view, the transition from present Narrow AI to an eventual General AI will occur by gradually broadening the capabilities of Narrow AI systems so that a single system can cover a wider range of less structured tasks. An example milestone in this area would be a housecleaning robot that is as capable as a person at the full range of routine housecleaning tasks.

2. Unification of different “styles” of AI methods: In this view, AI currently relies on a set of separate methods or approaches, each useful for different types of applications. The path to General AI would involve a progressive unification of these methods. A milestone would involve finding a single method that is able to address a larger domain of applications that previously required multiple methods.

3. Solving specific technical challenges, such as transfer learning: In this view, the path to General AI does not lie in progressive broadening of scope, nor in unification of existing methods, but in progress on specific technical grand challenges, opening up new ways forward. The most commonly cited challenge is transfer learning, which has the goal of creating a machine learning algorithm whose result can be broadly applied (or transferred) to a range of new applications.

The report also discusses the open problems outlined in “Concrete Problems in AI Safety” and cites the MIRI paper “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.”

In related news, Barack Obama recently answered some questions about AI risk and Nick Bostrom’s Superintelligence in a Wired interview. After saying that “we’re still a reasonably long way away” from general AI (video) and that his directive to his national security team is to worry more about near-term security concerns (video), Obama adds:

Now, I think, as a precaution — and all of us have spoken to folks like Elon Musk who are concerned about the superintelligent machine — there’s some prudence in thinking about benchmarks that would indicate some general intelligence developing on the horizon. And if we can see that coming, over the course of three decades, five decades, whatever the latest estimates are — if ever, because there are also arguments that this thing’s a lot more complicated than people make it out to be — then future generations, or our kids, or our grandkids, are going to be able to see it coming and figure it out.


MIRI AMA, and a talk on logical induction


Nate, Malo, Jessica, Tsvi, and I will be answering questions tomorrow at the Effective Altruism Forum. If you’ve been curious about anything related to our research, plans, or general thoughts, you’re invited to submit your own questions in the comments below or at Ask MIRI Anything.

We’ve also posted a more detailed version of our fundraiser overview and case for MIRI at the EA Forum.

In other news, we have a new talk out with an overview of “Logical Induction,” our recent paper presenting (as Critch puts it) “a financial solution to the computer science problem of metamathematics”:



This version of the talk goes into more technical detail than our previous talk on logical induction.

For some recent discussions of the new framework, see Shtetl-Optimized, n-Category Café, and Hacker News.

October 2016 Newsletter


Our big announcement this month is our paper “Logical Induction,” introducing an algorithm that learns to assign reasonable probabilities to mathematical, empirical, and self-referential claims in a way that outpaces deduction. MIRI’s 2016 fundraiser is also live, and runs through the end of October.



General updates

  • We wrote up a more detailed fundraiser post for the Effective Altruism Forum, outlining our research methodology and the basic case for MIRI.
  • We’ll be running an “Ask MIRI Anything” on the EA Forum this Wednesday, Oct. 12.
  • The Open Philanthropy Project has awarded MIRI a one-year $500,000 grant to expand our research program. See also Holden Karnofsky’s account of how his views on EA and AI have changed.

News and links