# CSRBAI talks on agent models and multi-agent dilemmas

|   |  News, Video

We’ve uploaded the final set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Future of Humanity Institute. A full list of CSRBAI talks with public video or slides is available on the CSRBAI web page.

# MIRI’s 2016 Fundraiser

|   |  News

Update December 22: Our donors came together during the fundraiser to get us most of the way to our $750,000 goal. In all, 251 donors contributed $589,248, making this our second-biggest fundraiser to date. Although we fell short of our target by $160,000, we have since made up this shortfall thanks to November/December donors. I’m extremely grateful for this support, and will plan accordingly for more staff growth over the coming year.

As described in our post-fundraiser update, we are still fairly funding-constrained. December/January donations will have an especially large effect on our 2017–2018 hiring plans and strategy, as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.

Our 2016 fundraiser is underway! Unlike in past years, we’ll only be running one fundraiser in 2016, from Sep. 16 to Oct. 31. Our progress so far is tracked live on this page. Employer matching and pledges to give later this year also count towards the total. Click here to learn more.

MIRI is a nonprofit research group based in Berkeley, California. We do foundational research in mathematics and computer science that’s aimed at ensuring that smarter-than-human AI systems have a positive impact on the world.

2016 has been a big year for MIRI, and for the wider field of AI alignment research. Our 2016 strategic update in early August reviewed a number of recent developments. We also published new results in decision theory and logical uncertainty, including “Parametric bounded Löb’s theorem and robust cooperation of bounded agents” and “A formal solution to the grain of truth problem.” For a survey of our research progress and other updates from last year, see our 2015 review.

In the last three weeks, there have been three more major developments:

• We released a new paper, “Logical induction,” describing a method for learning to assign reasonable probabilities to mathematical conjectures and computational facts in a way that outpaces deduction.
• The Open Philanthropy Project awarded MIRI a one-year $500,000 grant to scale up our research program, with a strong chance of renewal next year.
• The Open Philanthropy Project is supporting the launch of the new UC Berkeley Center for Human-Compatible AI, headed by Stuart Russell.

Things have been moving fast over the last nine months. If we can replicate last year’s fundraising successes, we’ll be in an excellent position to move forward on our plans to grow our team and scale our research activities.

# New paper: “Logical induction”

|   |  Papers

MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, myself, and Jessica Taylor. Readers may wish to start with the abridged version.

Consider a setting where a reasoner is observing a deductive process (such as a community of mathematicians and computer programmers) and waiting for proofs of various logical claims (such as the abc conjecture, or “this computer program has a bug in it”), while making guesses about which claims will turn out to be true. Roughly speaking, our paper presents a computable (though inefficient) algorithm that outpaces deduction, assigning high subjective probabilities to provable conjectures and low probabilities to disprovable conjectures long before the proofs can be produced.

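To give a rough, concrete sense of what “outpacing deduction” means, here is a minimal toy sketch in Python. It is emphatically not the paper’s construction (the digit pattern, the reasoner, and all names below are ours for illustration): it just shows a reasoner exploiting an easily learned statistical pattern, the digit frequencies of √2, to hold calibrated beliefs about digits that the slow deductive process has not yet reached.

```python
# Toy sketch only, NOT the paper's algorithm: a reasoner watches a slow
# deductive process verify digits of sqrt(2) one at a time, and uses the
# observed digit frequencies to assign calibrated probabilities to
# claims about digits that have not yet been computed.
import math

def sqrt2_digits(n):
    """First n decimal digits of sqrt(2), via exact integer square roots."""
    return str(math.isqrt(2 * 10 ** (2 * (n - 1))))[:n]

class FrequencyReasoner:
    """Estimates P("digit k of sqrt(2) is d") from digits proved so far."""

    def __init__(self):
        self.counts = [0] * 10
        self.total = 0

    def observe(self, digit):
        # The deductive process has just established this digit's value.
        self.counts[int(digit)] += 1
        self.total += 1

    def probability(self, d):
        # Laplace-smoothed frequency estimate for a not-yet-proved digit.
        return (self.counts[d] + 1) / (self.total + 10)

reasoner = FrequencyReasoner()
for digit in sqrt2_digits(500):  # deduction has only reached digit 500...
    reasoner.observe(digit)

# ...but the reasoner already holds a sensible ~0.1 credence about a digit
# far beyond anything the deductive process above has touched.
print(f"P(digit 10**9 of sqrt(2) is 7) ~= {reasoner.probability(7):.3f}")
```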
This algorithm has a large number of nice theoretical properties. Still speaking roughly, the algorithm learns to assign probabilities to sentences in ways that respect any logical or statistical pattern that can be described in polynomial time. Additionally, it learns to reason well about its own beliefs and trust its future beliefs while avoiding paradox. Quoting from the abstract:

# Open Philanthropy Project grant

|   |  News

The Open Philanthropy Project has awarded MIRI a one-year $500,000 grant to scale up our research program. Coming on the heels of a $300,000 donation by Blake Borgeson, this support will help us continue on the growth trajectory we outlined in our summer and winter fundraisers last year and effect another doubling of the research team. These growth plans assume continued support from other donors in line with our fundraising successes last year; we’ll be discussing our remaining funding gap in more detail in our 2016 fundraiser, which we’ll be kicking off later this month.

The Open Philanthropy Project is a joint initiative run by staff from the philanthropic foundation Good Ventures and the charity evaluator GiveWell. Open Phil has recently made it a priority to identify opportunities for researchers to address potential risks from advanced AI, and we consider their early work in this area promising: grants to Stuart Russell, Robin Hanson, and the Future of Life Institute, plus a stated interest in funding work related to “Concrete Problems in AI Safety,” a recent paper co-authored by four Open Phil technical advisers, Christopher Olah (Google Brain), Dario Amodei (OpenAI), Paul Christiano (UC Berkeley), and Jacob Steinhardt (Stanford), along with John Schulman (OpenAI) and Dan Mané (Google Brain).

Open Phil’s grant isn’t a full endorsement, and they note a number of reservations about our work in an extensive writeup detailing the thinking that went into the grant decision. Separately, Open Phil Executive Director Holden Karnofsky has written some personal thoughts about how his views of MIRI and the effective altruism community have evolved in recent years.

# September 2016 Newsletter

|   |  Newsletters

## Research updates

• New at IAFF: Modeling the Capabilities of Advanced AI Systems as Episodic Reinforcement Learning; Simplified Explanation of Stratification
• New at AI Impacts: Friendly AI as a Global Public Good
• We ran two research workshops this month: a veterans’ workshop on decision theory for long-time collaborators and staff, and a machine learning workshop focusing on generalizable environmental goals, impact measures, and mild optimization.
• AI researcher Abram Demski has accepted a research fellowship at MIRI, pending the completion of his PhD. He’ll be starting here in late 2016 / early 2017.
• Data scientist Ryan Carey is joining MIRI’s ML-oriented team this month as an assistant research fellow.

## General updates

• MIRI’s 2016 strategy update outlines how our research plans have changed in light of recent developments. We also announce a generous $300,000 gift — our second-largest single donation to date.
• We’ve uploaded nine talks from CSRBAI’s robustness and preference specification weeks, including Jessica Taylor on “Alignment for Advanced Machine Learning Systems” (video), Jan Leike on “General Reinforcement Learning” (video), Paul Christiano on “Training an Aligned RL Agent” (video), and Dylan Hadfield-Menell on “The Off-Switch” (video).
• MIRI COO Malo Bourgon has been co-chairing a committee of IEEE’s Global Initiative for Ethical Considerations in the Design of Autonomous Systems. He recently moderated a workshop on general AI and superintelligence at the initiative’s first meeting.
• We had a great time at Effective Altruism Global, and taught at SPARC.
• We hired two new admins: Office Manager Aaron Silverbook, and Communications and Development Strategist Colm Ó Riain.

## News and links

• The Open Philanthropy Project awards $5.6 million to Stuart Russell to launch an academic AI safety research institute: the Center for Human-Compatible AI.
• “Who Should Control Our Thinking Machines?”: Jack Clark interviews DeepMind’s Demis Hassabis.
• Elon Musk explains: “I think the biggest risk is not that the AI will develop a will of its own, but rather that it will follow the will of people that establish its utility function, or its optimization function. And that optimization function, if it is not well-thought-out — even if its intent is benign, it could have quite a bad outcome.”
• Modeling Intelligence as a Project-Specific Factor of Production: Ben Hoffman compares different AI takeoff scenarios.
• Clopen AI: Viktoriya Krakovna weighs the advantages of closed vs. open AI.
• Google X director Astro Teller expresses optimism about the future of AI in a Medium post announcing the first report of the Stanford AI100 study.
• Buzzfeed reports on efforts to prevent the development of lethal autonomous weapons systems.
• In controlled settings, researchers find ways to detect keystrokes via distortions in WiFi signals and jump air gaps using hard drive actuator noises.
• Solid discussions on the EA Forum: Should Donors Make Commitments About Future Donations? and Should You Switch Away From Earning to Give?

# CSRBAI talks on preference specification

|   |  News, Video

We’ve uploaded a third set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI), co-hosted with the Future of Humanity Institute. These talks were part of the week focused on preference specification in AI systems, including the difficulty of specifying safe and useful goals, or specifying safe and useful methods for learning human preferences. All released videos are available on the CSRBAI web page.

Tom Everitt, a PhD student at the Australian National University, spoke about his paper “Avoiding wireheading with value reinforcement learning,” written with Marcus Hutter (slides). Abstract:

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward — the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading. Our VRL agent offers the ease of control of RL agents and avoids the incentive for wireheading.
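As a very loose illustration of that contrast (a hypothetical toy of our own, not Everitt and Hutter’s formal model; all names and numbers below are invented), consider two scoring rules over the same two actions: an RL-style rule that maximizes whatever the reward sensor will report, and a VRL-style rule that scores actions by the agent’s own beliefs about utility, so that tampering with the sensor gains nothing:

```python
# Hypothetical toy, not the paper's formal model: contrast an RL-style
# objective (maximize whatever the reward sensor will report) with a
# VRL-style objective (maximize expected utility under the agent's own
# beliefs, with reward treated only as evidence about that utility).

# What the reward sensor would report after each action.
SENSOR_READING = {"work": 1.0, "tamper": 10.0}  # "tamper" = wirehead the sensor

# The agent's learned belief about how much utility each action really
# produces. By its own lights, tampering corrupts the sensor and
# accomplishes nothing of value.
BELIEVED_UTILITY = {"work": 1.0, "tamper": 0.0}

def rl_choice():
    # Standard RL: pick the action whose reported reward is highest.
    return max(SENSOR_READING, key=SENSOR_READING.get)

def vrl_choice():
    # VRL-style: pick the action whose believed utility is highest;
    # inflating the sensor reading moves this score not at all.
    return max(BELIEVED_UTILITY, key=BELIEVED_UTILITY.get)

print("RL agent picks: ", rl_choice())    # -> tamper (wireheading)
print("VRL agent picks:", vrl_choice())   # -> work
```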
# CSRBAI talks on robustness and error-tolerance

|   |  News, Video

We’ve uploaded a second set of videos from our recent Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office, co-hosted with the Future of Humanity Institute. These talks were part of the week focused on robustness and error-tolerance in AI systems, and how to ensure that when AI systems fail, they fail gracefully and detectably. All released videos are available on the CSRBAI web page.

Bart Selman, professor of computer science at Cornell University, spoke about machine reasoning and planning (slides). Excerpt:

I’d like to look at what I call “non-human intelligence.” It does get less attention, but the advances also have been very interesting, and they’re in reasoning and planning. It’s actually partly not getting as much attention in the AI world because it’s more used in software verification, program synthesis, and automating science and mathematical discoveries – other areas related to AI but not a central part of AI that are using these reasoning technologies. Especially the software verification world – Microsoft, Intel, IBM – push these reasoning programs very hard, and that’s why there’s so much progress, and I think it will start feeding back into AI in the near future.
# MIRI strategy update: 2016

|   |  MIRI Strategy

This post is a follow-up to Malo’s 2015 review, sketching out our new 2016–2017 plans. Briefly, our top priorities (in decreasing order of importance) are to (1) make technical progress on the research problems we’ve identified, (2) expand our team, and (3) build stronger ties to the wider research community.

As discussed in a previous blog post, the biggest update to our research plans is that we’ll be splitting our time going forward between our 2014 research agenda (the “agent foundations” agenda) and a new research agenda oriented toward machine learning work, led by Jessica Taylor: “Alignment for Advanced Machine Learning Systems.”

Three additional news items:

1. I’m happy to announce that MIRI has received support from a major new donor: entrepreneur and computational biologist Blake Borgeson, who has made a $300,000 donation to MIRI. This is the second-largest donation MIRI has received in its history, beaten only by Jed McCaleb’s 2013 cryptocurrency donation. As a result, we’ve been able to execute on our growth plans with more speed, confidence, and flexibility.

2. This year, instead of running separate summer and winter fundraisers, we’re merging them into one more ambitious fundraiser, which will take place in September.

3. I’m also pleased to announce that Abram Demski has accepted a position as a MIRI research fellow. Additionally, Ryan Carey has accepted a position as an assistant research fellow, and we’ve hired some new administrative staff.

I’ll provide more details on these and other new developments below.