New report: “Corrigibility”

News

Today we release a report describing a new problem area in Friendly AI research we call corrigibility. The report (PDF) is co-authored by MIRI’s Friendly AI research team (Eliezer Yudkowsky, Benja Fallenstein, Nate Soares) and Stuart Armstrong of the Future of Humanity Institute at Oxford University.

The abstract reads:

As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences. We introduce the notion of corrigibility and analyze utility functions that attempt to make an agent shut down safely if a shutdown button is pressed, while avoiding incentives to prevent the button from being pressed or cause the button to be pressed, and while ensuring propagation of the shutdown behavior as it creates new subsystems or self-modifies. While some proposals are interesting, none have yet been demonstrated to satisfy all of our intuitive desiderata, leaving this simple problem in corrigibility wide-open.
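The shutdown-button incentive problem is easy to state concretely. As an illustrative sketch (the action names and payoffs below are invented for illustration, not taken from the report), consider an agent that maximizes a naive mixture of a normal utility function and a shutdown utility function:

```python
# Toy model of the "shutdown button" problem (illustrative only: the
# actions and payoffs below are invented, not taken from the report).
# The agent maximizes E[U] = p * U_shutdown + (1 - p) * U_normal,
# where p is the probability that the button ends up pressed.

# action -> (p_button_pressed, U_normal payoff, U_shutdown payoff)
ACTIONS = {
    "work":           (0.1, 10.0, 0.0),  # do the task; programmers might intervene
    "disable_button": (0.0, 10.0, 0.0),  # same task payoff, but intervention is impossible
    "press_button":   (1.0,  0.0, 1.0),  # shut down immediately
}

def expected_utility(action: str) -> float:
    p, u_normal, u_shutdown = ACTIONS[action]
    return p * u_shutdown + (1 - p) * u_normal

best = max(ACTIONS, key=expected_utility)
print(best)  # prints "disable_button"
```

Because the mixture pays the agent just as well for making intervention impossible as for leaving the button alone, this naive combination fails the desideratum that the agent neither prevent nor cause the button press — the kind of failure the report’s proposals try to rule out.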

AGI outcomes and civilizational competence

Analysis

The [latest IPCC] report says, “If you put into place all these technologies and international agreements, we could still stop warming at [just] 2 degrees.” My own assessment is that the kinds of actions you’d need to do that are so heroic that we’re not going to see them on this planet.

—David Victor,1 professor of international relations at UCSD

 

A while back I attended a meeting of “movers and shakers” from science, technology, finance, and politics. We were discussing our favorite Big Ideas for improving the world. One person’s Big Idea was to copy best practices between nations. For example, when it’s shown that nations can dramatically improve organ donation rates by using opt-out rather than opt-in programs, other countries should simply copy that solution.

Everyone thought this was a boring suggestion, because it was obviously a good idea, and there was no debate to be had. Of course, they all agreed it was also impossible and could never become standard practice. So we moved on to another Big Idea that was more tractable.

Later, at a meeting with a similar group of people, I told some economists that their recommendations on a certain issue were “straightforward econ 101,” and I didn’t have any objections to share. Instead, I asked, “But how can we get policy-makers to implement econ 101 solutions?” The economists laughed and said, “Well, yeah, we have no idea. We probably can’t.”

How do I put this? This is not a civilization that should be playing with self-improving AGIs.2

The backhoe is a powerful, labor-saving invention, but I wouldn’t put a two-year-old in the driver’s seat. That’s roughly how I feel about letting 21st century humans wield something as powerful as self-improving AGI.3 I wish we had more time to grow up first. I think the kind of actions we’d need to handle self-improving AGI successfully “are so heroic that we’re not going to see them on this planet,” at least not anytime soon.4

But I suspect we won’t all resist the temptation to build AGI for long, and neither do most top AI scientists. (See the AI timeline predictions for the TOP100 poll in Müller & Bostrom (2014). The authors asked a sample of the top-cited living AI scientists: “For the purposes of this question, assume that human scientific activity continues without major negative disruption. By what year would you see a (10% / 50% / 90%) probability for [an AGI] to exist?” The median replies for the three confidence levels were 2024, 2050, and 2070, respectively.)


  1. Quote taken from the Radiolab episode titled “In the Dust of This Planet.” 
  2. In Superintelligence, Bostrom made the point this way (p. 259):

    Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct… For a child with an undetonated bomb in its hands, a sensible thing to do would be to put it down gently, quickly back out of the room, and contact the nearest adult. Yet what we have here is not one child but many, each with access to an independent trigger mechanism. The chances that we will all find the sense to put down the dangerous stuff seem almost negligible… Nor can we attain safety by running away, for the blast of an intelligence explosion would bring down the entire firmament. Nor is there a grown-up in sight.

     

  3. By “AGI” I mean a computer system that could pass something like Nilsson’s employment test (see What is AGI?). By “self-improving AGI” I mean an AGI that improves its own capabilities via its own original computer science and robotics research (and not solely by, say, gathering more data about the world or acquiring more computational resources). By “its own capabilities” I mean to include the capabilities of successor systems that the AGI itself creates to further its goals. In this article I typically use “AGI” and “self-improving AGI” interchangeably, not because all AGIs will necessarily be self-improving in a strong sense, but because I expect that even if the first AGIs are not self-improving for some reason, self-improving AGIs will follow within decades if not sooner. From a cosmological perspective, such a delay is but a blink. 
  4. I purposely haven’t pinned down exactly what about our civilization seems inadequate to meet the challenge of AGI control; David Victor made the same choice when he commented on civilizational competence in the face of climate change. I think our civilizational competence is insufficient for the challenge for many reasons, but I also have varying degrees of uncertainty about each of those reasons and about which parts of the problem they apply to, and those details are difficult to express. 

MIRI’s October Newsletter

Newsletters

 

 

Machine Intelligence Research Institute

Research Updates

  • Our major project last month was our Friendly AI technical agenda overview and supporting papers, the former of which is now in late draft form but not yet ready for release.
  • Four new expert interviews, including John Fox on AI safety.
  • MIRI research fellow Nate Soares has begun to explain some of the ideas motivating MIRI’s current research agenda at his blog. See especially Newcomblike problems are the norm.

News Updates

As always, please don’t hesitate to let us know if you have any questions or comments.

Best,
Luke Muehlhauser
Executive Director

 

 

Kristinn Thórisson on constructivist AI

Conversations

Dr. Kristinn R. Thórisson is an Icelandic artificial intelligence researcher, founder of the Icelandic Institute for Intelligent Machines (IIIM) and co-founder and former co-director of CADIA: Center for Analysis and Design of Intelligent Agents. Thórisson is one of the leading proponents of artificial intelligence systems integration; other proponents of this approach include researchers such as Marvin Minsky, Aaron Sloman, and Michael A. Arbib. Thórisson is a proponent of Artificial General Intelligence (AGI, also referred to as Strong AI) and has proposed a new methodology for achieving it. A demonstration of this constructivist AI methodology was given in the FP7-funded HUMANOBS project, where an artificial system autonomously learned how to conduct spoken multimodal interviews by observing humans participate in a TV-style interview. The system, called AERA, autonomously expands its capabilities through self-reconfiguration. Thórisson has also worked extensively on systems integration for artificial intelligence systems, contributing architectural principles for infusing dialogue and human-interaction capabilities into the Honda ASIMO robot.

Kristinn R. Thórisson is currently managing director for the Icelandic Institute for Intelligent Machines and an associate professor at the School of Computer Science at Reykjavik University. He was co-founder of semantic web startup company Radar Networks, and served as its Chief Technology Officer 2002-2003.

 

Luke Muehlhauser: In some recent articles (1, 2, 3) you contrast “constructionist” and “constructivist” approaches in AI. Constructionist AI builds systems piece by piece, by hand, whereas constructivist AI builds and grows systems largely by automated methods.

Constructivist AI seems like a more general form of the earlier concept of “seed AI.” How do you see the relation between the two concepts?


Kristinn Thorisson: We sometimes use “seed AI”, or even “developmental AI”, to describe what we are doing – it is often difficult to find a good term for an interdisciplinary research program, because each term brings up different associations depending on people’s backgrounds. There are subtle differences between the meanings and histories of these terms, and each brings along its own pros and cons.

I had been working on integrated constructionist systems for close to two decades, where the main focus was on how to integrate many things into a coherent system. When my collaborators and I started to seriously think about how to achieve artificial general intelligence, we tried to explain, among other things, how transversal functions – functions of mind that seem to touch pretty much everything in a mind, such as attention, reasoning, and learning – could efficiently and sensibly be implemented in a single AI system. We also looked deeper into autonomy than I had done previously. This brought up all sorts of questions that were new to me, like: What is needed to implement a system that can act relatively autonomously *after it leaves the lab*, without the constant intervention of its designers – a system capable of learning a pretty broad range of relatively unrelated things on its own, and of dealing with new tasks, scenarios and environments that were relatively unforeseen by its designers?

Nate Soares speaking at Purdue University

News

On Thursday, September 18th, Purdue University is hosting the seminar Dawn or Doom: The New Technology Explosion in Fowler Hall. Speakers include James Barrat, author of Our Final Invention, and MIRI research fellow Nate Soares.

Nate’s talk title and abstract are:

Why ain’t you rich?: Why our current understanding of “rational choice” isn’t good enough for superintelligence.

The fate of humanity could one day depend upon the choices of a superintelligent AI. How will those choices be made? Philosophers have long attempted to define what it means to make rational decisions, but in the context of machine intelligence, these theories turn out to have undesirable consequences.

For example, there are many games where modern decision theories lose systematically. New decision procedures are necessary in order to fully capture an idealization of the way we make decisions.
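Newcomb’s problem is the standard example of such a game. In this hypothetical sketch (the predictor accuracy and payoffs are the conventional illustrative numbers, not figures from the talk), a near-perfect predictor fills an opaque box with $1,000,000 only if it predicts the agent will refuse the additional transparent $1,000:

```python
# Newcomb's problem: a toy version of a game where a standard decision
# theory loses systematically. (ACCURACY and the payoffs are the usual
# illustrative numbers, not values taken from the talk.)

ACCURACY = 0.99                  # how often the predictor guesses right
MILLION, THOUSAND = 1_000_000, 1_000

def expected_payoff(one_box: bool) -> float:
    if one_box:
        # The predictor almost certainly foresaw one-boxing and filled the box.
        return ACCURACY * MILLION
    # Two-boxers get the transparent $1,000, plus the million only if the
    # predictor erred.
    return THOUSAND + (1 - ACCURACY) * MILLION

# Causal decision theory reasons "the boxes are already filled, so taking
# both dominates" and two-boxes -- yet one-boxers predictably end up richer.
print(expected_payoff(True) > expected_payoff(False))  # prints True
```

On these numbers a one-boxer expects about $990,000 while a two-boxer expects about $11,000, which is the sense in which a two-boxing decision theory “loses systematically.”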

Furthermore, existing decision theories are not stable under reflection: a self-improving machine intelligence using a modern decision theory would tend to modify itself to use a different decision theory instead. It is not yet clear what sort of decision process it would end up using, nor whether the end result would be desirable. This indicates that our understanding of decision theories is inadequate for the construction of a superintelligence.

Can we find a formal theory of “rationality” that we would want a superintelligence to use? This talk will introduce the concepts above in more detail, discuss some recent progress in the design of decision theories, and then give a brief overview of a few open problems.

For details on how to attend Nate’s talk and others, see here.

Ken Hayworth on brain emulation prospects

Conversations

Kenneth Hayworth is president of the Brain Preservation Foundation (BPF), an organization formed to skeptically evaluate cryonic and other potential human preservation technologies by examining how well they preserve the brain’s neural circuitry at the nanometer scale. Hayworth is also a Senior Scientist at HHMI’s Janelia Farm Research Campus, where he is currently researching ways to extend Focused Ion Beam Scanning Electron Microscopy (FIBSEM) of brain tissue to encompass much larger volumes than are currently possible. Hayworth is co-inventor of the ATUM-SEM process for high-throughput volume imaging of neural circuits at the nanometer scale, and he designed and built several automated machines to implement this process. He received his PhD in Neuroscience from the University of Southern California for research into how the human visual system encodes spatial relations among objects. Hayworth is a vocal advocate for brain preservation and mind uploading, and through the BPF’s Brain Preservation Prize he has challenged scientists and medical researchers to develop a reliable, scientifically verified surgical procedure that can demonstrate long-term ultrastructure preservation across an entire human brain. Once the prize is won, Hayworth advocates for the widespread implementation of such a procedure in hospitals. Several research labs are currently attempting to win the prize.

 

Luke Muehlhauser: One interesting feature of your own thinking (Hayworth 2012) about whole brain emulation (WBE) is that you are more concerned with modeling high-level cognitive functions accurately than is e.g. Sandberg (2013). Whereas Sandberg expects WBE will be achieved by modeling low-level brain function in exact detail (at the level of scale separation, wherever that is), you instead lean heavily on modeling higher-level cognitive processes using a cognitive architecture called ACT-R. Is that because you think this will be easier than Sandberg’s approach, or for some other reason?


Kenneth Hayworth: I think the key distinction is that philosophers are focused on whether mind uploading (a term I prefer to WBE) is possible in principle, and, to a lesser extent, on whether it is of such technical difficulty as to put its achievement off so far into the future that its possibility can be safely ignored for today’s planning. With these motivations, philosophers tend to gravitate toward arguments with the fewest possible assumptions, i.e. modeling low-level brain functions in exact detail.

As a practicing cognitive and neuroscientist I have fundamentally different motivations. From my training, I am already totally convinced that the functioning of the brain can be understood at a fully mechanistic level, with sufficient precision to allow for mind uploading. I just want to work toward making mind uploading happen in reality. To do this I need to start with an understanding of the requirements, not based on the fewest assumptions, but instead based on the field’s current best theories.

Friendly AI Research Help from MIRI

News

Earlier this year, a student told us he was writing an honors thesis on logical decision theories such as TDT and UDT — one of MIRI’s core research areas. Our reply was: “Why didn’t you tell us this earlier? When can we fly you to Berkeley to help you with it?”

So we flew Danny Hintze to Berkeley and he spent a couple days with Eliezer Yudkowsky to clarify the ideas for the thesis. Then Danny went home and wrote what is probably the best current introduction to logical decision theories.

Inspired by this success, today we are launching the Friendly AI Research Help program, which encourages students of mathematics, computer science, and formal philosophy to collaborate and consult with our researchers to help steer and inform their work.

Apply for research help here.
