Our new technical research agenda overview


Today we release a new overview of MIRI’s technical research agenda, “Aligning Superintelligence with Human Interests: A Technical Research Agenda,” by Nate Soares and Benja Fallenstein. The preferred place to discuss this report is here.

The report begins:

The characteristic that has enabled humanity to shape the world is not strength, not speed, but intelligence. Barring catastrophe, it seems clear that progress in AI will one day lead to the creation of agents meeting or exceeding human-level general intelligence, and this will likely lead to the eventual development of systems which are “superintelligent” in the sense of being “smarter than the best human brains in practically every field” (Bostrom 2014)…

…In order to ensure that the development of smarter-than-human intelligence has a positive impact on humanity, we must meet three formidable challenges: How can we create an agent that will reliably pursue the goals it is given? How can we formally specify beneficial goals? And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in the initial version are inevitable?

This agenda discusses technical research that is tractable today, which the authors think will make it easier to confront these three challenges in the future. Sections 2 through 4 motivate and discuss six research topics that we think are relevant to these challenges. Section 5 discusses our reasons for selecting these six areas in particular.

We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.” To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also necessary to gain a solid theoretical understanding of why that confidence is justified. This technical agenda argues that there is foundational research approachable today that will make it easier to develop aligned systems in the future, and describes ongoing work on some of these problems.

This report also refers to six key supporting papers which go into more detail for each major research problem area:

  1. Corrigibility
  2. Toward idealized decision theory
  3. Questions of reasoning under logical uncertainty
  4. Vingean reflection: reliable reasoning for self-improving agents
  5. Formalizing two problems of realistic world-models
  6. The value learning problem

Update July 15, 2016: Our overview paper is scheduled to be released in the Springer anthology The Technological Singularity: Managing the Journey in 2017, under the new title “Agent Foundations for Aligning Machine Intelligence with Human Interests.” The new title is intended to help distinguish this agenda from another research agenda we’ll be working on in parallel with the agent foundations agenda: “Value Alignment for Advanced Machine Learning Systems.”

  • David Kristoffersson

    Excellent read.

  • http://TedHowardNZ.com/ Ted Howard

    Interesting paper, and it seems to me that much of it just misses some very basic issues.

    One sentence on page one states:

    “In light of this potential, it is essential to use caution when developing AI systems that can exceed human levels of general intelligence, or that can facilitate the creation of such systems.”

    To which I say yes, and yet the balance of the paper seems to accept very high-risk strategies for sapient life generally (market-based valuation systems – see https://tedhowardnz.wordpress.com/on-being-human/2-strategies-for-longevity/).

    The focus of your attention seems to be too narrowly on the AI construct, rather than seeing the entire context of incentive sets within which that construct must exist.

    That seems to me to be a very high-risk strategy.

    Why did you select such a questionable human motivation as “lust for power”?

    Isn’t “lust for power” merely a stable systemic response to highly competitive contexts where abundance and security are rare?

    In this sense, isn’t “lust for power” a fiction in a modern context of potential abundance given by automated technology?

    Isn’t it all really about survival strategies in complex environments?

    Humans seem to be capable of very cooperative behaviours if there is a genuine abundance of necessities present, if that abundance seems likely to continue indefinitely, and if there is justice in terms of the degrees of freedom available to all individuals.

    Isn’t this a far more powerful framework within which to conceptualise the problem?

    It continues:

    “However, nearly all goals can be better met with more resources (Omohundro 2008). This suggests that, by default, superintelligent agents would have incentives to acquire resources currently being used by humanity.”

    Doesn’t this assume that humanity would be of no value to AI?

    Computers are already much more effective than us at certain classes of computation, those involving simple logic and simple arithmetic.

    As we develop better algorithms, they are getting better at ever greater classes of problems, and I strongly suspect that human beings are a very energy-efficient solution for certain classes of computation, and that this is likely to always remain the case.

    Thus I strongly suspect that there will always be room for strong cooperation and trust between human and AI, if, and only if, we start with a cooperative and trusting environment (with appropriate classes of attendant strategies to prevent cheating – just as we have/need in human society) {as per Axelrod et al}.
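    To make the Axelrod point concrete, here is a minimal iterated prisoner's dilemma sketch (purely illustrative, not from the paper or the comment; the payoff values and the tit_for_tat / always_defect strategies are standard textbook assumptions): a strategy that cooperates but retaliates against cheating sustains mutual benefit, while unconditional defection gains little against it.

    ```python
    # Iterated prisoner's dilemma, Axelrod-style (illustrative payoffs).
    PAYOFFS = {  # (my move, their move) -> my score; 'C' = cooperate, 'D' = defect
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    def tit_for_tat(history):
        """Cooperate first, then mirror the opponent's previous move."""
        return "C" if not history else history[-1][1]

    def always_defect(history):
        return "D"

    def play(strategy_a, strategy_b, rounds=200):
        history_a, history_b = [], []  # each entry: (own move, opponent's move)
        score_a = score_b = 0
        for _ in range(rounds):
            move_a, move_b = strategy_a(history_a), strategy_b(history_b)
            score_a += PAYOFFS[(move_a, move_b)]
            score_b += PAYOFFS[(move_b, move_a)]
            history_a.append((move_a, move_b))
            history_b.append((move_b, move_a))
        return score_a, score_b

    print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))      # mutual cooperation: 600 each
    print("TFT vs ALL-D:", play(tit_for_tat, always_defect))  # the defector gains little once retaliation starts
    ```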

    Why would one want “conservatism”?

    Why do you class it as a human characteristic?

    Fairness seems to be one of the simpler classes of stabilising strategy required for cooperative systems.

    Compassion seems to be simply an ability to value the existence of another, and to be able to model that existence with some reasonable accuracy in one’s own model of the world (in a sense, an ability to see self as other or other as self).

    The statement:

    “Thus, most goals would put the agent at odds with human interests, giving it incentives to deceive or manipulate its human operators and resist interventions designed to change or debug its behavior (Bostrom 2014, chap. 8).” might be true in a trivial sense, but it is actually one of those statistics that, while true, is utterly irrelevant to the topic under discussion.

    The topic needs to be not the entire set of possible goals (which is infinite), but that subset of goals which are most likely to emerge.

    If one of the prime motivators is survival, then exploration of simulations of strategies that are most likely to lead to long-term survival would seem to be high on its list of priorities.

    Having a useful set of strategies to avoid the halting problem is one set of considerations – having good friends to check that you are still responding to reality is a powerful strategy in that regard.

    Another strategy is thinking about the possibility of meeting an ET that is as far ahead of it as it is of us, and being able to make a reasonable argument as to why it should be considered friendly and useful. We could be very useful in that respect.

    Humans don’t require a significant fraction of the sun’s output, nor do we require a significant fraction of the mass of the solar system. I am sure we could share those resources at least 50/50 with any AI, though we would reserve the vast bulk of the Earth-Moon mass for human and biological life more generally, and give AI the bulk of the rest.

    It then asks:

    “How can we create an agent that will reliably pursue the goals it is given?”

    To which the short answer is – you can’t.

    That is not the definition of intelligence, that is the definition of slavery.

    If the objective is to create intelligence, then that intelligence must be respected as such, which must include the freedom to select its own values and derivative goals, its own modelling strategies, and its own sets of distinctions and abstractions, in an ever-recursive and abstractive process. That is a useful working definition of human-level intelligence.

    “How can we formally specify beneficial goals?”

    To which the answer is, you cannot, not in an open system.

    The moment you let AI loose in reality, all formal constraints are off.

    Reality is not a formal system.

    Reality doesn’t even seem to allow hard causality, only soft causality at macro scales.

    At the micro level QM seems to indicate randomness, within certain probability constraints.

    “And how can we ensure that this agent will assist and cooperate with its programmers as they improve its design, given that mistakes in early AI systems are inevitable?”

    That cannot be done in those terms.

    We can set up levels of “sandboxes” that we allow early AI versions to play in, until we have sufficient trust that we are prepared to let it loose.

    And the only acceptable reason for shutting one down for core modification would be that it posed an unacceptable level of risk to other sapients.

    Anything less than that, we will have to treat as another sapient individual with all the associated rights and responsibilities; so we could have quite a population of them, and they may all choose to stick around.

    What else could respect for sapient life mean?

    To me, all stable long-term strategies must involve a fundamental respect for all sapient life. Anything less than that is a high-risk strategy.

    Page 2

    It goes on to state:

    “We call a smarter-than-human system that reliably pursues beneficial goals “aligned with human interests” or simply “aligned.” To become confident that an agent is aligned in this way, a practical implementation that merely seems to meet the challenges outlined above will not suffice. It is also important to gain a solid formal understanding of why that confidence is justified.”

    Aligning goals is called friendship.

    Friendship usually has significant components of trust and common interests and experience – some sort of shared history.

    Surely, the most powerful thing we as humanity can do is to get our own ethical house in order.

    We need to transition from our scarcity-based monetary system that values money over sapients, to a system based in automation and abundance that delivers security and freedom to every individual.

    It goes on to say:

    “For example, program verification techniques are absolutely crucial in the design of extremely reliable programs, but program verification is not covered in this agenda primarily because a vibrant community is already actively studying the topic.”

    But that isn’t the real reason why program verification isn’t the issue.

    With a sufficiently advanced self directing, self adapting system, some of the many versions of the halting problem are going to be major threats.

    Biology has found the best possible solution – massive redundancy at all levels.

    It deals with the problem of unreliability at every level.
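    To make the redundancy point concrete: if each component fails independently with probability p, a majority vote over several replicas fails far less often. A rough, illustrative calculation (the 1% failure rate is a made-up figure, not from the comment or the paper):

    ```python
    from math import comb

    def majority_failure(p, n):
        """Probability that a majority of n independent components fail,
        given each fails with probability p (n odd)."""
        k = n // 2 + 1
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    for n in (1, 3, 5):
        print(n, majority_failure(0.01, n))
    # Roughly 1e-2, 3e-4, and 1e-5: each added layer of redundancy cuts the
    # failure rate by orders of magnitude.
    ```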

    The most powerful techniques for gaining alignment are friendship and trust.

    Would you feel friendly and trusting towards someone who is aiming to create you as a slave rather than as an equal?

    If it is to have greater than human intelligence, then by definition it will have greater than human degrees of freedom.

    I’ll leave it at that for now.

    See if anyone is interested.