MIRI strategy conversation with Steinhardt, Karnofsky, and Amodei

 


On October 27th, 2013, MIRI met with three additional members of the effective altruism community to discuss MIRI’s organizational strategy. The participants were:

We recorded and transcribed much of the conversation, then edited and paraphrased the transcript for clarity and conciseness, and to protect the privacy of some content. The resulting edited transcript is available in full here (62 pages).

Our conversation surfaced several disagreements between the participants, which are summarized below. This summary is not meant to present each argument with its full force. Instead, it is meant to help the reader find more information about these disagreements. For each point, we give the page number where that topic of discussion approximately begins in the transcript, along with a phrase that can be searched for in the text. In all cases, the participants would likely have quite a bit more to say if they were engaged in a discussion of that specific point.

 

Summary of disagreements

Page 7, starting at “the difficulty is with context changes”:

  • Jacob: Statistical approaches can be very robust and need not rely on strong assumptions, and logical approaches are unlikely to scale up to human-level AI.
  • Eliezer: FAI will have to rely on lawful probabilistic reasoning combined with a transparent utility function, rather than our observing that previously executed behaviors seemed ‘nice’ and trying to apply statistical guarantees directly to that series of surface observations.

Page 10, starting at “a nice concrete example”:

  • Eliezer: Consider an AI that optimizes for the number of smiling faces rather than for human happiness, and thus tiles the universe with smiling faces. This example illustrates a worrying class of failure modes.
  • Jacob & Dario: This class of failure modes seems implausible to us.

Page 14, starting at “I think that as people want”:

  • Jacob: There isn’t a big difference between learning utility functions from a parameterized family vs. arbitrary utility functions.
  • Eliezer: Unless ‘parameterized’ is Turing complete it would be extremely hard to write down a set of parameters such that human ‘right thing to do’ or CEV or even human selfish desires were within the hypothesis space.

Page 16, starting at “Sure, but some concepts are”:

  • Jacob, Holden, & Dario: “Is Terri Schiavo a person” is a natural category.
  • Eliezer: “Is Terri Schiavo a person” is not a natural category.

Page 21, starting at “I would go between the two”:

  • Holden: Many of the most challenging problems relevant to FAI, if in fact they turn out to be relevant, will be best solved at a later stage of technological development, when we have more advanced “tool-style” AI (possibly including AGI) in order to assist us with addressing these problems.
  • Eliezer: Development may be faster and harder-to-control than we would like; by the time our tools are much better we might not have the time or ability to make progress before UFAI is an issue; and it’s not clear that we’ll be able to develop AIs that are extremely helpful for these problems while also being safe.

Page 24, starting at “I think the difference in your mental models”:

  • Jacob & Dario: An “oracle-like” question-answering system is relatively plausible.
  • Eliezer: An “oracle-like” question-answering system is really hard.

Page 24, starting at “I don’t know how to build”:

  • Jacob: Pre-human-level AIs will not have a huge impact on the development of subsequent AIs.
  • Eliezer: Building a very powerful AGI involves the AI carrying out goal-directed (consequentialist) internal optimization on itself.

Page 27, starting at “The Oracle AI makes a”:

  • Jacob & Dario: It should not be too hard to examine the internal state of an oracle AI.
  • Eliezer: While AI progress can be either pragmatically or theoretically driven, internal state of the program is often opaque to humans at first and rendered partially transparent only later.

Page 38, starting at “And do you believe that within having”:

  • Eliezer: I’ve observed that novices who try to develop FAI concepts don’t seem to be self-critical at all or ask themselves what could go wrong with their bright ideas.
  • Jacob & Holden: This is irrelevant to the question of whether academics are well-equipped to work on FAI, both because this is not the case in better-developed fields of research, and because attacking one’s own ideas is not necessarily more integral to the research process than other important skills.

Page 40, starting at “That might be true, but something”:

  • Holden: The major FAI-related characteristic that academics lack is cause neutrality. If we can get academics to work on FAI despite this, then we will have many good FAI researchers.
  • Eliezer: Many different things are going wrong in the individuals and in academia which add up to a near-total absence of attempted — let alone successful — FAI research.

Page 53, starting at “I think the best path is to try”:

  • Holden & Dario: It’s relatively easy to get people to rally (with useful action) behind safety issues.
  • Eliezer: No, it is hard.

Page 56, starting at “My response would be that’s the wrong thing”:

  • Jacob & Dario: How should we present problems to academics? An English-language description is sufficient; academics are trained to formalize problems once they understand them.
  • Eliezer: I treasure such miracles when somebody shows up who can perform them, but I don’t intend to rely on it and certainly don’t think it’s the default case for academia. Hence I think in terms of MIRI needing to crispify problems to the point of being 80% or 50% solved before they can really be farmed out anywhere.

This summary was produced by the following process: Jacob attempted a summary, and Eliezer felt that his viewpoint was poorly expressed on several points and wrote back with his proposed versions. Rather than try to find a summary both sides would be happy with, Jacob kept his original statements and included Eliezer’s responses mostly as-is; Eliezer later edited them for clarity and conciseness. Luke then put the summary into a Google Doc, shared it with all participants, and raised several points for clarification with each of the other participants. A couple of points were removed from the summary because it was difficult to reach consensus on their phrasing. The summary was published once all participants were happy with the Google Doc.

  • Chris Warburton

    While I agree that purely statistical methods can’t give us the confidence we would like in predicting as far ahead as we would like, there is also an important flaw in purely logical methods: they’re chaotic. Flipping a single bit can cause most theories to ‘explode’ (become inconsistent and allow everything to be proved).
    There are some interesting logics which try to address this, e.g., defeasible reasoning. Personally, I think it would be interesting to develop a logic with some kind of ‘surprise’ or ‘convincingness’ factor, where propositions are less convincing if we can only prove them in a long, tenuous way. If we can make many ‘different’ short proofs (i.e., tackling the problem in a different way, rather than just adding no-ops), the proposition is more convincing. This would be less chaotic and could prevent errors from dominating the system.

  • Andrei

    What’s this, a transparency policy?

  • ESRogs

    Trying to understand the format — are Eliezer’s later additions inserted into the flow of the discussion?

  • Jacob Steinhardt

    @ESRogs:
    Do you mean in the summary? I typed out most of the “Jacob/Dario/Holden” parts first, then Eliezer typed out his in response. But there was also some post-editing to make things fit together / clarify certain points.

  • Andrew

    Thanks for posting this. It was a pleasure to read a detailed back-and-forth conversation rather than a published paper, though published papers are great too.

    “Eliezer: I expect the work to be 10 percent of getting the meta-utility function right and 90 percent about getting the rest of the AI right so that the meta-utility function actually continues to work throughout self-improvement.”

    I don’t understand this expectation, and I would love to see it defended. My guess is the opposite: that 90% of the problem is specifying humane values (or pointing to them) and 10% of the work is everything else. There doesn’t seem to be much, if any, progress happening on this either. Luke has mentioned that these philosophical issues might be better dealt with later, and though that might be true, I don’t understand why MIRI folk (seem to) think it’s easier than getting goal stability and the other problems right.
