Recent AI control brainstorming by Stuart Armstrong


MIRI recently sponsored Oxford researcher Stuart Armstrong to take a solitary retreat and brainstorm new ideas for AI control. This brainstorming generated 16 new control ideas, of varying usefulness and polish. During the past month, he has described each new idea, and linked those descriptions from his index post: New(ish) AI control ideas.

He also named each AI control idea, and then drew a picture to represent (very roughly) how the new ideas related to each other. In the picture below, an arrow Y→X can mean “X depends on Y”, “Y is useful for X”, “X complements Y on this problem” or “Y inspires X.” The underlined ideas are the ones Stuart currently judges to be most important or developed.

Newish AI control ideas

Previously, Stuart developed the AI control idea of utility indifference, which plays a role in MIRI’s paper Corrigibility (Stuart is a co-author). He also developed anthropic decision theory and some ideas for reduced impact AI and oracle AI. He has also contributed to strategy and forecasting work on ensuring good outcomes from advanced AI, e.g. in Racing to the Precipice and How We’re Predicting AI — or Failing To. MIRI previously contracted him to write a short book introducing the superintelligence control challenge to a popular audience, Smarter Than Us.

  • Peter Morgan

    Doesn’t this depend upon the architecture used? E.g., Gödel Machine, AIXI, AERA, and many others?

    • Luke Muehlhauser

      We assume the first AGIs won’t use any currently-existing architectures. MIRI’s work, and Stuart’s recent AI control brainstorming work, are more about getting conceptual clarity on the constraints we want in an AGI. Implementation of those concepts in a particular architecture will have to wait until feasible AGI architectures exist.