Thoughts on Human Models

Analysis

This is a joint post by MIRI Research Associate and DeepMind Research Scientist Ramana Kumar and MIRI Research Fellow Scott Garrabrant, cross-posted from the AI Alignment Forum and LessWrong.

Human values and preferences are hard to specify, especially in complex domains. Accordingly, much AGI safety research has focused on approaches to AGI design that refer to human values and preferences indirectly, by learning a model that is grounded in expressions of human values (via stated preferences, observed behaviour, approval, etc.) and/or real-world processes that generate expressions of those values. There are additionally approaches aimed at modelling or imitating other aspects of human cognition or behaviour without an explicit aim of capturing human preferences (but usually in service of ultimately satisfying them). Let us refer to all these models as human models.

In this post, we discuss several reasons to be cautious about AGI designs that use human models. We suggest that the AGI safety research community put more effort into developing approaches that work well in the absence of human models, alongside the approaches that rely on human models. This would be a significant addition to the current safety research landscape, especially if we focus on working out and trying concrete approaches as opposed to developing theory. We also acknowledge various reasons why avoiding human models seems difficult.


Problems with Human Models

To be clear about human models, we draw a rough distinction between our actual preferences (which may not be fully accessible to us) and procedures for evaluating our preferences. The first thing, actual preferences, is what humans actually want upon reflection. Satisfying our actual preferences is a win. The second thing, procedures for evaluating preferences, refers to various proxies for our actual preferences such as our approval, or what looks good to us (with necessarily limited information or time for thinking). Human models are in the second category; consider, as an example, a highly accurate ML model of human yes/no approval on the set of descriptions of outcomes. Our first concern, described below, is about overfitting to human approval and thereby breaking its connection to our actual preferences. (This is a case of Goodhart’s law.)
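The approval-overfitting worry can be made concrete with a toy simulation. Everything here is our own invented illustration, not from the post: the quadratic "actual preference", the linear approval proxy, and all numbers are hypothetical. The point is only that a proxy fit to evaluations of familiar outcomes can assign its highest scores to extreme outcomes it was never trained on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "actual preference": humans most prefer outcomes near x = 3.
def actual(x):
    return -(x - 3.0) ** 2

# Approval model: fit only on samples of familiar outcomes (x in [0, 4]),
# using a model class (linear) too weak to capture the true preference.
xs = np.linspace(0.0, 4.0, 50)
approvals = actual(xs) + rng.normal(0.0, 0.1, xs.size)  # noisy approval signal
proxy = np.poly1d(np.polyfit(xs, approvals, deg=1))

# Optimizing the proxy over a wider range of outcomes picks an extreme one.
candidates = np.linspace(0.0, 10.0, 1001)
x_star = candidates[np.argmax(proxy(candidates))]
```

Within the training range the proxy tracks approval reasonably well, but optimizing it hard selects `x_star` at the far edge of the candidate space, where the actual preference is far worse than at the true optimum: Goodhart's law in miniature.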


Our 2018 Fundraiser Review

News

Our 2018 Fundraiser ended on December 31, with the five-week campaign raising $951,817 from 348 donors to help advance MIRI’s mission. We surpassed our Mainline Target ($500k) and made it more than halfway from there to our Accelerated Growth Target ($1.2M). We’re grateful to all of you who supported us. Thank you!



With cryptocurrency prices significantly lower than during our 2017 fundraiser, we received a much smaller share of our funding (~6%) from holders of cryptocurrency this time around. Despite this, the fundraiser was a success, thanks in significant part to the leverage gained from MIRI supporters’ participation in multiple matching campaigns during the fundraiser: WeTrust Spring’s Ethereum-matching campaign, Facebook’s Giving Tuesday event, and professional poker player Dan Smith’s Double Up Drive, expertly administered by Raising for Effective Giving.


January 2019 Newsletter

Newsletters

December 2018 Newsletter

Newsletters

Announcing a new edition of “Rationality: From AI to Zombies”

News

MIRI is putting out a new edition of Rationality: From AI to Zombies, including the first set of R:AZ print books! Map and Territory (volume 1) and How to Actually Change Your Mind (volume 2) are out today!




  • Map and Territory is:
      • $6.50 on Amazon, for the print version.
      • Pay-what-you-want on Gumroad, for PDF, EPUB, and MOBI versions.
  • How to Actually Change Your Mind is:
      • $8 on Amazon, for the print version.
      • Pay-what-you-want on Gumroad, for PDF, EPUB, and MOBI versions (available within the next day).


2017 in review

MIRI Strategy

This post reviews MIRI’s activities in 2017, including research, recruiting, exposition, and fundraising activities.

2017 was a big transitional year for MIRI, as we took on new research projects that have a much greater reliance on hands-on programming work and experimentation. We’ve continued these projects in 2018, and they’re described more in our 2018 update. This meant a major focus on laying groundwork for much faster growth than we’ve had in the past, including setting up infrastructure and changing how we recruit to reach out to more people with engineering backgrounds.


MIRI’s newest recruit: Edward Kmett!

News

Prolific Haskell developer Edward Kmett has joined the MIRI team!

Edward is perhaps best known for popularizing the use of lenses for functional programming. Lenses are a tool that provides a compositional vocabulary for accessing parts of larger structures and describing what you want to do with those parts.
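The core idea can be sketched in a few lines of Python. This is our own illustrative reconstruction of the concept, not Edward's API: his actual `lens` library is Haskell, is far more general, and uses a different (van Laarhoven) encoding. A lens here is just a getter paired with a pure copy-on-write setter, and two lenses compose into a lens addressing a deeper part of a structure:

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")
A = TypeVar("A")

@dataclass(frozen=True)
class Lens(Generic[S, A]):
    get: Callable[[S], A]        # extract the part from the whole
    put: Callable[[S, A], S]     # rebuild the whole with the part replaced

    def __truediv__(self, inner: "Lens") -> "Lens":
        # Compose: (outer / inner) focuses on a part of a part.
        return Lens(
            get=lambda s: inner.get(self.get(s)),
            put=lambda s, v: self.put(s, inner.put(self.get(s), v)),
        )

@dataclass(frozen=True)
class Address:
    city: str

@dataclass(frozen=True)
class Person:
    name: str
    address: Address

city = Lens(lambda a: a.city, lambda a, c: replace(a, city=c))
address = Lens(lambda p: p.address, lambda p, a: replace(p, address=a))

p = Person("Ada", Address("London"))
p2 = (address / city).put(p, "Cambridge")  # p itself is left unchanged
```

The composed lens `address / city` reads and updates a nested field without any manual unpacking and repacking, which is the compositional vocabulary the paragraph above describes.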

Beyond the lens library, Edward maintains a significant share of the libraries surrounding the Haskell core libraries, covering everything from automatic differentiation (used heavily in deep learning, computer vision, and financial risk) to category theory (biased heavily towards organizing software) to graphics, SAT bindings, RCU schemes, tools for writing compilers, and more.

Initial support for Edward joining MIRI is coming in the form of funding from long-time MIRI donor Jaan Tallinn. Increased donor enthusiasm has put MIRI in a great position to take on more engineers in general, and to consider highly competitive salaries for top-of-their-field engineers like Edward who are interested in working with us.

At MIRI, Edward is splitting his time between helping us grow our research team and diving in on a line of research he’s been independently developing in the background for some time: building a new language and infrastructure to make it easier for people to write highly complex computer programs with known desirable properties. While we are big fans of his work, Edward’s research is independent of the directions we described in our 2018 Update, and we don’t consider it part of our core research focus.

We’re hugely excited to have Edward at MIRI. We expect to learn and gain a lot from our interactions, and we also hope that having Edward on the team will let him and other MIRI staff steal each other’s best problem-solving heuristics and converge on research directions over time.

As described in our recent update, our new lines of research are heavy on the mix of theoretical rigor and hands-on engineering that Edward and the functional programming community are well-known for:

In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.

MIRI’s nonprofit mission is to ensure that smarter-than-human AI systems, once developed, have a positive impact on the world. And we want to actually succeed in that goal, not just go through the motions of working on the problem.

Our current model of the challenges involved says that the central sticking point for future engineers will likely be that the building blocks of AI just aren’t sufficiently transparent. We think that someone, somewhere, needs to develop some new foundations and deep theory/insights, above and beyond what’s likely to arise from refining or scaling up currently standard techniques.

We think that the skillset of functional programmers tends to be particularly well-suited to this kind of work, and we believe that our new research areas can absorb a large number of programmers and computer scientists. So we want this hiring announcement to double as a hiring pitch: consider joining our research effort!

To learn more about what it’s like to work at MIRI and what kinds of candidates we’re looking for, see our last big post, or shoot MIRI researcher Buck Shlegeris an email.

November 2018 Newsletter

Newsletters