One of our biggest updates, I'm happy to announce, is that we've hired five new research staff, with a sixth to join us in February. For details, see Workshops and Scaling Up in the fundraiser post.
Also: Facebook's Giving Tuesday matching opportunity is tomorrow at 5:00am PT! See Colm's post for details on how to get your donation matched.
- Our most recent hire, “Risks from Learned Optimization” co-author Evan Hubinger, describes what he'll be doing at MIRI. See also Nate Soares' comment on how MIRI does nondisclosure-by-default.
- Buck Shlegeris discusses EA residencies as an outreach opportunity.
- OpenAI releases Safety Gym, a set of tools and environments for incorporating safety constraints into RL tasks.
- CHAI is seeking interns; application deadline Dec. 15.
Thoughts from the research team
This month, I'm trying something new: quoting MIRI researchers' summaries and thoughts on recent AI safety write-ups.
I've left out names so that these can be read as a snapshot of people's impressions, rather than a definitive “Ah, researcher X believes Y!” Just keep in mind that these will be a small slice of thoughts from staff I've recently spoken to, not anything remotely like a consensus take.
- Re Will transparency help catch deception? — “A good discussion of an important topic. Matthew Barnett suggests that any weaknesses in a transparency tool may make it a detrimental middleman, and that directly training supervisors to catch deception may be preferable.”
- Re Chris Olah’s views on AGI safety — “I very much agree with Evan Hubinger's idea that collecting different perspectives — different ‘hats’ — is a useful thing to do. Chris Olah's take on transparency is good to see. The concept of microscope AI seems like a useful one, and Olah's vision of how the ML field could be usefully shifted is quite interesting.”
- Re Defining AI Wireheading — “Stuart Armstrong takes a shot at making a principled distinction between wireheading and the rest of Goodhart.”
- Re How common is it to have a 3+ year lead? — “This seems like a pretty interesting question for AI progress models. The expected lead time and the expected takeoff speed greatly influence the extent to which winner-take-all dynamics are plausible.”
- Re Thoughts on Implementing Corrigible Robust Alignment — “Steve Byrnes provides a decent overview of some issues around getting ‘pointer’-type values.”