From now through the end of December, MIRI's 2019 Fundraiser is live! See our fundraiser post for updates on our past year and future plans.
One of our biggest updates, I'm happy to announce, is that we've hired five new research staff, with a sixth to join us in February. For details, see Workshops and Scaling Up in the fundraiser post.
Also: Facebook's Giving Tuesday matching opportunity is tomorrow at 5:00am PT! See Colm's post for details on how to get your donation matched.
- Our most recent hire, “Risks from Learned Optimization” co-author Evan Hubinger, describes what he'll be doing at MIRI. See also Nate Soares' comment on how MIRI does nondisclosure-by-default.
- Buck Shlegeris discusses EA residencies as an outreach opportunity.
- OpenAI releases Safety Gym, a set of tools and environments for incorporating safety constraints into RL tasks.
- CHAI is seeking interns; application deadline Dec. 15.
Thoughts from the research team
This month, I'm trying something new: quoting MIRI researchers' summaries and thoughts on recent AI safety write-ups.
I've left out names so that these can be read as a snapshot of people's impressions, rather than as a definitive “Ah, researcher X believes Y!” Just keep in mind that these are a small slice of thoughts from staff I've recently spoken to, not anything remotely like a consensus take.
- Re Will transparency help catch deception? — “A good discussion of an important topic. Matthew Barnett suggests that any weaknesses in a transparency tool may turn it into a detrimental middle-man, and directly training supervisors to catch deception may be preferable.”
- Re Chris Olah’s views on AGI safety — “I very much agree with Evan Hubinger's idea that collecting different perspectives — different ‘hats’ — is a useful thing to do. Chris Olah's take on transparency is good to see. The concept of microscope AI seems like a useful one, and Olah's vision of how the ML field could be usefully shifted is quite interesting.”
- Re Defining AI Wireheading — “Stuart Armstrong takes a shot at making a principled distinction between wireheading and the rest of Goodhart.”
- Re How common is it to have a 3+ year lead? — “This seems like a pretty interesting question for AI progress models. The expected lead time and questions of expected takeoff speed greatly influence the extent to which winner-take-all dynamics are plausible.”
- Re Thoughts on Implementing Corrigible Robust Alignment — “Steve Byrnes provides a decent overview of some issues around getting ‘pointer’ type values.”