Want to be in the reference class “people who solve the AI alignment problem”?
We now have a guide on how to get started, based on our experience of what tends to make research groups successful. (Also on the AI Alignment Forum.)
- Demski and Garrabrant’s introduction to MIRI’s agent foundations research, “Embedded Agency,” is now available (in lightly edited form) as an arXiv paper.
- New research posts: How Does Gradient Descent Interact with Goodhart?; “Normative Assumptions” Need Not Be Complex; How the MtG Color Wheel Explains AI Safety; Pavlov Generalizes
- Several MIRIx groups are expanding and are looking for new members to join.
- Our summer fellows program is accepting applications through March 31.
- LessWrong’s web edition of Rationality: From AI to Zombies at lesswrong.com/rationality is now fully updated to reflect the print edition of Map and Territory and How to Actually Change Your Mind, the first two books. (Announcement here.)
News and links
- OpenAI’s GPT-2 model shows meaningful progress on a wide variety of language tasks. OpenAI adds:
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights. […] We believe our release strategy limits the initial set of organizations who may choose to [open source our results], and gives the AI community more time to have a discussion about the implications of such systems.
- The Verge discusses OpenAI’s language model concerns along with MIRI’s disclosure policies for our own research. See other discussion by Jeremy Howard, John Seymour, and Ryan Lowe.
- AI Impacts summarizes evidence on good forecasting practices from the Good Judgment Project.
- Recent AI alignment ideas and discussion: Carey on quantilization; Filan on impact regularization methods; Saunders’ HCH Is Not Just Mechanical Turk and RL in the Iterated Amplification Framework; Dai on philosophical difficulty (1, 2); Hubinger on ascription universality; and Everitt’s Understanding Agent Incentives with Causal Influence Diagrams.
- The Open Philanthropy Project announces their largest grant to date: $55 million to launch the Center for Security and Emerging Technology (CSET), a Washington, D.C. think tank with an early focus on “the intersection of security and artificial intelligence”. See also CSET’s many job postings.