March 2019 Newsletter

Want to be in the reference class “people who solve the AI alignment problem”?

We now have a guide on how to get started, based on our experience of what tends to make research groups successful. (Also on the AI Alignment Forum.)

Other updates

Demski and Garrabrant’s introduction to MIRI’s agent foundations research, “Embedded Agency,” is now available (in lightly edited form) as an arXiv paper.
New research posts: How Does Gradient Descent Interact with Goodhart?; “Normative Assumptions” Need Not Be Complex; How the MtG Color Wheel Explains AI Safety; Pavlov Generalizes.
Several MIRIx groups are expanding and are looking for new members to join.
Our summer fellows program is accepting applications through March 31.
LessWrong’s web edition of Rationality: From AI to Zombies at lesswrong.com/rationality is now fully updated to reflect the print edition of Map and Territory and How to Actually Change Your Mind, the first two books. (Announcement here.)

News and links

OpenAI’s GPT-2 model shows meaningful progress on a wide variety of language tasks. OpenAI adds:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights. […] We believe our release strategy limits the initial set of organizations who may choose to [open source our results], and gives the AI community more time to have a discussion about the implications of such systems.
The Verge discusses OpenAI’s language model concerns along with MIRI’s disclosure policies for our own research. See other discussion by Jeremy Howard, John Seymour, and Ryan Lowe.
AI Impacts summarizes evidence on good forecasting practices from the Good Judgment Project.
Recent AI alignment ideas and discussion: Carey on quantilization; Filan on impact regularization methods; Saunders’ HCH Is Not Just Mechanical Turk and RL in the Iterated Amplification Framework; Dai on philosophical difficulty (1, 2); Hubinger on ascription universality; and Everitt’s Understanding Agent Incentives with Causal Influence Diagrams.
The Open Philanthropy Project announces their largest grant to date: $55 million to launch the Center for Security and Emerging Technology, a Washington, D.C. think tank with an early focus on “the intersection of security and artificial intelligence”. See also CSET’s many job postings.
