February 2021 Newsletter
- Abram Demski distinguishes different versions of the problem of “pointing at” human values in AI alignment.
- Evan Hubinger discusses “Risks from Learned Optimization” on the AI X-Risk Research Podcast.
- Eliezer Yudkowsky comments on AI safety via debate and Goodhart’s law.
- MIRI supporters donated ~$135k on Giving Tuesday. Facebook matched ~26% of that amount and employers matched another ~28%, for a total of $207,436! MIRI also received $6,624 from TisBest Philanthropy in late December, largely through Round Two of Ray Dalio’s #RedefineGifting initiative. Our thanks to all of you!
- Spencer Greenberg discusses society and education with Anna Salamon and Duncan Sabien on the Clearer Thinking podcast.
- We Want MoR: Eliezer participates in a (spoiler-laden) discussion of Harry Potter and the Methods of Rationality.
News and links
- Richard Ngo reflects on his time in effective altruism:
[…] Until recently, I was relatively passive in making big decisions. Often that meant just picking the most high-prestige default option, rather than making a specific long-term plan. This also involved me thinking about EA from a “consumer” mindset rather than a “producer” mindset. When it seemed like something was missing, I used to wonder why the people responsible hadn’t done it; now I also ask why I haven’t done it, and consider taking responsibility myself.
Partly that’s just because I’ve now been involved in EA for longer. But I think I also used to overestimate how established and organised EA is. In fact, we’re an incredibly young movement, and we’re still making up a lot of things as we go along. That makes proactivity more important.
Another reason to value proactivity highly is that taking the most standard route to success is often overrated. […] My inspiration in this regard is a friend of mine who has, three times in a row, reached out to an organisation she wanted to work for and convinced them to create a new position for her.
- Ngo distinguishes claims about goal specification, orthogonality, instrumental convergence, value fragility, and Goodhart’s law based on whether they refer to systems at training time versus deployment time.
- Connor Leahy, author of The Hacker Learns to Trust, argues (among other things) that “GPT-3 is our last warning shot” for coordinating to address AGI alignment. (Podcast version.) I include this talk because it’s a good talk and the topic warrants discussion, though MIRI staff don’t necessarily endorse this claim — and Eliezer would certainly object to any claim that something is a fire alarm for AGI.
- OpenAI safety researchers including Dario Amodei, Paul Christiano, and Chris Olah depart OpenAI.
- OpenAI’s DALL-E uses GPT-3 for image generation, while CLIP exhibits impressive zero-shot image classification capabilities. Gwern Branwen comments in his newsletter.