Updates
- A new paper from MIRI researcher Vanessa Kosoy, presented at the ICLR SafeML workshop this week: "Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help."
- New research posts: Learning "Known" Information When the Information is Not Actually Known; Defeating Goodhart and the "Closest Unblocked Strategy" Problem; Reinforcement Learning with Imperceptible Rewards
- The Long-Term Future Fund has announced twenty-three new grant recommendations, along with in-depth explanations of each grant. These include a $50,000 grant to MIRI, as well as grants to CFAR and Ought. LTFF is also recommending grants to several individuals with AI alignment research proposals; MIRI staff will help assess their work.
- We attended the Global Governance of AI Roundtable at the World Government Summit in Dubai.
News and links
- Rohin Shah reflects on the first year of the Alignment Newsletter.
- Some good recent AI alignment discussion: Alex Turner asks for the best reasons for pessimism about impact measures; Henrik Åslund and Ryan Carey discuss corrigibility as constrained optimization; Wei Dai asks about low-cost AGI coordination; and Chris Leong asks, "Would solving counterfactuals solve anthropics?"
- From DeepMind: Towards Robust and Verified AI: Specification Testing, Robust Training, and Formal Verification.
- Ilya Sutskever and Greg Brockman discuss OpenAI's new status as a "hybrid of a for-profit and nonprofit".
- Misconceptions about China and AI: Julia Galef interviews Helen Toner. (Excerpts.)