In other news, MIRI researcher Buck Shlegeris has written over 12,000 words on a variety of MIRI-relevant topics in an EA Forum AMA. (Example topics: advice for software engineers; what alignment plans tend to look like; and decision theory.)
Other updates
- Abram Demski's The Parable of Predict-O-Matic is a great read: the predictor/optimizer issues it covers are deep, but I expect a fairly wide range of readers to enjoy it and get something out of it.
- Evan Hubinger's Gradient Hacking describes an important failure mode that hadn't previously been articulated.
- Vanessa Kosoy has recently discussed some especially interesting topics related to her learning-theoretic agenda on her LessWrong shortform.
- Stuart Armstrong's All I Know Is Goodhart represents nice conceptual progress on expected-value maximizers that are aware of Goodhart's law and try to avoid its pitfalls.
- Reddy, Dragan, and Levine's paper on modeling human intent cites (of all things) Harry Potter and the Methods of Rationality as inspiration.
News and links
- Artificial Intelligence Research Needs Responsible Publication Norms: Crootof provides a good review of the issue on Lawfare.
- Stuart Russell's new book is out: Human Compatible: Artificial Intelligence and the Problem of Control (excerpt). Rohin Shah's review does an excellent job of contextualizing Russell's views within the larger AI safety ecosystem, and Rohin highlights this quote from the book:
> The task is, fortunately, not the following: given a machine that possesses a high degree of intelligence, work out how to control it. If that were the task, we would be toast. A machine viewed as a black box, a fait accompli, might as well have arrived from outer space. And our chances of controlling a superintelligent entity from outer space are roughly zero. Similar arguments apply to methods of creating AI systems that guarantee we won’t understand how they work; these methods include whole-brain emulation — creating souped-up electronic copies of human brains — as well as methods based on simulated evolution of programs. I won’t say more about these proposals because they are so obviously a bad idea.
- Jacob Steinhardt releases an AI Alignment Research Overview.
- Patrick LaVictoire's AlphaStar: Impressive for RL Progress, Not for AGI Progress raises some important questions about how capable today's state-of-the-art systems are.