AGI Ruin: A List of Lethalities. Eliezer Yudkowsky lists reasons AGI appears likely to cause an existential catastrophe, and reasons why he thinks the current research community (MIRI included) isn't succeeding at preventing this from happening.
A central AI alignment problem: capabilities generalization, and the sharp left turn. Nate Soares describes a core obstacle to aligning AGI systems:
[C]apabilities generalize further than alignment (once capabilities start to generalize real well (which is a thing I predict will happen)). And this, by default, ruins your ability to direct the AGI (that has slipped down the capabilities well), and breaks whatever constraints you were hoping would keep it corrigible.
On Nate's model, very little work is currently going into this problem. He advocates putting far more effort into this particular challenge and making it a major focus of future alignment work.
Six Dimensions of Operational Adequacy in AGI Projects. Eliezer describes six criteria an AGI project likely needs to satisfy in order to have a realistic chance at preventing catastrophe at the time AGI is developed: trustworthy command, research closure, strong opsec, common good commitment, alignment mindset, and requisite resource levels.
Other MIRI updates
- I (Rob Bensinger) wrote a post discussing the inordinately slow spread of good AGI conversations in ML.
- I want to signal-boost two of my forum comments: on AGI Ruin, a discussion of common mindset issues in thinking about AGI alignment; and on Six Dimensions, a comment on pivotal acts and "strawberry-grade" alignment.
- Also, a quick note from me, in case this is non-obvious: MIRI leadership thinks that humanity never building AGI would mean the loss of nearly all of the future's value. If this were a live option, it would be an unacceptably bad one.
- Nate revisits MIRI's past writing on recursive self-improvement (with good discussion in the comments).
- Let's See You Write That Corrigibility Tag: Eliezer posts a challenge to write a list of "the sort of principles you'd build into a Bounded Thing meant to carry out some single task or task-class and not destroy the world by doing it".
- From Eliezer: MIRI announces new "Death With Dignity" strategy. Although released on April Fools' Day (whence the silly title), the post body is an entirely non-joking account of Eliezer's current models, including his currently high p(doom) and his recommendations regarding conditionalization and naïve consequentialism.
News and links
- Paul Christiano (link) and Zvi Mowshowitz (link) share their takes on the AGI Ruin post.
- Google's new large language model, Minerva, achieves 50.3% performance on the MATH dataset (problems at the level of high school math competitions), a dramatic improvement on the previous state of the art of 6.9%.
- Jacob Steinhardt reports generally poor forecaster performance on predicting AI progress, with capabilities work moving faster than expected and robustness slower than expected. Outcomes for both the MATH and Massive Multitask Language Understanding datasets "exceeded the 95th percentile prediction".
- In the wake of April/May/June results like Minerva, Google's PaLM, OpenAI's DALL-E, and DeepMind's Chinchilla and Gato, Metaculus' "Date of Artificial General Intelligence" forecast has dropped from 2057 to 2039. (I'll mention that Eliezer and Nate's timelines were already pretty short, and I'm not aware of any MIRI updates toward shorter timelines this year. I'll also note that I don't personally put much weight on Metaculus' AGI timeline predictions, since many of them are inconsistent and this is a difficult and weird domain to predict.)
- Conjecture is a new London-based AI alignment startup with a focus on short-timeline scenarios, founded by EleutherAI alumni. The organization is currently hiring engineers and researchers, and is "particularly interested in hiring devops and infrastructure engineers with supercomputing experience".