Late 2021 MIRI Conversations
This page collects a series of chatroom conversation logs about artificial general intelligence between MIRI researchers and researchers from a number of other AI alignment / forecasting / strategy groups. Many topics are covered, beginning with the question of how difficult the AI alignment problem looks.
The transcripts have deliberately been left raw and largely unedited, to provide a relatively unfiltered snapshot of some AI alignment debates.
Relevant Background Material
Older background material for this discussion includes:
- Rationality: A–Z
- Inadequate Equilibria
- “AI Alignment: Why It’s Hard, and Where to Start”
- “There’s No Fire Alarm for Artificial General Intelligence”
- the “Security Mindset” dialogues (part one, part two)
Two more recent conversations provide useful context:
- An update on Eliezer’s current epistemic state: the gameboard looks “incredibly grim” to him, because from his perspective the field has made almost no progress on the alignment problem, and he has no research ideas that currently seem promising enough to change that fact.
- Success at this point will require some “positive model violation” of an Eliezer-model. Eliezer’s hyperbolic term for this is that we need a “miracle”. But no one specific “miracle” seems likely enough to be worth focusing on today.
- Eliezer instead suggests various actions that would likely be useful in a variety of scenarios, such as: building capacity to run large closed ML alignment experiments; promoting closure in general; and finding more exceptional alignment researchers to generate new ideas and approaches.
Part One (Primarily Richard Ngo and Eliezer Yudkowsky)
- Eliezer and Richard discuss “pivotal acts”, in particular actions that prevent the world from being destroyed by any future AGI system.
- Eliezer argues that it’s hard to align an AI system capable enough to enable a pivotal act, because all pivotal acts (that look to Eliezer like they could actually work) require a powerful non-human process that “searches paths through time and selects high-scoring ones for output” (what Eliezer calls “consequentialism”), which is very dangerous as a strong default.
- Nate Soares uses a laser analogy to articulate the idea that alignment/corrigibility/etc. are hard to achieve with consequentialists:
[T]he plan presumably needs to involve all sorts of mechanisms for refocusing the laser in the case where the environment contains fog, and redirecting the laser in the case where the environment contains mirrors[…] so that it can in fact hit a narrow and distant target. Refocusing and redirecting to stay on target are part and parcel to plans that can hit narrow distant targets.
But the humans shutting the AI down is like scattering the laser, and the humans tweaking the AI so that it plans in a different direction is like them tossing up mirrors that redirect the laser; and we want the plan to fail to correct for those interferences.
As such, on the Eliezer view as I understand it, we can see ourselves as asking for a very unnatural sort of object: a path-through-the-future that is robust enough to funnel history into a narrow band in a very wide array of circumstances, but somehow insensitive to specific breeds of human-initiated attempts to switch which narrow band it’s pointed towards.
- Richard expresses skepticism that Eliezer’s notion of “consequentialism” is very useful for making predictions about pivotal-act-capable AI.
- In §5.3, Eliezer criticizes EA epistemology.
- In §5.4, Eliezer predicts that governments won’t prepare for, coordinate around, or otherwise competently engage with future AI impacts.
- In §5.5, Eliezer criticizes Paul Christiano’s 2018 article “Takeoff Speeds”. In §5.6, Paul and Eliezer have a live debate.
- Paul argues for a “soft takeoff” view, on which we already essentially have AGI tech (e.g., GPT-3), and all we need to do is steadily scale that tech up with more compute over the coming decades, gradually and predictably increasing its impact.
- Eliezer argues for a “hard takeoff” view, on which: we don’t currently know how to build AGI; it’s hard to predict the trajectory to AGI, or how far off AGI is; and when we do figure out how to build AGI, it’s likely to take us by surprise, and very quickly progress to “can save or destroy the world” sorts of capabilities.
Part Two (Primarily Paul Christiano and Eliezer Yudkowsky)
- A continuation of Eliezer and Paul’s debate about hard vs. soft takeoff, discussing (among other things): what Eliezer and Paul’s respective models predict about the future; styles of forecasting; and the role of scaled-up computational resources vs. architectural innovations in AI and in brains.
- Holden Karnofsky responds to Eliezer’s dialogue, arguing (among other points) that Moravec’s predictions look good in hindsight, that the “biological anchors” report is meant to give a soft upper bound rather than a point estimate for AGI, and that the report doesn’t assume AGI will be reached via the current deep learning paradigm.
Part Three (Varied Participants)
- Richard argues that we should expect narrow AI to reach superhuman performance in “many types of scientific research”, and expresses optimism about “task-based RL”, where “agents are rewarded (likely via human feedback, and some version of iterated amplification) for doing well on bounded tasks”. Eliezer and others respond.
- Richard, Carl Shulman, and Eliezer discuss US-China coordination scenarios.
- Paul and Eliezer conclude their own conversation with a discussion of EfficientZero and near-term AI predictions, followed by a discussion of the evolution and power of human intelligence.
- Concluding the conversation series, Rohin Shah and Eliezer discuss factors that make alignment generalize less readily than capabilities.
- This log was followed by an open discussion on LessWrong, to review outstanding issues and the “Late 2021 MIRI Conversations” as a whole.