Late 2021 MIRI Conversations

This page collects a series of chatroom conversation logs about artificial general intelligence between MIRI researchers and researchers from a number of other AI alignment / forecasting / strategy groups. Many topics are covered, beginning with the question of how difficult the AI alignment problem looks.

The transcripts are deliberately highly unedited and raw, to provide a relatively unfiltered snapshot of some AI alignment debates.

The posts below are mirrored on the MIRI Blog, LessWrong, the AI Alignment Forum, and the Effective Altruism Forum, and are also available in audio form.

Relevant Background Material

Older background material for this discussion includes:

Rationality: A–Z
Inadequate Equilibria
Superintelligence
“AI Alignment: Why It’s Hard, and Where to Start”
“There’s No Fire Alarm for Artificial General Intelligence”
the “Security Mindset” dialogues (part one, part two)

Two more recent conversations provide useful context:

Discussion with Eliezer Yudkowsky on AGI interventions — Early Sep.
MIRI · LW · AF · EA · Audio

An update on Eliezer’s current epistemic state: the gameboard looks “incredibly grim” to him, because from his perspective the field has made almost no progress on the alignment problem, and Eliezer has no currently-promising-seeming ideas for research that’s likely to change that fact.
Success at this point will require some “positive model violation” of an Eliezer-model. Eliezer’s hyperbolic term for this is that we need a “miracle”. But no one specific “miracle” seems likely enough to be worth focusing on today.
Eliezer instead suggests various actions that would likely be useful in a variety of scenarios, such as: building capacity to run large closed ML alignment experiments; promoting closure in general; and finding more exceptional alignment researchers to generate new ideas and approaches.

Comments on Carlsmith’s “Is power-seeking AI an existential risk?” — Sep. 2–13
LW · AF · Audio

A good distillation by Nate Soares of MIRI leadership’s qualitative views on AGI timelines, on the probability of existential catastrophe, and on related topics.

Part One (Primarily Richard Ngo and Eliezer Yudkowsky)

Ngo and Yudkowsky on alignment difficulty — Sep. 5–12
MIRI · LW · AF · EA · Audio (I, II, III)

Eliezer and Richard discuss “pivotal acts” — in particular, actions that prevent the world from being destroyed by any future AGI system.
Eliezer argues that it’s hard to align an AI system capable enough to enable a pivotal act, because all pivotal acts (that look to Eliezer like they could actually work) require a powerful non-human process that “searches paths through time and selects high-scoring ones for output” (what Eliezer calls “consequentialism”), which is very dangerous as a strong default.
Nate uses a laser analogy to articulate the idea that alignment/corrigibility/etc. are hard to achieve with consequentialists:

[T]he plan presumably needs to involve all sorts of mechanisms for refocusing the laser in the case where the environment contains fog, and redirecting the laser in the case where the environment contains mirrors[…] so that it can in fact hit a narrow and distant target. Refocusing and redirecting to stay on target are part and parcel to plans that can hit narrow distant targets.
But the humans shutting the AI down is like scattering the laser, and the humans tweaking the AI so that it plans in a different direction is like them tossing up mirrors that redirect the laser; and we want the plan to fail to correct for those interferences.
As such, on the Eliezer view as I understand it, we can see ourselves as asking for a very unnatural sort of object: a path-through-the-future that is robust enough to funnel history into a narrow band in a very wide array of circumstances, but somehow insensitive to specific breeds of human-initiated attempts to switch which narrow band it’s pointed towards.

Ngo and Yudkowsky on AI capability gains — Sep. 14
MIRI · LW · AF · EA · Audio (I, II)

Richard expresses skepticism that Eliezer’s notion of “consequentialism” is very useful for making predictions about pivotal-act-capable AI.
In §5.3, Eliezer criticizes EA epistemology.
In §5.4, Eliezer predicts that governments won’t prepare for, coordinate around, or otherwise competently engage with future AI impacts.

Yudkowsky and Christiano discuss “Takeoff Speeds” — Sep. 14–15
MIRI · LW · AF · EA · Audio (I, II, III)

In §5.5, Eliezer criticizes Paul Christiano’s 2018 article “Takeoff Speeds”. In §5.6, Paul and Eliezer have a live debate.
Paul argues for a “soft takeoff” view, on which we already essentially have AGI tech (e.g., GPT-3), and all we need to do is steadily scale that tech up with more compute over the coming decades, gradually and predictably increasing its impact.
Eliezer argues for a “hard takeoff” view, on which: we don’t currently know how to build AGI; it’s hard to predict the trajectory to AGI, or how far off AGI is; and when we do figure out how to build AGI, it’s likely to take us by surprise, and very quickly progress to “can save or destroy the world” sorts of capabilities.

Soares, Tallinn, and Yudkowsky discuss AGI cognition — Sep. 18–20
MIRI · LW · AF · EA · Audio

Jaan Tallinn and Eliezer (in §7.1) discuss different phases of treacherous turns, consequentialism, and a variety of other topics.
Nate Soares (in §7.2) attempts to sharpen (what he sees as) Richard Ngo’s critique of Eliezer’s model of pivotal-act-capable AI.

Part Two (Primarily Paul Christiano and Eliezer Yudkowsky)

Christiano, Cotra, and Yudkowsky on AI progress — Sep. 20–21
MIRI · LW · AF · EA

A continuation of Eliezer and Paul’s debate about hard v. soft takeoff, discussing (among other things): what Eliezer and Paul’s respective models predict about the future; styles of forecasting; and the role of scaled-up computational resources vs. architectural innovations in AI and in brains.

Biology-Inspired AGI Timelines: The Trick That Never Works — Oct. 19
MIRI · LW · AF · EA · Audio

Via fictional dialogue, Eliezer argues that AGI-forecasting methods like Ajeya Cotra’s “biological anchors” method aren’t informative.

Reply to Eliezer on Biological Anchors — Dec. 23
LW · AF · Audio

Holden Karnofsky responds to Eliezer’s dialogue, arguing (among other points) that Moravec’s predictions look good in hindsight, that the “biological anchors” report is meant to give a soft upper bound rather than a point estimate for AGI, and that the report doesn’t assume AGI will be reached via the current deep learning paradigm.

Shulman and Yudkowsky on AI progress — Sep. 21–22
MIRI · LW · AF · EA · Audio

Carl Shulman follows up on trendline extrapolation, scaling laws, AI’s future economic impacts, and other topics in the AI takeoff debate.

More Christiano, Cotra, and Yudkowsky on AI progress — Sep. 22–24
MIRI · LW · AF · EA

A discussion of “why should we expect early prototypes to be low-impact?”, and of concrete predictions for AI progress.

Conversation on technology forecasting and gradualism — Sep. 23 – Oct. 2
MIRI · LW · AF · EA · Audio

A multi-day free-for-all discussion of general technology forecasting heuristics.
§12.1 focuses on Rob Bensinger’s “AGI’s properties are a question of CS” argument, while §12.2 focuses on Eliezer’s “AI gradualism predicts too little variation between individual humans” argument.

Part Three (Varied Participants)

Ngo’s view on alignment difficulty — Sep. 25
MIRI · LW · AF · EA · Audio

Richard argues that we should expect narrow AI to reach superhuman performance in “many types of scientific research”, and expresses optimism about “task-based RL”, where “agents are rewarded (likely via human feedback, and some version of iterated amplification) for doing well on bounded tasks”. Eliezer and others respond.
Richard, Carl, and Eliezer discuss US-China coordination scenarios.

Ngo and Yudkowsky on scientific reasoning and pivotal acts — Oct. 4
MIRI · LW · AF · EA · Audio

Richard and Eliezer conclude their back-and-forth with a discussion of “shallow cognition” versus “deep generality”, and the cognitive requirements of pivotal acts.

Christiano and Yudkowsky on AI predictions and human intelligence — Oct. 19 – Nov. 27
MIRI · LW · AF · EA · Audio

Paul and Eliezer conclude their own conversation with a discussion of EfficientZero and near-term AI predictions, followed by a discussion of the evolution and power of human intelligence.

Shah and Yudkowsky on alignment failures — Nov. 6–14
MIRI · LW · AF · EA

Concluding the conversation series, Rohin Shah and Eliezer discuss factors that make alignment generalize less readily than capabilities.
This log was followed by an open discussion on LessWrong, to review outstanding issues and the “Late 2021 MIRI Conversations” as a whole.