Posts By: Rob Bensinger
Ngo and Yudkowsky on alignment difficulty
This post is the first in a series of transcribed Discord conversations between Richard Ngo and Eliezer Yudkowsky, moderated by Nate Soares. We’ve also added Richard and Nate’s running summaries of the conversation (and others’ replies) from Google Docs. Later conversation participants include Ajeya Cotra, Beth Barnes, Carl Shulman, Holden Karnofsky, Jaan Tallinn, Paul… Read more »
Discussion with Eliezer Yudkowsky on AGI interventions
The following is a partially redacted and lightly edited transcript of a chat conversation about AGI between Eliezer Yudkowsky and a set of invitees in early September 2021. By default, all other participants are anonymized as “Anonymous”. I think this Nate Soares quote (excerpted from Nate’s response to a report by Joe Carlsmith) is… Read more »
November 2021 Newsletter
MIRI updates MIRI won’t be running a formal fundraiser this year, though we’ll still be participating in Giving Tuesday and other matching opportunities. Visit intelligence.org/donate to donate and to get information on tax-advantaged donations, employer matching, etc. Giving Tuesday takes place on Nov. 30 at 5:00:00am PT. Facebook will 100%-match the first $2M donated — something that took less than… Read more »
October 2021 Newsletter
Redwood Research is a new alignment research organization that just launched its website and released an explainer about what it's currently working on. We're quite excited about Redwood's work, and encourage our supporters to consider applying to work there to help boost Redwood's alignment research. MIRI senior researcher Eliezer Yudkowsky writes: Redwood Research is investigating a toy problem in… Read more »
September 2021 Newsletter
Scott Garrabrant has concluded the main section of his Finite Factored Sets sequence (“Details and Proofs”) with posts on inferring time and applications, future work, and speculation. Scott’s new frameworks are also now available as a pair of arXiv papers: “Cartesian Frames” (adapted from the Cartesian Frames sequence for a philosophy audience by Daniel Hermann and Josiah Lopez-Wild) and… Read more »
August 2021 Newsletter
MIRI updates Scott Garrabrant and Rohin Shah debate one of the central questions in AI alignment strategy: whether we should try to avoid human-modeling capabilities in the first AGI systems. Scott gives a proof of the fundamental theorem of finite factored sets. News and links Redwood Research, a new AI alignment research organization, is seeking an operations lead. Led… Read more »
July 2021 Newsletter
MIRI updates MIRI researcher Evan Hubinger discusses learned optimization, interpretability, and homogeneity in takeoff speeds on the Inside View podcast. Scott Garrabrant releases part three of "Finite Factored Sets", on conditional orthogonality. UC Berkeley's Daniel Filan provides examples of conditional orthogonality in finite factored sets: 1, 2. Abram Demski proposes factoring the alignment problem into "outer alignment"… Read more »