July 2021 Newsletter

August 3, 2021
Rob Bensinger

MIRI updates

MIRI researcher Evan Hubinger discusses learned optimization, interpretability, and homogeneity in takeoff speeds on the Inside View podcast.
Scott Garrabrant releases part three of "Finite Factored Sets", on conditional orthogonality.
UC Berkeley's Daniel Filan provides examples of conditional orthogonality in finite factored sets across two posts.
Abram Demski proposes factoring the alignment problem into "outer alignment" / "on-distribution alignment", "inner robustness" / "capability robustness", and "objective robustness" / "inner alignment".
MIRI senior researcher Eliezer Yudkowsky summarizes "the real core of the argument for 'AGI risk' (AGI ruin)" as "appreciating the power of intelligence enough to realize that getting superhuman intelligence wrong, on the first try, will kill you on that first try, not let you learn and try again".

News and links

From DeepMind: "generally capable agents emerge from open-ended play".
DeepMind’s safety team summarizes their work to date on causal influence diagrams.
"Another (outer) alignment failure story" is similar to Paul Christiano's best guess at how AI might cause human extinction.
Christiano discusses a "special case of alignment: solve alignment when decisions are 'low stakes'".
Andrew Critch argues that power dynamics are "a blind spot or blurry spot" in the collective world-modeling of the effective altruism and rationality communities, "especially around AI".

Machine Intelligence Research Institute

Berkeley, California
