Posts By: Nate Soares

(Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.) I have long said that the lion’s share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at all, rather… Read more »
A central AI alignment problem: capabilities generalization, and the sharp left turn
(This post was factored out of a larger post that I (Nate Soares) wrote, with help from Rob Bensinger, who also rearranged some pieces and added some text to smooth things out. I’m not terribly happy with it, but am posting it anyway (or, well, having Rob post it on my behalf while I travel)… Read more »
Visible Thoughts Project and Bounty Announcement
(Update Jan. 12, 2022: We released an FAQ last month, with more details. Last updated Jan. 7.) (Update Jan. 19, 2022: We now have an example of a successful partial run, which you can use to inform how you do your runs. Details.) (Update Mar. 14, 2023: As of now the limited $20,000 prizes are… Read more »
2018 Update: Our New Research Directions
For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks—and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible. Last year, we… Read more »
Ensuring smarter-than-human intelligence has a positive outcome
I recently gave a talk at Google on the problem of aligning smarter-than-human AI with operators’ goals. The talk was inspired by “AI Alignment: Why It’s Hard, and Where to Start,” and serves as an introduction to the subfield of alignment research in AI. A modified transcript follows. Talk outline (slides): 1. Overview… Read more »
Post-fundraiser update
We concluded our 2016 fundraiser eleven days ago. Progress was slow at first, but our donors came together in a big way in the final week, nearly doubling the total raised. In the end, donors raised $589,316 over six weeks, making this our second-largest fundraiser to date. I’m heartened by this show of support, and… Read more »
MIRI’s 2016 Fundraiser
Update December 22: Our donors came together during the fundraiser to get us most of the way to our $750,000 goal. In all, 251 donors contributed $589,248, making this our second-biggest fundraiser to date. Although we fell short of our target by $160,000, we have since made up this shortfall thanks to November/December donors. I’m… Read more »
New paper: “Logical induction”
MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, myself, and Jessica Taylor. Readers may wish to start with the abridged version. Consider a setting where a reasoner is observing a deductive process (such as a community of mathematicians and computer… Read more »
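To make the setting in that excerpt concrete, here is a minimal, hypothetical Python sketch: a slow deductive process settles one sentence per step, and a reasoner must assign probabilities to sentences that have not yet been settled. The names (is_prime, ToyReasoner) and the simple frequency heuristic are illustrative inventions for this sketch only; the paper’s actual construction (a market of pattern-exploiting traders) is far more general and is not implemented here.

```python
def is_prime(n: int) -> bool:
    """Ground truth that the deductive process will eventually reveal."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


class ToyReasoner:
    """Assigns credences to unsettled sentences of the form "n is prime",
    using the base rate among sentences the deductive process has already
    settled. A crude stand-in for the paper's trader-based construction."""

    def __init__(self) -> None:
        self.settled: dict[int, bool] = {}

    def observe(self, n: int, truth: bool) -> None:
        # The deductive process has settled the sentence "n is prime".
        self.settled[n] = truth

    def credence(self, n: int) -> float:
        if n in self.settled:      # already proved or refuted
            return 1.0 if self.settled[n] else 0.0
        if not self.settled:       # no evidence yet: stay maximally uncertain
            return 0.5
        # Frequency of "prime" among settled sentences so far.
        hits = sum(self.settled.values())
        return hits / len(self.settled)


# The process settles "n is prime" on day n; the reasoner is asked about
# a sentence (n = 1000) well before the process gets around to settling it.
reasoner = ToyReasoner()
for day in range(2, 200):
    reasoner.observe(day, is_prime(day))
print(f"credence that 1000 is prime: {reasoner.credence(1000):.3f}")
```

The point of the sketch is only the shape of the problem: reasonable beliefs about logical claims must be available before proofs arrive, and should improve as the deductive process reveals more. The logical induction criterion in the paper characterizes what "reasonable" means far more stringently than this base-rate heuristic does.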