Our big news this month is Scott Garrabrant's finite factored sets, one of MIRI's largest results to date.
For most people, the best introduction to finite factored sets (FFS) is likely Scott’s Topos talk (or its transcript). Scott is also posting a longer, more mathematically dense introduction in multiple parts: part 1, part 2.
Other MIRI updates
- On MIRI researcher Abram Demski’s view, the core inner alignment problem is the absence of robust safety arguments “in a case where we might naively expect it. We don't know how to rule out the presence of (misaligned) mesa-optimizers.” Abram advocates a more formal approach to the problem:
“Most of the work on inner alignment so far has been informal or semi-formal (with the notable exception of a little work on minimal circuits). I feel this has resulted in some misconceptions about the problem. I want to write up a large document clearly defining the formal problem and detailing some formal directions for research. Here, I outline my intentions, inviting the reader to provide feedback and point me to any formal work or areas of potential formal work which should be covered in such a document.”
- Mark Xu writes “An Intuitive Guide to Garrabrant Induction” (a.k.a. logical induction).
- MIRI research associate Ramana Kumar has formalized the ideas in Scott Garrabrant’s Cartesian Frames sequence in higher-order logic, “including machine verified proofs of all the theorems”.
- Independent researcher Alex Flint writes on probability theory and logical induction as lenses and on gradations of inner alignment obstacles.
- I (Rob) asked 44 people working on long-term AI risk about the level of existential risk from AI (EA Forum link, LW link). Responses were all over the map, with MIRI more pessimistic than most organizations. Across respondents, the mean probability assigned to existential catastrophe from “AI systems not doing/optimizing what the people deploying them wanted/intended” was ~40%, and the median was 30%. (See also the independent survey by Clarke, Carlier, and Schuett.)
- MIRI recently spent some time seriously evaluating whether to move out of the Bay Area. We’ve now decided to stay in the Bay. For more details, see MIRI board member Blake Borgeson’s update.
News and links
- Dario and Daniela Amodei, formerly at OpenAI, have launched a new organization, Anthropic, with a goal of doing “computationally-intensive research to develop large-scale AI systems that are steerable, interpretable, and robust”.
- Jonas Vollmer writes that the Long-Term Future Fund and the Effective Altruism Infrastructure Fund are now looking for grant applications: “We fund student scholarships, career exploration, local groups, entrepreneurial projects, academic teaching buy-outs, top-up funding for poorly paid academics, and many other things. We can make anonymous grants without public reporting. We will consider grants as low as $1,000 or as high as $500,000 (or more in some cases). As a reminder, EA Funds is more flexible than you might think.” Going forward, these two funds will accept applications at any time, rather than having distinct grant rounds. You can apply here.