- Scott Garrabrant and Rohin Shah debate one of the central questions in AI alignment strategy: whether we should try to avoid human-modeling capabilities in the first AGI systems.
- Scott gives a proof of the fundamental theorem of finite factored sets.
News and links
- Redwood Research, a new AI alignment research organization, is seeking an operations lead. Led by Nate Thomas, Buck Shlegeris, and Bill Zito, Redwood Research has received a strong endorsement from MIRI Executive Director Nate Soares:
“Redwood Research seems to me to be led by people who care full-throatedly about the long-term future, have cosmopolitan values, are adamant truthseekers, and are competent administrators. The team seems to me to possess the virtue of practice, and no small amount of competence. I am excited about their ability to find and execute impactful plans that involve modern machine learning techniques. In my estimation, Redwood is among the very best places to do machine-learning based alignment research that has a chance of mattering. In fact, I consider it at least plausible that I work with Redwood as an individual contributor at some point in the future.”
- Holden Karnofsky of Open Philanthropy has written a career guide organized around building one of nine “longtermism-relevant aptitudes”: organization building/running/boosting, political influence, research on core longtermist questions, communication, entrepreneurship, community building, software engineering, information security, and work in academia.
- Open Phil’s Joe Carlsmith argues that with the right software, 10¹³–10¹⁷ FLOP/s is likely enough (or more than enough) “to match the human brain’s task-performance”, with 10¹⁵ FLOP/s “more likely than not” sufficient.
- Katja Grace discusses her work at AI Impacts on Daniel Filan’s AI X-Risk Podcast.
- Chris Olah of Anthropic discusses what the hell is going on inside neural networks on the 80,000 Hours Podcast.
- Daniel Kokotajlo argues that the effective altruism community should permanently stop using the term “outside view” and “use more precise, less confused concepts instead.”