“A Game-Theoretic Analysis of The Off-Switch Game”: researchers from the Australian National University and Linköping University release a new paper on corrigibility, spun off from a MIRIx workshop.
General updates
Daniel Dewey of the Open Philanthropy Project writes up his current thoughts on MIRI’s highly reliable agent design work, with discussion from Nate Soares and others in the comments section.
Sarah Marquart of the Future of Life Institute discusses MIRI’s work on logical inductors, corrigibility, and other topics.
Open Phil awards a four-year $2.4 million grant to Yoshua Bengio’s group at the Montreal Institute for Learning Algorithms “to support technical research on potential risks from advanced artificial intelligence”.
A new IARPA-commissioned report discusses the potential for AI to accelerate technological innovation and lead to “a self-reinforcing technological and economic edge”. The report suggests that AI “has the potential to be a worst-case scenario” in combining high destructive potential, military/civil dual use, and difficulty of monitoring with potentially low production difficulty.
Daniel Selsam and others release Certigrad (arXiv, GitHub), a system for developing formally verified machine learning systems; see the discussion on Hacker News.
Applications are open for the Center for Applied Rationality’s AI Summer Fellows Program, which runs September 8–25.