MIRI Research Associate Vanessa Kosoy has written a new paper, “Delegative reinforcement learning: Learning to avoid traps with a little help.” Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads: Most...
Updates. New research posts: Simplified Preferences Needed, Simplified Preferences Sufficient; Smoothmin and Personal Identity; Example Population Ethics: Ordered Discounted Utility; A Theory of Human Values; A Concrete Proposal for Adversarial IDA. MIRI has received a set of new grants from the Open Philanthropy Project and the Berkeley...
I’m happy to announce that MIRI has received two major new grants: a two-year grant totaling $2,112,500 from the Open Philanthropy Project, and a $600,000 grant from the Berkeley Existential Risk Initiative. The Open Philanthropy Project’s grant was awarded as part...
Want to be in the reference class “people who solve the AI alignment problem”? We now have a guide on how to get started, based on our experience of what tends to make research groups successful. (Also on the AI...
We’ve just released a field guide for MIRIx groups, and for other people who want to get involved in AI alignment research. MIRIx is a program where MIRI helps cover basic expenses for outside groups that want to work on...
Updates. Ramana Kumar and Scott Garrabrant argue that the AGI safety community should begin prioritizing “approaches that work well in the absence of human models”: [T]o the extent that human modelling is a good idea, it is important to do...