MIRI Research Associate Vanessa Kosoy has written a new paper, “Delegative reinforcement learning: Learning to avoid traps with a little help.” Kosoy will be presenting the paper at the ICLR 2019 SafeML workshop in two weeks. The abstract reads:
Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning.)
The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.
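To make the idea of "posterior sampling plus a delegation subroutine" concrete, here is a minimal toy sketch. It is not the algorithm from the paper. It makes several simplifying assumptions of its own: hypotheses are deterministic tables mapping each action to a reward and a trap flag (rather than full MDPs), the posterior is uniform over hypotheses that are still consistent with observations, and the advisor always plays a single fixed action that is not a trap in the true environment. The names `drl_toy`, `HYPOTHESES`, and `advisor_action` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy hypotheses: action -> (reward, is_trap).
# Index 0 plays the role of the true environment in the example below.
HYPOTHESES = [
    {"stay": (0.5, False), "explore": (1.0, False), "jump": (0.0, True)},
    {"stay": (0.5, False), "explore": (0.0, True), "jump": (1.0, False)},
    {"stay": (0.2, False), "explore": (1.0, False), "jump": (0.0, True)},
]


def drl_toy(hypotheses, true_index, advisor_action, steps, seed=0):
    """Toy posterior sampling with delegation (illustrative assumption only)."""
    rng = random.Random(seed)
    alive = set(range(len(hypotheses)))  # hypotheses not yet ruled out
    actions = sorted(hypotheses[0])
    total_reward, delegations = 0.0, 0
    for _ in range(steps):
        # Posterior sampling: draw one consistent hypothesis, act greedily on it.
        sampled = hypotheses[rng.choice(sorted(alive))]
        choice = max(actions, key=lambda a: sampled[a][0])
        # Delegation rule (assumed): if any live hypothesis says the chosen
        # action is a trap, hand the decision to the advisor instead.
        if any(hypotheses[h][choice][1] for h in alive):
            choice = advisor_action
            delegations += 1
            # Assumption: the advisor never takes a trap action, so hypotheses
            # that call the advisor's action a trap can be discarded.
            alive = {h for h in alive if not hypotheses[h][choice][1]}
        reward, is_trap = hypotheses[true_index][choice]
        assert not is_trap, "advisor assumption violated"
        total_reward += reward
        # Discard hypotheses that predicted a different reward for this action.
        alive = {h for h in alive if hypotheses[h][choice][0] == reward}
    return total_reward, delegations, alive


total, delegations, alive = drl_toy(HYPOTHESES, 0, "explore", steps=5)
```

In this example the agent delegates only on the first step: watching the advisor rules out the hypothesis that labels "explore" a trap, after which the agent can act on its own without risking a trap. Note that this toy rule can delegate indefinitely in less favorable hypothesis sets. The paper's regret bound is derived for the algorithm it actually constructs, not for this sketch.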
The goal of Kosoy’s work on DRL is to put us on a path toward a deep understanding of human-in-the-loop learning systems that come with formal performance guarantees, including safety guarantees. DRL tries to move us in this direction by providing models in which such performance guarantees can be derived.
While these models still make many unrealistic simplifying assumptions, Kosoy views DRL as already capturing some of the most essential features of the problem—and she has a fairly ambitious vision of how this framework might be further developed.
Kosoy previously described DRL in the post “Delegative Reinforcement Learning with a Merely Sane Advisor.” One feature of DRL that Kosoy described there but omitted from the paper (for space reasons) is its application to corruption. Given certain assumptions, DRL ensures that a formal agent will never have its reward or advice channel tampered with (corrupted). As a special case, the agent’s own advisor cannot cause the agent to enter a corrupt state. Similarly, the general protection from traps described in “Delegative reinforcement learning” also protects the agent from harmful self-modifications.
Another set of DRL results that didn’t make it into the paper appears in the post “Catastrophe Mitigation Using DRL.” In this variant, a DRL agent can mitigate catastrophes that the advisor would not be able to mitigate on its own, something that isn’t supported by the stricter assumptions about the advisor in standard DRL.