MIRI assistant research fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers. The paper, titled “Incorrigibility in...
MIRI Executive Director Nate Soares and Rutgers/UIUC decision theorist Ben Levinstein have a new paper out introducing functional decision theory (FDT), MIRI’s proposal for a general-purpose decision theory. The paper, titled “Cheating Death in Damascus,” considers a wide range of...
MIRI Research Fellow Andrew Critch has developed a new result in the theory of conflict resolution, described in “Toward negotiable reinforcement learning: Shifting priorities in Pareto optimal sequential decision-making.” Abstract: Existing multi-objective reinforcement learning (MORL) algorithms do not account for...
MIRI Research Associate Vanessa Kosoy has developed a new framework for reasoning under logical uncertainty, “Optimal polynomial-time estimators: A Bayesian notion of approximation algorithm.” Abstract: The concept of an “approximation algorithm” is usually only applied to optimization problems, since in...
MIRI is releasing a paper introducing a new model of deductively limited reasoning: “Logical induction,” authored by Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, myself, and Jessica Taylor. Readers may wish to start with the abridged version. Consider a setting where...
MIRI’s research to date has focused on the problems that we laid out in our late 2014 research agenda, and in particular on formalizing optimal reasoning for bounded, reflective decision-theoretic agents embedded in their environment. Our research team has since...