A superintelligent machine would not automatically act as intended: it would act as programmed, and the fit between human intentions and the formal specification could be poor. We discuss methods by which a system could be constructed to learn what to value. We highlight open problems specific to inductive value learning (from labeled training data), and raise a number of questions about the construction of systems that model the preferences of their operators and act accordingly.
This is the last of six new major reports that describe and motivate MIRI’s current research agenda at a high level.
Update May 29, 2016: A revised version of “The Value Learning Problem” (available at the original link) has been accepted to the IJCAI-16 Ethics for Artificial Intelligence workshop. The original version of the paper can be found here.