New paper: “Defining human values for value learners”

MIRI Research Associate Kaj Sotala recently presented a new paper, “Defining Human Values for Value Learners,” at the AAAI-16 AI, Society and Ethics workshop.

The abstract reads:

Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications that this hypothesis has on the design of value learners.

Economic treatments of agency standardly assume that preferences encode some consistent ordering over world-states revealed in agents’ choices. Real-world preferences, however, have structure that is not always captured in economic models. A person can have conflicting preferences about whether to study for an exam, for example, and the choice they end up making may depend on complex, context-sensitive psychological dynamics, rather than on a simple comparison of two numbers representing how much one wants to study or not study.

Sotala argues that our preferences are better understood in terms of evolutionary theory and reinforcement learning. Humans evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. We prefer those outcomes, even if they no longer actually maximize fitness; and we also prefer events that we have learned tend to produce such outcomes.

Affect and emotion, on Sotala’s account, psychologically mediate our preferences. We enjoy and desire states that are highly rewarding in our evolved reward function. Over time, we also learn to enjoy and desire states that seem likely to lead to high-reward states. On this view, our preferences function to group together events that lead, in expectation, to similarly rewarding outcomes for similar reasons; and over our lifetimes we come to value states that lead to high reward inherently, instead of just valuing them instrumentally. Rather than mapping directly onto our rewards, our preferences map onto our expectation of rewards.
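The gap between rewards and expected rewards can be illustrated with a small temporal-difference learning sketch. This is an illustration only, not code or a model from Sotala's paper: the chain of states, the reward numbers, and the names `STATES`, `REWARD`, and `td0_values` are all hypothetical. Only the final state carries any "evolved" reward, yet the learned value function ends up assigning value to the earlier states that reliably lead to it.

```python
# Hypothetical chain of states; each state leads to the next one.
STATES = ["study", "pass_exam", "get_job", "eat_good_meal"]

# Assumed "evolved" reward: only the final state is directly rewarding.
REWARD = {"study": 0.0, "pass_exam": 0.0, "get_job": 0.0, "eat_good_meal": 1.0}


def td0_values(episodes=500, alpha=0.1, gamma=0.9):
    """Learn state values with tabular TD(0) on the fixed chain above.

    alpha is the learning rate and gamma the discount factor; both
    values are arbitrary choices made for this illustration.
    """
    values = {state: 0.0 for state in STATES}
    for _ in range(episodes):
        for i, state in enumerate(STATES):
            # Value of the successor state (0.0 once the chain has ended).
            next_value = values[STATES[i + 1]] if i + 1 < len(STATES) else 0.0
            # TD target: immediate reward plus discounted expected future value.
            target = REWARD[state] + gamma * next_value
            # Move the current estimate a small step toward the target.
            values[state] += alpha * (target - values[state])
    return values


if __name__ == "__main__":
    for state, value in td0_values().items():
        print(f"{state:>14}: reward={REWARD[state]:.1f}  value={value:.3f}")
```

Under these assumptions the learned values approach 0.729, 0.81, 0.9, and 1.0 along the chain, even though the first three states have zero reward. In the terms used above, "study" comes to be valued because of what it is expected to lead to, not because it is rewarding in itself.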

Sotala proposes that value learning systems informed by this model of human psychology could more reliably reconstruct human values. On this model, for example, we can expect human preferences to change as we find new ways to move toward high-reward states. New experiences can change which states my emotions categorize as “likely to lead to reward,” and they can thereby modify which states I enjoy and desire. Value learning systems that model these psychological dynamics may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone.

  • Ted Howard

    Any non-trivial system is going to be a set of approximations to cost/benefit over time.

    What defines costs and benefits can itself be an evolving set of increasingly abstract notions (involving choices with increasing degrees of freedom). It is always something of a pyramid: distinctions come from experience, first-level abstractions come from distinctions, and higher-level abstractions come from sets of abstractions (recursing to whatever depth is actually achieved; n=12 is as far as I have pushed things personally).

    Evolution seems to have equipped us with base sets of systems in both the genetic and memetic senses, and with sets of systems for consensus, arbitrage, or domination depending on context. Those contexts often include assessments, by those systems, of time pressure and present danger (a minimum functional set for most humans seems to be about 20, and numbers exceeding 100 do not seem that uncommon).

    It seems that if these systems are sufficiently generalised, and if one is sufficiently persistent in applying them, one can build any set of value functions, based upon the models one has assembled and the projections of probable costs and rewards over various time-scales (and associated probabilities) that those models deliver.

    There does not seem to be any possible way to formally constrain such an unbounded set of dimensions and probability estimates (the resulting n-dimensional probability topologies are simply too contextually sensitive, and minor differences in heuristic weightings have profound consequences). That one would even consider that such a thing might be possible seems to display a profound ignorance of the necessary consequences of complexity theory.

  • Ted Howard

    The sentence “New experiences can change which states my emotions categorize as ‘likely to lead to reward,’ and they can thereby modify which states I enjoy and desire” is very problematic.
    The definitions of “enjoy” and “desire” can vary so much that a deeply abstract interpretation of “desire” can mean almost the complete denial of a more literal interpretation, in which the meanings of those terms are determined by simple first-order experience.

    As an example: six years ago I was told that there was nothing known to medical science that could improve the probability of my survival, which was given as “might be dead in 6 weeks, median survival 5 months, and 2% make 2 years”. I accepted that the oncologist on the other side of the desk believed that.

    By the end of the following day I had decided to do my own literature search and use my own background in biochemistry and probability to guide me through interpretations and possible alternative explanations of datasets. Within 3 weeks I had become convinced that there was a significant probability that I could alter that probability distribution by consistently applying a set of conditions that, on the evidence, seemed to alter those probabilities.
    There were a bunch of major strategies involved.
    Prime amongst those was the placebo effect, which seems to involve three quite separate sets of neurochemical mechanisms that can be generally categorised as stress responses, conditioned responses, and expectations (and of course all of these sets can influence each other).
    Next were the effects of diet: the myriad tiny influences of the thousands of different things that had been in our diet over most of our evolutionary history and are now missing, and of things that were missing and are now present. Prime amongst these is vitamin C, which seems to be involved in about 20 different metabolic pathways connected with immune system function, and is perhaps most strongly implicated through HIF expression in the neutrophil – phagocyte interaction. So I went RAVE vegan and started high-dose oral vitamin C (currently 2 x 9g doses per day), and am now 5 years clear of my last tumours.

    Part of going vegan was overriding all food preferences and eating foods that my taste buds were telling me ought to be discarded. It was over 4 months before I ate anything that was even vaguely palatable to my tastes (my tastes were adapting, slowly). Having the pure bloody-minded determination to override such “desires” on the basis of interpretations of evidence sets that are not generally shared seems rather rare. Few others have managed it (and I have spoken to many over the last 5 years whom the medical system had similarly rejected and ejected).

    So the term “desire” probably has a very different set of meanings to me than it does to most.
    I see desires as things determined largely by a mix of evolution over genetic and cultural time-scales, modulated through personal experience sets. In my case, they were entirely irrelevant to the actual survival needs of the specific situation I found myself in.