MIRI Research Fellow Jessica Taylor has written a new paper on an error-tolerant framework for software agents, “Quantilizers: A safer alternative to maximizers for limited optimization.” Taylor’s paper will be presented at the AAAI-16 AI, Ethics and Society workshop. The abstract reads:
In the field of AI, expected utility maximizers are commonly used as a model for idealized agents. However, expected utility maximization can lead to unintended solutions when the utility function does not quantify everything the operators care about: imagine, for example, an expected utility maximizer tasked with winning money on the stock market, which has no regard for whether it accidentally causes a market crash. Once AI systems become sufficiently intelligent and powerful, these unintended solutions could become quite dangerous. In this paper, we describe an alternative to expected utility maximization for powerful AI systems, which we call expected utility quantilization. This could allow the construction of AI systems that do not necessarily fall into strange and unanticipated shortcuts and edge cases in pursuit of their goals.
Expected utility quantilization is the approach of selecting an action at random from the top n% of some action distribution γ, as ranked by expected utility. The distribution γ might, for example, be a set of actions weighted by how likely a human is to perform them. A quantilizer based on such a distribution would behave like a compromise between a human and an expected utility maximizer: the agent’s utility function directs it toward intuitively desirable outcomes in novel ways, making it potentially more useful than a digitized human, while γ directs it toward safer and more predictable strategies.
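To make the definition concrete, here is a minimal sketch of a q-quantilizer over a finite action set, written in Python. The function name `quantilize` and the toy actions and numbers are illustrative assumptions, not anything from the paper: the idea is simply to keep the γ-probability mass of the highest expected-utility actions until that mass reaches q, renormalize, and sample from the resulting slice.

```python
import numpy as np

def quantilize(actions, base_probs, expected_utility, q=0.1, rng=None):
    """Sample an action from the top-q slice (by expected utility) of the
    base distribution gamma over actions.

    actions          : list of candidate actions
    base_probs       : gamma's probability for each action
    expected_utility : function mapping an action to its expected utility
    q                : quantile, e.g. 0.1 for a 0.1-quantilizer
    """
    rng = rng or np.random.default_rng()
    base_probs = np.asarray(base_probs, dtype=float)
    utilities = np.array([expected_utility(a) for a in actions])

    # Rank actions from highest to lowest expected utility.
    order = np.argsort(-utilities)

    # Keep the highest-utility actions until their total gamma-probability
    # reaches q (splitting the boundary action if needed), then renormalize.
    top_probs = np.zeros_like(base_probs)
    mass = 0.0
    for i in order:
        take = min(base_probs[i], q - mass)
        top_probs[i] = take
        mass += take
        if mass >= q:
            break
    top_probs /= top_probs.sum()
    return actions[rng.choice(len(actions), p=top_probs)]

# Toy example (numbers are made up): gamma puts very little weight on the
# extreme strategy, so a 0.1-quantilizer picks it with probability at most
# 0.001 / 0.1 = 1%, while a maximizer would pick it every time.
actions = ["routine trade", "aggressive trade", "market-cornering scheme"]
gamma = [0.600, 0.399, 0.001]
utility = {"routine trade": 1.0, "aggressive trade": 3.0,
           "market-cornering scheme": 100.0}
print(quantilize(actions, gamma, utility.get, q=0.1))
```

As the toy example suggests, an action that γ deems unlikely can have its probability amplified by at most a factor of 1/q, which is the source of the cost bound discussed below.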
Quantilization is a formalization of the idea of “satisficing,” or selecting actions that achieve some minimal threshold of expected utility. Agents that try to pick good strategies, but not maximally good ones, seem less likely to come up with extraordinary and unconventional strategies, thereby reducing both the benefits and the risks of smarter-than-human AI systems. Designing AI systems to satisfice looks especially useful for averting harmful convergent instrumental goals and perverse instantiations of terminal goals:
- If we design an AI system to cure cancer, and γ labels it bizarre to reduce cancer rates by increasing the rate of some other terminal illness, then a quantilizer will be less likely to adopt this perverse strategy even if our imperfect specification of the system’s goals gave this strategy high expected utility.
- If superintelligent AI systems have a default incentive to seize control of resources, but γ labels these policies bizarre, then a quantilizer will be less likely to converge on these strategies.
Taylor notes that the quantilizing approach to satisficing may even allow us to reap the benefits of maximization out of proportion to the costs we incur, by specifying some restricted domain in which the quantilizer has low impact without requiring that it have low impact overall: “targeted-impact” quantilization.
One obvious objection to the idea of satisficing is that a satisficing agent might build an expected utility maximizer. Maximizing, after all, can be an extremely effective way to satisfice. Quantilization can potentially avoid this objection: maximizing and quantilizing may both be good ways to satisfice, but maximizing is not necessarily an effective way to quantilize. A quantilizer that deems the act of delegating to a maximizer “bizarre” will avoid delegating its decisions to an agent even if that agent would maximize the quantilizer’s expected utility.
Taylor shows that the cost of relying on a 0.1-quantilizer (which selects a random action from the top 10% of actions), on expectation, is no more than 10 times that of relying on the recommendation of its distribution γ; the expected cost of relying on a 0.01-quantilizer (which selects from the top 1% of actions) is no more than 100 times that of relying on γ; and so on. Quantilization is optimal among the set of strategies that are low-cost in this respect.
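The reasoning behind this guarantee is short. A q-quantilizer’s action distribution is γ restricted to its top q of probability mass and renormalized, so it places at most 1/q times as much probability on any given action as γ does. A sketch of the bound, in notation assumed here rather than taken from the post:

```latex
% Q_q is the q-quantilizer's action distribution, gamma the base distribution,
% and c any nonnegative cost function. Since Q_q(a) <= gamma(a) / q for all a,
\[
  \mathbb{E}_{a \sim Q_q}\!\left[c(a)\right]
    \;=\; \sum_a Q_q(a)\, c(a)
    \;\le\; \sum_a \frac{\gamma(a)}{q}\, c(a)
    \;=\; \frac{1}{q}\, \mathbb{E}_{a \sim \gamma}\!\left[c(a)\right].
\]
% With q = 0.1 the factor 1/q is 10; with q = 0.01 it is 100.
```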
However, expected utility quantilization is not a magic bullet. It depends strongly on how we specify the action distribution γ, and Taylor shows that ordinary quantilizers behave poorly in repeated games and in scenarios where “ordinary” actions in γ tend to have very high or very low expected utility. Further investigation is needed to determine if quantilizers (or some variant on quantilizers) can remedy these problems.