New paper: “Quantilizers”

 |   |  Papers

quantilizersMIRI Research Fellow Jessica Taylor has written a new paper on an error-tolerant framework for software agents, “Quantilizers: A safer alternative to maximizers for limited optimization.” Taylor’s paper will be presented at the AAAI-16 AI, Ethics and Society workshop. The abstract reads:

In the field of AI, expected utility maximizers are commonly used as a model for idealized agents. However, expected utility maximization can lead to unintended solutions when the utility function does not quantify everything the operators care about: imagine, for example, an expected utility maximizer tasked with winning money on the stock market, which has no regard for whether it accidentally causes a market crash. Once AI systems become sufficiently intelligent and powerful, these unintended solutions could become quite dangerous. In this paper, we describe an alternative to expected utility maximization for powerful AI systems, which we call expected utility quantilization. This could allow the construction of AI systems that do not necessarily fall into strange and unanticipated shortcuts and edge cases in pursuit of their goals.

Expected utility quantilization is the approach of selecting a random action in the top n% of actions from some distribution γ, sorted by expected utility. The distribution γ might, for example, be a set of actions weighted by how likely a human is to perform them. A quantilizer based on such a distribution would behave like a compromise between a human and an expected utility maximizer. The agent’s utility function directs it toward intuitively desirable outcomes in novel ways, making it potentially more useful than a digitized human; while γ directs it toward safer and more predictable strategies.

Quantilization is a formalization of the idea of “satisficing,” or selecting actions that achieve some minimal threshold of expected utility. Agents that try to pick good strategies, but not maximally good ones, seem less likely to come up with extraordinary and unconventional strategies, thereby reducing both the benefits and the risks of smarter-than-human AI systems. Designing AI systems to satisfice looks especially useful for averting harmful convergent instrumental goals and perverse instantiations of terminal goals:

  • If we design an AI system to cure cancer, and γ labels it bizarre to reduce cancer rates by increasing the rate of some other terminal illness, them a quantilizer will be less likely to adopt this perverse strategy even if our imperfect specification of the system’s goals gave this strategy high expected utility.
  • If superintelligent AI systems have a default incentive to seize control of resources, but γ labels these policies bizarre, then a quantilizer will be less likely to converge on these strategies.

Taylor notes that the quantilizing approach to satisficing may even allow us to disproportionately reap the benefits of maximization without incurring proportional costs, by specifying some restricted domain in which the quantilizer has low impact without requiring that it have low impact overall — “targeted-impact” quantilization.

One obvious objection to the idea of satisficing is that a satisficing agent might build an expected utility maximizer. Maximizing, after all, can be an extremely effective way to satisfice. Quantilization can potentially avoid this objection: maximizing and quantilizing may both be good ways to satisfice, but maximizing is not necessarily an effective way to quantilize. A quantilizer that deems the act of delegating to a maximizer “bizarre” will avoid delegating its decisions to an agent even if that agent would maximize the quantilizer’s expected utility.

Taylor shows that the cost of relying on a 0.1-quantilizer (which selects a random action from the top 10% of actions), on expectation, is no more than 10 times that of relying on the recommendation of its distribution γ; the expected cost of relying on a 0.01-quantilizer (which selects from the top 1% of actions) is no more than 100 times that of relying on γ; and so on. Quantilization is optimal among the set of strategies that are low-cost in this respect.

However, expected utility quantilization is not a magic bullet. It depends strongly on how we specify the action distribution γ, and Taylor shows that ordinary quantilizers behave poorly in repeated games and in scenarios where “ordinary” actions in γ tend to have very high or very low expected utility. Further investigation is needed to determine if quantilizers (or some variant on quantilizers) can remedy these problems.



Sign up to get updates on new MIRI technical results

Get notified every time a new technical paper is published.

New paper: “Formalizing convergent instrumental goals”

 |   |  Papers

convergentTsvi Benson-Tilsen, a MIRI associate and UC Berkeley PhD candidate, has written a paper with contributions from MIRI Executive Director Nate Soares on strategies that will tend to be useful for most possible ends: “Formalizing convergent instrumental goals.” The paper will be presented as a poster at the AAAI-16 AI, Ethics and Society workshop.

Steve Omohundro has argued that AI agents with almost any goal will converge upon a set of “basic drives,” such as resource acquisition, that tend to increase agents’ general influence and freedom of action. This idea, which Nick Bostrom calls the instrumental convergence thesis, has important implications for future progress in AI. It suggests that highly capable decision-making systems may pose critical risks even if they are not programmed with any antisocial goals. Merely by being indifferent to human operators’ goals, such systems can have incentives to manipulate, exploit, or compete with operators.

The new paper serves to add precision to Omohundro and Bostrom’s arguments, while testing the arguments’ applicability in simple settings. Benson-Tilsen and Soares write:

In this paper, we will argue that under a very general set of assumptions, intelligent rational agents will tend to seize all available resources. We do this using a model, described in section 4, that considers an agent taking a sequence of actions which require and potentially produce resources. […] The theorems proved in section 4 are not mathematically difficult, and for those who find Omohundro’s arguments intuitively obvious, our theorems, too, will seem trivial. This model is not intended to be surprising; rather, the goal is to give a formal notion of “instrumentally convergent goals,” and to demonstrate that this notion captures relevant aspects of Omohundro’s intuitions.

Our model predicts that intelligent rational agents will engage in trade and cooperation, but only so long as the gains from trading and cooperating are higher than the gains available to the agent by taking those resources by force or other means. This model further predicts that agents will not in fact “leave humans alone” unless their utility function places intrinsic utility on the state of human-occupied regions: absent such a utility function, this model shows that powerful agents will have incentives to reshape the space that humans occupy.

Benson-Tilsen and Soares define a universe divided into regions that may change in different ways depending on an agent’s actions. The agent wants to make certain regions enter certain states, and may collect resources from regions to that end. This model can illustrate the idea that highly capable agents nearly always attempt to extract resources from regions they are indifferent to, provided the usefulness of the resources outweighs the extraction cost.

The relevant models are simple, and make few assumptions about the particular architecture of advanced AI systems. This makes it possible to draw some general conclusions about useful lines of safety research even if we’re largely in the dark about how or when highly advanced decision-making systems will be developed. The most obvious way to avoid harmful goals is to incorporate human values into AI systems’ utility functions, a project outlined in “The value learning problem.” Alternatively (or as a supplementary measure), we can attempt to specify highly capable agents that violate Benson-Tilsen and Soares’ assumptions, avoiding dangerous behavior in spite of lacking correct goals. This approach is explored in the paper “Corrigibility.”



Sign up to get updates on new MIRI technical results

Get notified every time a new technical paper is published.

November 2015 Newsletter

 |   |  Newsletters

Research updates

General updates

  • Castify has released professionally recorded audio versions of Eliezer Yudkowsky’s Rationality: From AI to Zombies: Part 1, Part 2, Part 3.
  • I’ve put together a list of excerpts from the many responses to the 2015 question, “What Do You Think About Machines That Think?”

News and links contributors discuss the future of AI

 |   |  News

In January, nearly 200 public intellectuals submitted essays in response to the 2015 question, “What Do You Think About Machines That Think?” (available online). The essay prompt began:

In recent years, the 1980s-era philosophical discussions about artificial intelligence (AI)—whether computers can “really” think, refer, be conscious, and so on—have led to new conversations about how we should deal with the forms that many argue actually are implemented. These “AIs”, if they achieve “Superintelligence” (Nick Bostrom), could pose “existential risks” that lead to “Our Final Hour” (Martin Rees). And Stephen Hawking recently made international headlines when he noted “The development of full artificial intelligence could spell the end of the human race.”

But wait! Should we also ask what machines that think, or, “AIs”, might be thinking about? Do they want, do they expect civil rights? Do they have feelings? What kind of government (for us) would an AI choose? What kind of society would they want to structure for themselves? Or is “their” society “our” society? Will we, and the AIs, include each other within our respective circles of empathy?

The essays are now out in book form, and serve as a good quick-and-dirty tour of common ideas about smarter-than-human AI. The submissions, however, add up to 541 pages in book form, and MIRI’s focus on de novo AI makes us especially interested in the views of computer professionals. To make it easier to dive into the collection, I’ve collected a shorter list of links — the 32 argumentative essays written by computer scientists and software engineers.1 The resultant list includes three MIRI advisors (Omohundro, Russell, Tallinn) and one MIRI researcher (Yudkowsky).

I’ve excerpted passages from each of the essays below, focusing on discussions of AI motivations and outcomes. None of the excerpts is intended to distill the content of the entire essay, so you’re encouraged to read the full essay if an excerpt interests you.

Read more »

  1. The exclusion of other groups from this list shouldn’t be taken to imply that this group is uniquely qualified to make predictions about AI. Psychology and neuroscience are highly relevant to this debate, as are disciplines that inform theoretical upper bounds on cognitive ability (e.g., mathematics and physics) and disciplines that investigate how technology is developed and used (e.g., economics and sociology). 

New report: “Leó Szilárd and the Danger of Nuclear Weapons”

 |   |  Papers

Today we release a new report by Katja Grace, “Leó Szilárd and the Danger of Nuclear Weapons: A Case Study in Risk Mitigation” (PDF, 72pp).

Leó Szilárd has been cited as an example of someone who predicted a highly disruptive technology years in advance — nuclear weapons — and successfully acted to reduce the risk. We conducted this investigation to check whether that basic story is true, and to determine whether we can take away any lessons from this episode that bear on highly advanced AI or other potentially disruptive technologies.

To prepare this report, Grace consulted several primary and secondary sources, and also conducted two interviews that are cited in the report and published here:

The basic conclusions of this report, which have not been separately vetted, are:

  1. Szilárd made several successful and important medium-term predictions — for example, that a nuclear chain reaction was possible, that it could produce a bomb thousands of times more powerful than existing bombs, and that such bombs could play a critical role in the ongoing conflict with Germany.
  2. Szilárd secretly patented the nuclear chain reaction in 1934, 11 years before the creation of the first nuclear weapon. It’s not clear whether Szilárd’s patent was intended to keep nuclear technology secret or bring it to the attention of the military. In any case, it did neither.
  3. Szilárd’s other secrecy efforts were more successful. Szilárd caused many sensitive results in nuclear science to be withheld from publication, and his efforts seems to have encouraged additional secrecy efforts. This effort largely ended when a French physicist, Frédéric Joliot-Curie, declined to suppress a paper on neutron emission rates in fission. Joliot-Curie’s publication caused multiple world powers to initiate nuclear weapons programs.
  4. All told, Szilárd’s efforts probably slowed the German nuclear project in expectation. This may not have made much difference, however, because the German program ended up being far behind the US program for a number of unrelated reasons.
  5. Szilárd and Einstein successfully alerted Roosevelt to the feasibility of nuclear weapons in 1939. This prompted the creation of the Advisory Committee on Uranium (ACU), but the ACU does not appear to have caused the later acceleration of US nuclear weapons development.

October 2015 Newsletter

 |   |  Newsletters

Research updates

General updates

  • As a way to engage more researchers in mathematics, logic, and the methodology of science, Andrew Critch and Tsvi Benson-Tilsen are currently co-running a seminar at UC Berkeley on Provability, Decision Theory and Artificial Intelligence.
  • We have collected links to a number of the posts we wrote for our Summer Fundraiser on
  • German and Swiss donors can now make tax-advantaged donations to MIRI and other effective altruist organizations through GBS Switzerland.
  • MIRI has received Public Benefit Organization status in the Netherlands, allowing Dutch donors to make tax-advantaged donations to MIRI as well. Our tax reference number (RSIN) is 823958644.

News and links

New paper: “Asymptotic logical uncertainty and the Benford test”

 |   |  Papers

Asymptotic Logical Uncertainty and The Benford Test
We have released a new paper on logical uncertainty, co-authored by Scott Garrabrant, Siddharth Bhaskar, Abram Demski, Joanna Garrabrant, George Koleszarik, and Evan Lloyd: “Asymptotic logical uncertainty and the Benford test.”

Garrabrant gives some background on his approach to logical uncertainty on the Intelligent Agent Foundations Forum:

The main goal of logical uncertainty is to learn how to assign probabilities to logical sentences which have not yet been proven true or false.

One common approach is to change the question, assume logical omniscience and only try to assign probabilities to the sentences that are independent of your axioms (in hopes that this gives insight to the other problem). Another approach is to limit yourself to a finite set of sentences or deductive rules, and assume logical omniscience on them. Yet another approach is to try to define and understand logical counterfactuals, so you can try to assign probabilities to inconsistent counterfactual worlds.

One thing all three of these approaches have in common is they try to allow (a limited form of) logical omniscience. This makes a lot of sense. We want a system that not only assigns decent probabilities, but which we can formally prove has decent behavior. By giving the system a type of logical omniscience, you make it predictable, which allows you to prove things about it.

However, there is another way to make it possible to prove things about a logical uncertainty system. We can take a program which assigns probabilities to sentences, and let it run forever. We can then ask about whether or not the system eventually gives good probabilities.

At first, it seems like this approach cannot work for logical uncertainty. Any machine which searches through all possible proofs will eventually give a good probability (1 or 0) to any provable or disprovable sentence. To counter this, as we give the machine more and more time to think, we have to ask it harder and harder questions.

We therefore have to analyze the machine’s behavior not on individual sentences, but on infinite sequences of sentences. For example, instead of asking whether or not the machine quickly assigns 1/10 to the probability that the 3↑↑↑↑3rd digit of π is a 5 we look at the sequence:

an:= the probability the machine assigns at timestep 2n to the n↑↑↑↑nth digit of π being 5,

and ask whether or not this sequence converges to 1/10.

Benford’s law is the observation that the first digit in base 10 of various random numbers (e.g., random powers of 3) is likely to be small: the digit 1 comes first about 30% of the time, 2 about 18% of the time, and so on; 9 is the leading digit only 5% of the time. In their paper, Garrabrant et al. pick the Benford test as a concrete example of logically uncertain reasoning, similar to the π example: a machine passes the test iff it consistently assigns the correct subjective probability to “The first digit is a 1.” for the number 3 to the power f(n), where f is a fast-growing function and f(n) cannot be quickly computed.

Garrabrant et al.’s new paper describes an algorithm that passes the Benford test in a nontrivial way by searching for infinite sequences of sentences whose truth-values cannot be distinguished from the output of a weighted coin.

In other news, the papers “Toward idealized decision theory” and “Reflective oracles: A foundation for classical game theory” are now available on arXiv. We’ll be presenting a version of the latter paper with a slightly altered title (“Reflective oracles: A foundation for game theory in artificial intelligence”) at LORI-V next month.

Update June 12, 2016: “Asymptotic logical uncertainty and the Benford test” has been accepted to AGI-16.


Sign up to get updates on new MIRI technical results

Get notified every time a new technical paper is published.

September 2015 Newsletter

 |   |  Newsletters

Research updates

General updates

News and links