Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness

 |   |  Analysis

In 1979, Michael Rabin proved that his encryption system could be inverted — so as to decrypt the encrypted message — only if an attacker could factor n, the public modulus (a product of two large secret primes). And since this factoring task is computationally hard for any sufficiently large n, Rabin’s encryption scheme was said to be “provably secure” so long as one used a sufficiently large n.
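For concreteness, here is a minimal sketch of the scheme in Python. This is illustrative only: the function names are ours, the primes are toy-sized, and a real deployment would need large random primes plus padding and redundancy to select the correct root. The point is simply that encrypting is cheap (squaring mod n), while decrypting uses knowledge of the secret factors p and q.

# Minimal sketch of Rabin encryption (toy parameters; not for real use).
# Requires Python 3.8+ for pow(x, -1, m) modular inverses.

def rabin_keygen(p, q):
    """Public key is n = p * q; the primes p and q (both congruent to 3 mod 4) stay secret."""
    assert p % 4 == 3 and q % 4 == 3
    return p * q

def rabin_encrypt(m, n):
    """Encryption is just squaring modulo n; no secret knowledge is needed."""
    return pow(m, 2, n)

def rabin_decrypt(c, p, q):
    """Decryption uses the secret factors: take square roots of c mod p and mod q,
    then combine them via the Chinese Remainder Theorem. This yields four
    candidate plaintexts; real schemes add redundancy to pick the right one."""
    n = p * q
    mp = pow(c, (p + 1) // 4, p)   # a square root of c modulo p
    mq = pow(c, (q + 1) // 4, q)   # a square root of c modulo q
    yp = pow(p, -1, q)             # inverse of p modulo q
    yq = pow(q, -1, p)             # inverse of q modulo p
    r = (yq * q * mp + yp * p * mq) % n
    s = (yq * q * mp - yp * p * mq) % n
    return {r, n - r, s, n - s}

n = rabin_keygen(7, 11)            # toy primes; a real n has thousands of bits
c = rabin_encrypt(20, n)
assert 20 in rabin_decrypt(c, 7, 11)

Rabin’s proof is a reduction: an algorithm that recovers square roots modulo n for arbitrary ciphertexts can be used as a subroutine to factor n, which is what licenses the “provably secure” label.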

Since then, creating encryption algorithms with this kind of “provable security” has been a major goal of cryptography,1 and new encryption algorithms that meet these criteria are sometimes marketed as “provably secure.”

Unfortunately, the term “provable security” can be misleading,2 for several reasons3.

Read more »


  1. An encryption system is said to be provably secure if its security requirements are stated formally, and proven to be satisfied by the system, as was the case with Rabin’s system. See Wikipedia.
  2. Security reductions can still be useful (Damgård 2007). My point is just that the term “provable security” can be misleading, especially to non-experts.
  3. For more details, and some additional problems with the term “provable security,” see Koblitz & Menezes’ Another Look website and its linked articles, especially Koblitz & Menezes (2010).

Upcoming Talks at Harvard and MIT

 |   |  News

On October 15th from 4:30-5:30pm, MIRI workshop participant Paul Christiano will give a technical talk at the Harvard University Science Center, room 507, as part of the Logic at Harvard seminar and colloquium.

Christiano’s title and abstract are:

Probabilistic metamathematics and the definability of truth

No model M of a sufficiently expressive theory can contain a truth predicate T such that for all S, M |= T(“S”) if and only if M |= S. I’ll consider the setting of probabilistic logic, and show that there are probability distributions over models which contain an “objective probability function” P such that M |= a < P(“S”) < b almost surely whenever a < P(M |= S) < b. This demonstrates that a probabilistic analog of a truth predicate is possible as long as we allow infinitesimal imprecision. I’ll argue that this result significantly undercuts the philosophical significance of Tarski’s undefinability theorem, and show how the techniques involved might be applied more broadly to resolve obstructions due to self-reference.
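For readers who want the key claim in symbols, the reflection property in the abstract can be paraphrased roughly as follows; the notation here is ours rather than the talk’s, with \mathbb{P} denoting the outer probability assignment over sentences and P the internal “objective probability” symbol. For every sentence \varphi and all rationals a < b:

    a < \mathbb{P}(\varphi) < b \;\Longrightarrow\; \mathbb{P}\bigl( a < P(\ulcorner \varphi \urcorner) < b \bigr) = 1

The strict inequalities are where the “infinitesimal imprecision” enters: the distribution can locate its probability for \varphi within any open interval, but cannot assert the exact value without reintroducing the diagonalization behind Tarski’s theorem.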

Then, on October 17th from 4:00-5:30pm, Scott Aaronson will host a talk by MIRI research fellow Eliezer Yudkowsky.

Yudkowsky’s talk will be somewhat more accessible than Christiano’s, and will take place in MIT’s Ray and Maria Stata Center, in room 32-123 (aka Kirsch Auditorium, with 318 seats). There will be light refreshments 15 minutes before the talk. Yudkowsky’s title and abstract are:

Recursion in rational agents: Foundations for self-modifying AI

Reflective reasoning is a familiar but formally elusive aspect of human cognition. This issue comes to the forefront when we consider building AIs which model other sophisticated reasoners, or who might design other AIs which are as sophisticated as themselves. Mathematical logic, the best-developed contender for a formal language capable of reflecting on itself, is beset by impossibility results. Similarly, standard decision theories begin to produce counterintuitive or incoherent results when applied to agents with detailed self-knowledge. In this talk I will present some early results from workshops held by the Machine Intelligence Research Institute to confront these challenges.

The first is a formalization and significant refinement of Hofstadter’s “superrationality,” the (informal) idea that ideal rational agents can achieve mutual cooperation on games like the prisoner’s dilemma by exploiting the logical connection between their actions and their opponent’s actions. We show how to implement an agent which reliably outperforms classical game theory given mutual knowledge of source code, and which achieves mutual cooperation in the one-shot prisoner’s dilemma using a general procedure. Using a fast algorithm for finding fixed points, we are able to write implementations of agents that perform the logical interactions necessary for our formalization, and we describe empirical results.

Second, it has been claimed that Gödel’s second incompleteness theorem presents a serious obstruction to any AI understanding why its own reasoning works or even trusting that it does work. We exhibit a simple model for this situation and show that straightforward solutions to this problem are indeed unsatisfactory, resulting in agents who are willing to trust weaker peers but not their own reasoning. We show how to circumvent this difficulty without compromising logical expressiveness.

Time permitting, we also describe a more general agenda for averting self-referential difficulties by replacing logical deduction with a suitable form of probabilistic inference. The goal of this program is to convert logical unprovability or undefinability into very small probabilistic errors which can be safely ignored (and may even be philosophically justified).
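As a very rough illustration of the “mutual knowledge of source code” setting in the first result, here is a toy Python agent. It is a deliberately crude source-code-equality agent of our own devising (sometimes called a “CliqueBot”), not the provability-based construction the abstract describes, but it shows how giving players access to each other’s programs changes the one-shot prisoner’s dilemma.

import inspect

def clique_bot(opponent_source: str) -> str:
    """Toy one-shot prisoner's dilemma player: cooperate ("C") exactly when the
    opponent is running this same program, otherwise defect ("D")."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

if __name__ == "__main__":
    source = inspect.getsource(clique_bot)

    # Two copies of clique_bot, each shown the other's (identical) source, cooperate.
    print(clique_bot(source), clique_bot(source))   # C C

    # Against an unconditional defector it defects, so it is not exploited here.
    defect_bot_source = "def defect_bot(opponent_source):\n    return 'D'\n"
    print(clique_bot(defect_bot_source))            # D

The agents described in the talk are far less brittle: rather than demanding an exact textual match, they cooperate with any opponent they can prove will cooperate back, which is what requires the fixed-point machinery mentioned in the abstract.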

Also, on Oct 18th at 7pm there will be a Less Wrong / Methods of Rationality meetup/party on the MIT campus in Building 6, room 120. There will be snacks and refreshments, and Yudkowsky will be in attendance.

Paul Rosenbloom on Cognitive Architectures

 |   |  Conversations

Paul S. Rosenbloom is Professor of Computer Science at the University of Southern California and a project leader at USC’s Institute for Creative Technologies. He was a key member of USC’s Information Sciences Institute for two decades, leading new directions activities over the second decade, and finishing his time there as Deputy Director. Earlier he was on the faculty at Carnegie Mellon University (where he had also received his MS and PhD in computer science) and Stanford University (where he had also received his BS in mathematical sciences with distinction).

His research concentrates on cognitive architectures – models of the fixed structure underlying minds, whether natural or artificial – and on understanding the nature, structure and stature of computing as a scientific domain. He is an AAAI Fellow, the co-developer of Soar (one of the longest-standing and most fully developed cognitive architectures), the primary developer of Sigma (which blends insights from earlier architectures such as Soar with ideas from graphical models), and the author of On Computing: The Fourth Great Scientific Domain (MIT Press, 2012).

Read more »

Effective Altruism and Flow-Through Effects

 |   |  Conversations

Last month, MIRI research fellow Carl Shulman1 participated in a recorded debate/conversation about effective altruism and flow-through effects. This issue is highly relevant to MIRI’s mission, since MIRI focuses on activities that are intended to produce altruistic value via their flow-through effects on the invention of AGI.

The conversation (mp3, transcript) included:

Recommended background reading includes:

To summarize the conversation very briefly: All participants seemed to agree that more research on flow-through effects would be high value. However, there’s a risk that such research isn’t highly tractable. For now, GiveWell will focus on other projects that seem more tractable. Rob Wiblin might try to organize some research on flow-through effects, to learn how tractable it is.


  1. Carl was a MIRI research fellow at the time of the conversation, but left MIRI at the end of August 2013 to study computer science. 

Double Your Donations via Corporate Matching

 |   |  News

MIRI has now partnered with Double the Donation, a company that makes it easier for donors to take advantage of donation matching programs offered by their employers.

More than 65% of Fortune 500 companies match employee donations, and 40% offer grants for volunteering, but many of these opportunities go unnoticed. Most employees don’t know these programs exist!

Go to MIRI’s Double The Donation page to find out whether your employer can match your donations to MIRI.


How well will policy-makers handle AGI? (initial findings)

 |   |  Analysis

MIRI’s mission is “to ensure that the creation of smarter-than-human intelligence has a positive impact.”

One policy-relevant question is: How well should we expect policy makers to handle the invention of AGI, and what does this imply about how much effort to put into AGI risk mitigation vs. other concerns?

To investigate these questions, we asked Jonah Sinick to examine how well policy-makers handled past events analogous in some ways to the future invention of AGI, and summarize his findings. We pre-committed to publishing our entire email exchange on the topic (with minor editing), just as with our project on how well we can plan for future decades. The post below is a summary of findings from our full email exchange (.pdf) so far.

As with our investigation of how well we can plan for future decades, we decided to publish our initial findings after investigating only a few historical cases. This allows us to gain feedback on the value of the project, as well as suggestions for improvement, before continuing. It also means that we aren’t yet able to draw any confident conclusions about our core questions.

The most significant results from this project so far are:

  1. We came up with a preliminary list of 6 seemingly important ways in which a historical case could be analogous to the future invention of AGI, and evaluated several historical cases on these criteria.
  2. Climate change risk seems sufficiently disanalogous to AI risk that studying climate change mitigation efforts probably gives limited insight into how well policy-makers will deal with AGI risk: the expected damage of climate change appears to be very small relative to the expected damage due to AI risk, especially when one looks at expected damage to policy-makers.
  3. The 2008 financial crisis appears, after a shallow investigation, to be sufficiently analogous to AGI risk that it should give us some small reason to be concerned that policy-makers will not manage the invention of AGI wisely.
  4. The risks to critical infrastructure from geomagnetic storms are far too small to be in the same reference class with risks from AGI.
  5. The eradication of smallpox is only somewhat analogous to the invention of AGI.
  6. Jonah performed very shallow investigations of how policy-makers have handled risks from cyberwarfare, chlorofluorocarbons, and the Cuban missile crisis, but these cases need more study before even “initial thoughts” can be given.
  7. We identified additional historical cases that could be investigated in the future.

Further details are given below. For sources and more, please see our full email exchange (.docx).

Read more »

MIRI’s September Newsletter

 |   |  Newsletters

 

 

Greetings from the Executive Director

Dear friends,

With your help, we finished our largest fundraiser ever, raising $400,000 for our research program. My thanks to everyone who contributed!

We continue to publish non-math research to our blog, including an ebook copy of The Hanson-Yudkowsky AI-Foom Debate (see below). In the meantime, earlier math results are currently being written up, and new results are being produced at our ongoing decision theory workshop.

This October, Eliezer Yudkowsky and Paul Christiano are giving talks about MIRI’s research at MIT and Harvard. Exact details are still being confirmed, so if you live near Boston then you may want to subscribe to our blog so that you can see the details as soon as they are announced (which will be long before the next newsletter).

This November, Yudkowsky and I are visiting Oxford to “sync up” with our frequent collaborators at the Future of Humanity Institute at Oxford University, and also to run our November research workshop (in Oxford).

And finally, let me share a bit of fun with you. Philosopher Robby Bensinger re-wrote Yudkowsky’s Five Theses using the xkcd-inspired Up-Goer Five Text Editor, which only allows use of the 1000 most common words in English. Enjoy.

Cheers,

Luke Muehlhauser

Executive Director

Read more »

Laurent Orseau on Artificial General Intelligence

 |   |  Conversations

Laurent Orseau has been an associate professor (maître de conférences) at AgroParisTech, Paris, France, since 2007. In 2003, he graduated with a professional master’s degree in computer science from the National Institute of Applied Sciences in Rennes and a research master’s degree in artificial intelligence from the University of Rennes 1. He obtained his PhD in 2007. His goal is to build a practical theory of artificial general intelligence. He and his co-author Mark Ring have been awarded the Solomonoff AGI Theory Prize at AGI’2011 and the Kurzweil Award for Best Idea at AGI’2012.

Luke Muehlhauser: In the past few years you’ve written some interesting papers, often in collaboration with Mark Ring, that use AIXI-like models to analyze some interesting features of different kinds of advanced theoretical agents. For example in Ring & Orseau (2011), you showed that some kinds of advanced agents will maximize their rewards by taking direct control of their input stimuli — kind of like the rats who “wirehead” when scientists give them direct control of the input stimuli to their reward circuitry (Olds & Milner 1954). At the same time, you showed that at least one kind of agent, the “knowledge-based” agent, does not wirehead. Could you try to give us an intuitive sense of why some agents would wirehead, while the knowledge-based agent would not?


Laurent Orseau: You’re starting with a very interesting question!

Read more »