Ronald de Wolf is a senior researcher at CWI and a part-time full professor at the University of Amsterdam. He obtained his PhD there in 2001 with a thesis about quantum computing and communication complexity, advised by Harry Buhrman and Paul Vitanyi. Subsequently he was a postdoc at UC Berkeley. His scientific interests include quantum computing, complexity theory, and learning theory.
He also holds a Master’s degree in philosophy (where his thesis was about Kolmogorov complexity and Occam’s razor), and enjoys classical music and literature.
Luke Muehlhauser: Before we get to quantum computing, let me ask you about philosophy. Among other topics, your MSc thesis discusses the relevance of computational learning theory to philosophical debates about Occam’s razor, which is the principle advocating that “among the theories, hypotheses, or explanations that are consistent with the facts, we are to prefer simpler over more complex ones.”
Though many philosophers and scientists adhere to the principle of Occam’s razor, it is often left ambiguous exactly what is meant by “simpler,” and also why this principle is justified in the first place. But in your thesis you write that “in certain formal settings we can, more or less, prove that certain versions of Occam’s Razor work.”
Philosophers are usually skeptical when I argue for K-complexity versions of Occam’s razor, as you do. For example, USC’s Kenny Easwaran once wrote, “I’ve never actually seen how [a K-complexity based simplicity measure] is supposed to solve anything, given that it always depends on a choice of universal machine.”
How would you reply, given your optimism about justifying Occam’s razor “in certain formal settings”?
Ronald de Wolf: I would treat Occam’s razor more as a rule of thumb than as a formal rule or theorem. Clearly it’s vague, and clearly there are cases where it doesn’t work. Still, many scientists have been guided by it to good effect, often equating simplicity with beauty (for example Einstein and Dirac). Psychologically, invoking Occam will only be effective if there is some shared notion of simplicity; maybe not to quantify simplicity, but at least to be able to rank theories according to their simplicity.
You could try to use Kolmogorov complexity as your “objective” measure of simplicity, and in some simplified cases this makes perfect sense. In my MSc thesis I surveyed a few known cases where it provably does. However, such cases do not provide convincing proof of Occam’s razor “in the real world”. They are more like thought experiments, where you strip away everything that’s superfluous in order to bring out a certain point more clearly.
In practice there are at least three issues with using Kolmogorov complexity to measure simplicity. First, it requires you to write down your theory (or whatever it is whose simplicity you’re quantifying) over some fixed alphabet, say as a string of bits. It’s often kind of subjective which background assumptions to count as actually part of your theory. Second, as Easwaran rightly says, KC depends on the choice of universal Turing machine w.r.t. which it is defined. However, I don’t think this is such a big issue. If you choose some reasonably efficient universal Turing machine and consider the KC of reasonably long strings, the constant difference incurred by the choice of universal Turing machine will be relatively small. Thirdly and possibly most importantly, KC is not computable, not even approximable by any computational process (even a very slow one) with any approximation-guarantees. This rules out using KC itself in practical settings.
However, the core idea that compression somehow corresponds to detection of patterns in your data is a perfectly valid one, and you can use it in practice if you’re willing to base “compression” on imperfect but practical programs like gzip. This loses the theoretical optimality guaranteed by KC (which you can view as the “ultimate compression”) but it gives you a tool for data mining and clustering that’s often quite good in practice. See for example here. Such practical approaches are like heuristics that try to approach, in some weak sense, the ideal but unreachable limit-case of KC.
Luke: Do you think one can use Occam-like principles to choose between, for example, the various explanations of quantum mechanics, since they appear to make essentially the same predictions about what we should observe?
Ronald: In principle you could, but to my limited understanding (I’m not following this debate closely), the main interpretations of QM all suffer from having some seemingly superfluous aspects. The standard interpretation that a measurement “collapses the wave function” to a probabilistically-chosen measurement outcome treats “observers” as a special category of quantum objects, or “observation/measurement” as a special category of quantum process. Before you know it, people will bring consciousness into the picture and mysticism beckons. It seems to me that treating the “observer” as a special category violates Occam’s razor. Alternatively you can take the position that measurement is nothing special but just another interaction between quantum systems (observer and observed system). This is sometimes known as the “church of the larger Hilbert space”. It’s mathematically pleasing because now there’s only this smooth, coherent, and even deterministic evolution of the whole universe. However, now you will have may different “branches” of the superposition that is the world’s state vector, which very quickly leads to the multiverse view of many worlds. A metaphysics that postulates infinitely many worlds existing in superposition doesn’t strike me as very Occam-compliant either.
Then there is the instrumentalist “shut up and calculate” school. This is minimalistic in an Occam-pleasing sense, but seems to substantially impoverish the scientific endeavour, whose aim should not just be to predict but also to explain and give some picture of the world. All interpretations of QM are problematic in their own way, and choosing between them based on Occam’s razor assumes some shared idea of what simplicity is as well as a shared view of the goals of science, which we seem to lack here.
Luke: Most of your work these days is in quantum computing and communication. Quantum computing is an interesting field, since its researchers design algorithms, error correction techniques, etc. for machines that cannot yet be built. In this sense, I tend to think of quantum computing as an “exploratory engineering” discipline, akin to pre-Sputnik astronautics, pre-ENIAC computer science, and Eric Drexler’s Nanosystems. Do you think that’s a fair characterization? Do you or your colleagues in quantum computing get much criticism that such work is “too speculative”? (For the record, that’s not my view.)
Ronald: The two main questions in quantum computing are (1) can we build a large-scale quantum computer and (2) what could it do if we had one. I think your term “exploratory engineering” fits the work on the first question; small quantum computers on a handful of qubits were already built a decade ago, so it’s not pure theory anymore. I myself am a theoretical computer scientist focusing on the second question. While I think this is more mathematics than engineering, you can certainly compare it to computer science in the 1930s: at that point the theoretical model of a (classical) computer had already been introduced by Alan Turing, but no large-scale computers had been built yet. You could already design algorithms for Turing machines on paper, and you could even prove that such computers could not solve certain problems (as Turing famously did for the halting problem). We are doing such work on quantum computing now: designing quantum algorithms and communication protocols that are much faster than classical problems for some computational problems, and on the other hand proving that quantum computers do not give you a speed-up for many other problems. Much of the relevance of this is of course contingent upon the eventual construction of a large QC. Interestingly, however, some of the work we are doing has spin-offs for the analysis of classical computing, and that is relevant today irrespective of progress on building a QC.
Regarding the possible charge of being “too speculative”: in the mid-1990s, right after Peter Shor published his groundbreaking quantum algorithm for factoring large numbers into their prime factors (which breaks a lot of cryptography), there was a lot of skepticism, particularly among physicists who thought that this would never fly. They expected that any attempt at implementing quantum bits and operations would have so much noise and errors that it would quickly decohere to a classical computer. Of course they had good reasons to be skeptical — manipulating something as small as an electron is extremely hard, much harder than manipulating a vacuum tube was in the 1940s and 1950s. The worries about noise and imperfections were partially answered soon after by the development (partially by Shor himself) of quantum error-correction and fault-tolerant computing, which roughly says that if the noise is not too large and not too malicious, your quantum computer can correct for it. The only way these worries can be fully overcome is to actually build a large-scale QC. My impression is that experimental physicists are making slow but sure progress on this, and are becoming more optimistic over time that this will actually be realized within one or two decades. So, sure this is a speculative endeavour (most long-term research is), but not unreasonably so.
Luke: What heuristics do you and your colleagues in quantum computing use to decide what to work on, given QC’s long-term and somewhat speculative nature? Presumably you need to make uncertain predictions about which types of quantum computers are most likely to be built, what the solutions to known obstacles might look like, etc.? (I ask because MIRI aims to conduct long-term research that is more speculative than quantum computing.)
Ronald: Most of the time we study how well quantum computers can solve classical computational problems, problems with classical inputs (such as a large number N) and classical outputs (such as the prime factors of N). Computer science has over decades been defining and studying the complexity of lots of interesting and useful computational problems and models, and often we start from there: we take an existing computational problem and try to find quantum tricks to improve over the best classical solutions. In some cases we succeed, designing quantum ways to outperform classical computers, and in some cases we can prove that a QC can’t do better than a classical computer. Of course it’s hard to predict what quantum tricks (if any) might help for a specific problem, but we have some general tools at our disposal. For example, quantum computers are good at detecting periodic patterns (that’s the core of Shor’s algorithm); they can search faster (Grover’s algorithm); you can hide information by encoding it in an unknown basis (quantum cryptography); you can pack a doubly-exponential number of quantum states in an n-qubit space (quantum fingerprinting), etc. A lot of work is based on skillfully combining and applying such known quantum tools, and once in a while people find new tricks to add to our toolbox. Of course, there is also work of a more specific quantum nature, which is not just throwing quantum tricks at classical problems. For example, a lot of work has been done recently on testing whether given quantum states are properly entangled (and hence can be used, for instance, in quantum cryptography).
We typically abstract away from the details of the specific physical system that will implement the quantum computer. Instead we just focus on the mathematical model, with quantum bits and a well-defined set of elementary operations (“gates”) that we can perform on them. It doesn’t really matter whether the qubits will be implemented as electron spins, or as photon polarizations, or as energy levels of an atom — from the perspective of the model, it only matters that a qubit has well-defined 0 and 1 states and that we can form superpositions thereof. Similarly, for classical computers it doesn’t really matter whether you program in C or Java or assembler; all such programming languages can efficiently simulate each other. And you don’t care about the precise voltages used to implement bits physically, as long as each bit has stable and clearly distinguished 0 and 1 values.
Abstracting away from such implementation details is justified when we have a large-scale quantum computer, because different varieties of quantum computers will be able to simulate each other with only moderate overhead in terms of additional number of qubits and operations needed. For example, for the purposes of designing quantum algorithms it’s convenient to assume that you can interact any pair of qubits, even when they are far apart; in the reality of physical experiments it’s much simpler to allow only nearest-neighbor interactions between qubits. We can design algorithms for the first model and then implement them in the nearest-neighbor model by inserting a few swap-operations to move interacting qubits close together. However, this “moderate overhead” is actually quite significant as long as we do not yet have a large-scale quantum computer. It’s quite likely that on the slow road towards a large QC we will first have QCs with a few dozen or a few hundred qubits (the current state of the art is a few qubits). In this case we can’t be too wasteful and probably should design algorithms that are optimized for specific physical implementations. It is actually a very interesting question to find problems where a 50- or 100-qubit QC can already outperform classical computers in some noticeable way. Such problems would be the benchmark on which intermediate-size QCs could be tested.
The point is that once you have a large number of qubits available, the differences between different physical implementations/architectures don’t matter too much, because they are all equivalent up to small overheads (needed to simulate one variant using another). But when we have only intermediate-size QCs available (of, say, a few dozen or a few hundred qubits), then these overheads do make a big difference, and we need to carefully optimize our quantum algorithm for performing on the specific physical implementation that’s actually available. In this respect quantum computing seems quite different from most other future technologies: somehow we’re better able to predict the power of this technology for the long term (when we’ll hopefully have a large-scale QC available and can essentially ignore implementation details) than for the short and medium term (while we only have small-scale QCs with quirky limitations).
Luke: My next question leaps from quantum computing to technological forecasting. What is your subjective probability that we’ll have a 500-qubit quantum computer, which is uncontroversially a quantum computer, within the next 20 years? And, how do you reason about a question like that?
Ronald: Quite high, let’s say probability greater than 2/3. That’s the typical computer science threshold for a “bounded-error” algorithm. From a theoretical perspective, I don’t think we know of any fundamental obstacles to building a large-scale QC, and the threshold theorem from fault-tolerant QC assures us we can deal with moderate amounts of noise and errors. Clearly building a QC is an exceedingly hard engineering problem, but my impression is that experimentalists are making slow but sure progress. There are basically three possible scenarios here:
- Someone constructs a large QC
- We discover a fundamental problem with quantum mechanics (which would be extremely interesting new physics!)
- Experimentalists muddle through without too much progress until either they or the funding agencies lose faith and give up.
The first scenario seems the most plausible to me. I should qualify this by saying that I’m not a certified physicist, let alone a certified experimental physicist, so this opinion is partly based on hearsay — but I do have some confidence in the progress that’s happening in places like MIT, NIST, Yale, Delft,… The recent paper you refer to casts doubt upon the controversial D-Wave quantum computer, which has gotten a lot of press in the last few years. For commercial reasons they prioritize quantity (=number of available qubits) over quality (=the coherence and “true quantum nature” of those qubits), and their machines seem too noisy to have useful quantum computing power.
Ronald: I think that would be a wise precaution, at least for important or sensitive data. There are at least two ways to handle this. We could either stick with public-key cryptography but replace Shor-breakable problems like factoring and discrete logs by problems that seem to be hard to crack even for QC; lattice problems are an oft-mentioned candidate. Or we could use quantum cryptography. Neither is as efficient as RSA, but at least they’re more secure. It makes sense to start this transition already now, even though there’s no QC yet: the security services (and, who knows, maybe the mafia too) are probably hoovering up RSA-encrypted communications that they store for the time being, waiting for the QC that will allow them to decrypt these messages later. So even today’s communication is not safe from a future QC.
Luke: Thanks, Ronald!