Dave Doty on algorithmic self-assembly

 |   |  Conversations

Dave Doty portrait Dave Doty is a Senior Research Fellow at the California Institute of Technology. He proves theorems about molecular computing and conducts experiments implementing algorithmic molecular self-assembly with DNA.





Luke Muehlhauser: A couple years ago you wrote a review article on algorithmic self-assembly, and also created a video introduction to the subject. Your review article begins:

Self-assembly is the process by which small components automatically assemble themselves into large, complex structures. Examples in nature abound: lipids self-assemble a cell’s membrane, and bacteriophage virus proteins self-assemble a capsid that allows the virus to invade other bacteria. Even a phenomenon as simple as crystal formation is a process of self-assembly… Algorithmic self-assembly systems automate a series of simple growth tasks, in which the object being grown is simultaneously the machine controlling its own growth.

As an example, here’s a an electron microscope image of a Sierpinski triangle produced via algorithmic self-assembly of DNA molecules:

Sierpinski triangles

In your view, what are the most impressive examples of algorithmic molecular self-assembly produced in the lab thus far?

Dave Doty: Let me play the politician here and answer the question I wish I had been asked, which is to say, what’s the difference between algorithmic and non-algorithmic self-assembly, how well does algorithmic self-assembly work as of April 2014, and what are the reasons to think we can make it work better in the future?

Non-algorithmic self-assembly
There are very impressive examples of non-algorithmic molecular self-assembly in the lab. Two very successful techniques have been DNA origami and DNA tiles. DNA origami requires a long DNA strand called a scaffold (the most commonly used one has 7249 bases). The idea is to synthesize about 200 much shorter (~32 bases) single strands of DNA, called staples, each of which binds via base-pairing (A binds to T and C binds to G) to multiple regions of the scaffold to bring the regions together, essentially folding the scaffold. Carefully designing which regions will be brought together means that different shapes can be created from the same scaffold by folding it differently.

DNA tiles use base-pairing of DNA in a different way. A DNA tile is a complex of DNA with some number (often 4) of single-stranded regions called “sticky ends” that are available to bind on 4 different sides (they are “sticky” because they will bind via base-pairing to complementary single strands on other tiles). One advantage of DNA tiles is that we are not limited by the size of a scaffold strand; potentially many more than 200 tiles could bind to form large structures.

The “default” approach in either of these techniques is to use a unique type of DNA strand for each “pixel” in the self-assembled shape: 200 DNA strands binding to form a shape means 200 different DNA sequences.

Algorithmic self-assembly
The “algorithmic” aspect of algorithmic self-assembly means that many different copies of a single type of DNA tile can be reused at several different parts of a structure. We call this “algorithmic” by analogy to the idea that small program (e.g., one printing the digits of pi) can encode a large or infinite object (the digits of pi) though local rules.

For instance, if a tile is designed to bind to copies of itself on all sides, the whole space (a plane if done in two dimensions) is filled with copies of itself in a regular square grid. We could get a little fancier: design two tile types, white and black, so that white binds only to black and black binds only to white. Then you’ll get an infinite checkerboard pattern. Slightly more interesting but still periodic.

In fact, we can get much fancier if we can achieve what’s called “cooperative binding”: setting up the experimental conditions juuuuuust right so that a tile can bind to a grid of other tiles only if TWO sticky ends of the tile match. The trick is that some other types of tiles may match just one of the sticky ends, or the other, but ONLY the one that matches BOTH will be allowed to bind. If this rule can be enforced, then tiles can be made to assemble structures of arbitrarily high algorithmic complexity, because this gives the tiles the ability to execute any program as they grow, in turn using that program to direct further growth.

So back to your question: what are the most impressive examples of algorithmic self-assembly in the lab today? We have tiles that assemble to count in binary (tiles bind in rows representing successive positive integers in binary) and tiles that compute XOR (which in turn leads to a pattern, like the one you showed, that creates a discrete version of the Sierpinski triangle). The best published binary counter counts from 1 up to 6, and then binding errors send it to 9. There are unpublished images of counters getting much farther before making errors, but we’re a long way from eliminating the errors entirely.

So what are these errors, and how do we plan to tame them? The errors stem from failed attempts at cooperative binding: a tile matched one of its sticky ends and not the other (hence was the wrong tile type to attach), but it attached anyway.

There are a number of experimental approaches to reducing error.

Lately we’ve been working a lot on getting certain experimental conditions right for cooperative binding to be favorable: for example, each sticky end should have the same binding strength as every other (G-C bonds are twice that of A-T bonds, for example, so this isn’t trivial to achieve just by using identical sticky end lengths).

Long temperature holds seem to produce good results, but the correct temperature to choose often depends on a number of other factors, so sometimes we work hard to find the right temperature and then in the next experiment, with slightly changed conditions, need to find a new optimal temperature.

There are ways to correct errors “in software” by designing some redundancy into the tiles, in analogy to error-correcting codes that use 200 bits to represent a 100-bit message, so that instead of a single tile representing a bit, a 2×2 or 3×3 block of tiles will all represent the same bit.

Finally, there are several labs looking into other ideas from the DNA nanotechnology community to implement cooperative binding. For example, there is a very widely used technology called DNA strand displacement that has proven very useful in controlling the interaction of artificial DNA systems. It is conceivable that it could be used as a way for a DNA tile to detect the condition “both of my ‘input’ binding sites match” and only then to allow the tile to attach. A lot of DNA nanotechonology and synthetic biology projects take existing enzymes, particularly ones that interact with DNA and RNA, and repurpose them to help drive some part of an artificial DNA system, so this may be an idea to explore with algorithmic self-assembly as well.

I don’t know whether any of these ideas I’ve mentioned will ultimately be what takes algorithmic self-assembly from “dreamy theoretical playground” to “reliable technology for bottom-up molecular fabrication”. I have no doubt at all that something will work, however. Every day, nature uses bottom-up self-assembly to automatically assemble molecular structures of incredible complexity. What we’re doing is figuring out the engineering principles needed to do it reliably and cheaply.

Luke: For those in the field of algorithmic self-assembly, do you think it has it been particularly difficult to attract funding and research talent to such a “dreamy theoretical playground”? What methods have you used to help draw funding and research talent to work on these problems, despite the fact that they are many years away from practical application?

Dave: I don’t think either has been difficult.

I suppose “difficulty of funding” is always relative to other subjects we could be studying but aren’t. It might be easier to get corporate funding, for instance, to study how companies can use online viral marketing in order to boost their profits, but the major risk of that project is that I would be unable to complete it after I die of boredom.

Of course, funding is always easier to secure when you can show that results are guaranteed. But big breakthroughs are made when — and only when — people decide to work on a project where it’s not at all clear in advance that it’s going to work.

The National Science Foundation’s mission explicitly states that basic science is a goal, i.e., studying the world for the sake of understanding it better, not merely as a means to some application. I think this is a good thing for society in general. Happily for us, the National Science Foundation has been generous in funding grants related to algorithmic self-assembly, and molecular computing more generally, as have some other governmental granting agencies.

As far as attracting talent, I should say there are a lot of experimental scientists who do work on immediate practical applications: for example, fast/cheap disease diagnosis using “DNA circuits”, or using self-assembled DNA structures that guide the precise placement of other structures such as carbon nanotubes or transistors with very fine resolution.

But I like to believe that a lot of people are like me and are attracted to the field precisely because it’s so poorly understood and so disorganized that it’s years away from (even bigger) practical applications. Once a field is so well understood that applications follow easily, we get bored and want to move on to something else that no one understands yet, so that we can be the first people to make progress in trying to understand it.

I’m inspired by someone like Alan Turing, who thought very deeply about programmable computers in 1936, years before the first one was built. Once practical electronic computers really hit their stride by the 1950s, Turing had moved on to studying, of all things, biological morphogenesis.

Of course I am frequently asked, “What are the practical applications of programming matter to manipulate itself?”, and I politely oblige with guesses such as “smart drugs” and “nanoscale robots”. I’m curious if Turing was ever asked in 1936, “What are the practical applications of automating computation?” and whether he had a memorized response that surely failed to include “space flight”, “the internet”, “smart phones”, or “cruise control”.

I’m not saying that every project should be funded, regardless of whether it has applications. But even the dullest imagination could brainstorm a few practical consequences that might follow from learning to program and control matter at the molecular level.

I think our field attracts smart people precisely because they don’t need any convincing that this is an important goal. They need only to hear two things:

  1. nothing in the laws of physics says that it can’t be done, and
  2. we don’t know how to do it yet.

Luke: That’s very interesting. Bill Hibbard and I have a forthcoming paper in CACM called “Exploratory Engineering in AI,” in which I list several examples of what Eric Drexler called “exploratory engineering.” I named Turing’s pre-ENIAC work, pre-Sputnik astronautics, and some other examples. Looking back, I should have named algorithmic self-assembly and cited your CACM paper on that!

AI is interesting in this regard, because there are some people who think fully-general “AGI” is a reasonable thing to be studying now even though it’s several decades away, precisely because physics suggests it should be feasible but we just don’t know how to do it yet. But there are others who think it’s kind of disreputable to write papers about AGI or even talk much about it in public, because it’s too speculative and theoretical.

Does that division exist in your field as well? If it doesn’t, do you have a theory as to why there is this division in the AI field but not in your own field?

Dave: This division definitely exists in my field. I’ve often thought of there being two flavors of theory, and I attribute the division to people judging results in the second type of theory by the standards of the first type:

1) Descriptive theory that helps predict the behavior of an existing natural system, in service of experiment. Here, if theoretical predictions differ from experiment, then something’s wrong with the theoretical model, and the theoretician needs to work harder to make the model more realistic. An example is Newton’s laws of motion that predict how billiard balls will move around a billiard table. When they don’t move quite as predicted, it’s a problem with the model, which needs to be updated with ideas like friction, air resistance, etc.

2) Prescriptive theory that serves as a programming language for talking about behavior that we want to engineer in a new system that may or may not presently exist, served by experiments that implement the programmed behavior. An example is theoretical work on Boolean circuits that abstracts away their underlying implementation with transistors (or more recently, implementation with DNA strands or genetic transcription factors). When the electronic circuit doesn’t output the correct bit, it’s not a problem with the Boolean circuit model, it’s a problem with the experimental implementation that failed to maintain analog voltages in such a way as to correctly represent discrete bits in the Boolean circuit.

If I understand your meaning correctly, the second of these two types of theory could very well be called exploratory engineering when the system being studied doesn’t exist yet. Most theory in the natural sciences is descriptive, but theoretical computer science is almost exclusively prescriptive. Theoretical computer scientists don’t predict physical results that are confirmed by experiment. Rather they show what is possible (and impossible) for certain engineered systems to do, and sometimes (as in the case of algorithmic self-assembly, or quantum computing, or electronic computing in 1936) they are reasoning about systems that haven’t even been built yet.

The abstract tile assembly model is a prescriptive theory (one variant of it; another variant called the kinetic tile assembly model is closer to the underlying chemistry and used more in the descriptive sense when we do experiments). The abstract tile assembly model doesn’t predict what DNA tiles do, and that’s not its goal. The point of the theory is to show that if DNA tiles can be built that have the idealized behavior of the model, then hey, look at all the other amazing things they could do! The fact that we currently aren’t able to experimentally engineer this behavior isn’t a problem with the theoretical model. It’s a problem with the experiments.

Confusion results when someone familiar only with descriptive theory encounters a result in prescriptive theory. I’ve proved impossibility theorems (of the form, “no system exists that has so-and-so behavior”) and been asked whether I intend to confirm them experimentally, a notion that doesn’t even make logical sense.

For example, the uncomputability of the halting problem, proved by Turing in 1936, states that there is no computer program P that can predict whether another computer program, given as input to P, is ever going to halt. No experiment can be done to confirm this theorem, yet its truth is a fundamental law of nature.

I can’t comment on the AI field and its detractors since I don’t know it well. This sort of phenomenon may be one explanation.

That said, not all exploratory engineering is equally valid. It is possible to be too speculative by speculating that provably impossible tasks are possible. I think it’s fine to theorize about systems that don’t yet exist, so long as we have good reasons to believe they could one day exist. Unfortunately, I’ve seen a lot of theory done in models that are demonstrably at odds with the laws of physics, e.g., ideas like stuffing 2500 strands of DNA (more than the number of atoms in the universe) into a single test tube to “test all solutions in parallel” to some hard combinatorial problem.

But, when someone tells me that work on algorithmic self-assembly is at odds with reality because they have never seen something like that happen in an experiment, I imagine them telling electrical engineers prior to the 1930s that just because they’d never seen an electronic circuit do arithmetic on binary numbers, the notion must be at odds with reality. There are ways to prove that certain systems cannot be built, but simply lacking the imagination to build them is not one of these ways.

Luke: When doing prescriptive theory, it’s very handy to have explicit, formal models of the laws which govern what you’re trying to do. E.g. it would be hard to do quantum algorithm design if there was no principled way to abstract away from implementation details, but it turns out there is, so people can actually prove certain results even though the quantum computers don’t exist yet. One problem in AGI theory is that nobody knows what an AGI will look like, and there’s very little you can demonstrate about the ultimate capabilities of AGI with (e.g.) physics and computational complexity theory alone. At least with AIXI (and its computable variants) one can make principled arguments about what certain classes of AGIs would do or not do, but that kind of precision is rarely available in the field so far.

How is prescriptive theory in your field enabled by formal models which abstract away from implementation details?

Dave: The abstract tile assembly model is one workhorse for prescriptive theory. It abstracts away the proposed DNA implementation. In fact, it’s really based more on generic crystallization theory than DNA specifically. Any crystallizing molecule forming a square lattice, held just under the temperature where the rate of monomer attachment is just barely larger than the rate of detachment of monomers held by two bonds (assuming all bond strengths are equal), should have the same effect of enforcing cooperative binding: that tiles end up permanently attached only when held by at least two bonds. There may be other physical ways to enforce this sort of cooperative binding using techniques specific to DNA, or biological molecules more generally.

There have been variants of the model that, while based on plausible physical mechanisms for implementation, abstract away the precise method of implementation. Recently there’s been some work with models of signal-passing tiles, which are able to transmit information after they attach in response to receiving a signal from a neighboring tile. They can in turn send signals to other tiles, and the signals are used to do things like activate or deactivate bonds, which allows a bit more sophisticated assembly to happen than in the original model with its passive tiles. A reasonable restriction to assume is that only a fixed number of signals can be sent across a tile before they are used up; this is based on the idea that if sending a signal is thermodynamically favorable (whatever the exact mechanism), then it should put the tile in a lower energy state after sending the signal than before, and at some point it gets to a minimum energy.

There is some unpublished lab work I’m aware of implementing the beginnings of this sort of thing using DNA, but the point of the model is to say, “What’s a reasonable abstract model of signalling tiles that seems plausible and likely to capture lots of different ways of doing it?” and then ask what the model can do. The hope is that the results apply not just to one potential implementation of signalling tiles, but lots of different ways.

Another example is some work on motion: what if the monomers are able to move relative to each other after they attach? Again, the physical mechanism for this could be some sort of molecular motor driven by ATP, or it could be driven by the fact that more base pairs are attached after the motion than before.

Moving outside of algorithmic self-assembly to molecular computing more generally, one of my favorite models is one of the oldest models in all of science: chemical reaction networks. These are lists of reactions like X + Y –> A + B, indicating that when a molecule of X and a molecule of Y collide, they might change into molecules of A and B. It’s abstracting away the question, “What exactly are X and Y and A and B, and why would they have this effect if X and Y collide?” As long as your system is based on molecules floating around a well-mixed liquid and bumping into each other (with no control of who’s going to bump into whom), and some of them might change as a result of colliding, then the model of chemical reaction networks applies to it.

The model has been around since the 1860’s. It started to get fairly serious scrutiny in the 1970s by a group of mathematicians studying what sorts of long-term behaviors are possible for different classes of chemical reaction networks: oscillations, steady states, etc. More recently, a more computational perspective has been applied, asking questions like, “What sort of computation is possible if we are allowed to engineer arbitrary chemical reactions?”

This may seem like the ultimate in theoretical navel-gazing: what makes anyone think we could engineer arbitrary chemical reactions? Well, assuming a reasonable model of a physical mechanism called DNA strand displacement (one of the experimental workhorses of the field of DNA nanotechnology), it was shown a few years ago that every abstract chemical reaction network you could think of can be systematically translated into a system of DNA complexes that undergo actual reactions imitating the abstract reactions.

Of course, that’s just one way to try to implement a chemical reaction network. When we do prescriptive theory with chemical reaction networks, we aren’t thinking of that particular implementation. We know it’s out there and it helps us sleep at night, but maybe there are other ways to engineer artificial chemical reactions, and work on abstract chemical reaction networks applies to those systems as well.

Luke: Thanks, Dave!