A Note on AI for Medicine and Biotech

Suppose that frontier AI development is centralized to a single project under tight international controls, with all other development banned internationally.

By far the likeliest outcome of this is that we all die. A centralized group of international researchers — a “CERN for AI” — can’t align superintelligence any more than decentralized organizations can.

Centralization at least has the advantage of making it easier to shut the remaining research down; but this advantage only helps if the key decision-makers actually do shut it down.

Rather than advocating for centralization in the hope of getting an effective moratorium, it makes far more sense to just advocate for a moratorium directly.

Still, we can consider the question:  Suppose that there were an international center for frontier AI research, with a goal of leveraging AI to improve the world as much as possible.  Suppose further that this center had a mandate of backing off and halting development at the first warning sign of risk, rather than continuing to charge ahead and look for excuses to push the technology further.1

What would be the best way to make use of such a project to maximize humanitarian benefit while steering a wide berth around anything that could seriously imperil the world?

Well, our answer is pretty short: Shut it down. We basically don’t buy the premise that the command and control structure for a project like this would handle the risks in a responsible way. We could imagine a world where a project like this could be responsibly spun up, but that world is very different from today’s world.

But do we think this is impossible? If we had to spin up a project like this, with unlimited time to run it and unlimited discretion over how it was run, would we conclude that starting it spells certain doom?

No. We do think a project like this would be extremely dangerous, and we think that most ways of running it would get everyone on Earth killed, if the project succeeded at producing powerful, human-level-or-smarter biomedical AIs. But we can’t quite say, with full honesty, that we think it’s literally impossible to set up a project like this, with just the right command structure, to be “merely extremely dangerous” rather than “effectively certain to get us all killed (if it succeeds).”

We can imagine how we might attempt this, if we were for some reason forced to make an attempt at it.

We can think of clever tactics for trying to get useful results out of AIs that are capable enough to contribute to certain kinds of high-value scientific research, but that are weak and narrow enough to be passively safe — the way that GPT-3 is genuinely not smart enough to kill you.

One idea might be to have a weak AI carefully read through medical papers and try to extract just the medical ideas and facts into a special-purpose language, such that we could grow a new AI that learns only about medical facts, and not about the existence of a larger world, or human psychology, or ideas about re-programming itself.  We would do this cautiously, knowing that if we grew an AI too large and too powerful and too smart even on the narrower database, we could expect it to end up with its own ideas and goals, such that it would plausibly become quite dangerous anyway.
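To make the shape of that pipeline concrete, here is a minimal, purely illustrative sketch in Python. Everything in it is hypothetical: `weak_extractor` stands in for the weak AI doing the reading and translating, and `FORBIDDEN_TOPICS` plus the `screen` function stand in for whatever far more serious checks would gate what the narrow model is ever trained on. This is a sketch of the pipeline's structure under those assumptions, not a claim that a simple keyword filter would be adequate.

```python
# A toy sketch of the two-stage pipeline described above. All names here are
# hypothetical, and the keyword screen is only a stand-in: in the scenario
# above, the extraction is done by a weak AI, and any real gate on the
# corpus would need to be far more serious than a word list.

from dataclasses import dataclass


@dataclass
class ExtractedFact:
    """A single domain-restricted statement emitted by the weak extractor."""
    text: str
    source_paper: str


# Concepts that should never reach the narrow corpus: anything pointing at
# the wider world, human psychology, or the machinery of AI itself.
FORBIDDEN_TOPICS = {
    "software", "programming", "neural network", "machine learning",
    "author", "reviewer", "government", "economy", "psychology",
}


def weak_extractor(paper_text: str, source: str) -> list[ExtractedFact]:
    """Placeholder for the weak AI that reads a paper and re-expresses its
    contents as bare medical facts in a special-purpose language.
    Hardcoded output so this sketch runs as-is."""
    return [
        ExtractedFact("Compound X inhibits enzyme Y at 10 nM.", source),
        ExtractedFact("The authors trained a neural network on trial data.", source),
    ]


def screen(facts: list[ExtractedFact]) -> list[ExtractedFact]:
    """Drop any extracted statement that touches a forbidden topic, and
    surface the rejection for human audit instead of silently patching
    around it."""
    kept = []
    for fact in facts:
        lowered = fact.text.lower()
        if any(topic in lowered for topic in FORBIDDEN_TOPICS):
            print(f"REJECTED ({fact.source_paper}): {fact.text}")
        else:
            kept.append(fact)
    return kept


if __name__ == "__main__":
    corpus = screen(weak_extractor("...full paper text...", "paper-001"))
    # `corpus` is the only material the narrow medical AI would ever see.
    for fact in corpus:
        print(f"KEPT: {fact.text}")
```

The load-bearing part, of course, is not the filter but the discipline around it: rejections get audited by humans, and the project backs off at warning signs rather than tuning the screen until the warnings stop showing up.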

But in this scenario, we could possibly push a narrower science AI further before the first distant warning signs appeared and we backed the hell off.  Maybe we could get cures for cancer and many age-related disorders before the very first warning sign whatsoever (of the sort that state-of-the-art 2024 systems like Claude 3 Opus already routinely exhibited) appeared and caused us to call off the project.

What we’re suggesting here isn’t, “If you train an advanced science AI just on medicine, the AI only ever figures out medicine no matter how smart it gets.”  There are problems in medicine that benefit from general intelligence.  If you successfully grow something to be better and better and better at medicine, it will eventually get generally smarter and smarter and smarter.

The facts in medicine are shadows cast by a larger world. An AI can’t get arbitrarily better at reasoning about medical topics without starting to infer and reason about that larger world.  You can’t keep pushing forever.

But there are subjects that are likely to make AIs dangerous sooner than medicine would.

A sufficiently powerful intelligence can deduce computer programming on its own, and figure out machine learning for itself, just like humanity did.  But if you train an AI on computer programming and on machine learning research papers (as is currently the practice), and try to teach it to make more powerful AIs (as we understand is also currently the practice at some frontier AI companies), then that AI becomes dangerous sooner.

Which fields make an AI dangerous sooner?  We’d guess: computer programming, computer science, AI research, mastering human psychology, mastering prediction of individual humans (as all large language models are currently trained to do), game theory to reason about conflicts, decision theory to reason about your own goals and their long-term implications…

The topic list is long, and current AIs pick them all up under the heading of generally learning about a wider world.  It’s why we wouldn’t advocate for training a biology-only or physics-only AI directly on current biology or physics papers; those often mention the existence of a larger world, and are written to a level of detail where they shadow lots of facts about the human authors (and so human psychology).  You would want a weak AI to translate research papers into just biology and physics facts, before training some other AI on safer-by-comparison knowledge that isn’t about human psychology or programming or the game theory of conflicts.

The goal would be an AI that has been explicitly trained just to think about medicine and physics.  We imagine that an AI like this would need reasonably general prediction and steering capabilities in order to perform well on medicine and physics tasks. But the goal would be to avoid the level of generality where the AI would start to deduce game theory and decision theory and computer programming and machine learning and spend a lot of time thinking about the implied existence of outside agents who might shut it off.

Everything the AI needs to know, every new level of excellence it needs to reach, is a safety burden.  Not all of those burdens weigh the same.  We can imagine ourselves making a serious effort to guess the burden of each capability, trying to get an AI that would produce a huge amount of groundbreaking medicine without ever triggering the first sorts of warning signs that appeared in Claude 3 Opus.  Not trying a dozen different patches and quick fixes until the warning signs disappeared and then continuing on.

But we can count on one or two hands the number of people in the world whom we would trust to run an extraordinarily dangerous project of that kind.

It looks to us like nearly all players in the AI field, left to their own devices, never start proactively and seriously thinking that way — in terms of a balance of dangerous burdens, as we just outlined.  After more than twenty years of working in this field, mentoring new researchers, running workshops, visiting engineers to discuss alignment ideas, and seeing the ebb and flow of different fashions and frameworks, we just do not think that the background level of technical discretion and foresight is remotely near the level it would need to be for a project like this to actually work in real life, without severely threatening the lives of everyone (or failing to ever approach the capabilities required to move the needle on medical research).2

People who go into ASI, and even into “ASI safety,” tend in the overwhelming majority of cases to be huge optimists about how tractable, easy, and solvable all of these problems are, and to show very little of what we would consider appropriate caution and careful methodology.

So now it’s a question of us saying:

“Oh, well, maybe that would be, if not safe, then at least not suicidal.  But only if you put people we trust in charge.”

We have to say that because it’s our honest guess.  But should you believe it?  Should you take our word for it that the set-up above would work?  Should presidents and prime ministers believe it?  If we say we think maybe we could get away with doing something very tricky and dangerous, should Earth’s policy be to put us in charge?

That does not seem to us like how a smart planet would make policy.  Maybe if a lot of other scientists said, together, “That could be less than total suicide, but only if you put their favorite researchers in charge,” Earth should take notice of that.

We would prefer not to try at all.  We’re raising the topic mainly to say what it would actually look like, from our (fallible) perspective, to try to dance on the cliff-edge and play with fire, to try to extract big benefits from narrower science-capable AI at a centralized international project with no time pressure and no perverse incentives, instead of just shutting down the entire disaster.

It would simplify things if we could just flatly call it impossible, but that wouldn’t seem quite true. In everything we’ve written so far, we’ve tried to be extremely careful never to resort to rhetorically convenient exaggerations or oversimplifications.  We think we see clever tricks that we could try to get more valuable outputs from less dangerous intelligence.  There is a very specific and constrained idea that we don’t expect would actually work in practice, but that we cannot call outright suicidal: an approach where we back off at the first warning signs, and don’t try a dozen variations to make the visible warnings disappear so that we can proceed further.

But, we say, that’s only if our favorite researchers are at the helm, using their research taste to make sure that nothing goes off the rails and that safety policies are followed in spirit, not just in letter. If those other guys do it, we think they probably kill you.

What is the sensible thing for Earth to do in this situation?  We do not think it is: Try to play with fire, dance on the cliff edge, and have a council appointed by Nate Soares or Eliezer Yudkowsky ensure that the project adheres to a sufficiently deep security mindset.

We think the sensible thing to do in this situation is to back off from the fire, and not try to be clever about how close you dance to the edge of the cliff.

  1. This is in stark contrast to the current dominant paradigm at AI labs: responding to warning signs by trying a dozen different patches or work-arounds until the visible signs of danger disappear, and then pushing on even further from there, and getting more warning signs, and making them disappear again, and pushing further. If researchers understood why a problem occurred, they could hope to diagnose and solve the root cause. In the process, they might address other upstream and downstream issues, and other issues in the AI that arise for similar reasons, including ones that didn’t arise in testing. What happens in practice, in modern AI research, is that researchers almost never understand why an AI is exhibiting a certain problem. Instead, they try a variety of different ideas until one happens to work, and declare victory. If the visible issue was a symptom of a deeper issue, the deeper issue goes unaddressed (or gets addressed only by coincidence). This general process optimizes against the visibility of warning signs, even more so than it optimizes against the issues themselves: AIs that hide any sign of concerning behavior will perform optimally, whether they do this by trained-in instinct or as a deliberate strategy.
  2. Ten years in Soares’ case.