Yudkowsky and Christiano discuss "Takeoff Speeds" - Machine Intelligence Research Institute

This is a transcription of Eliezer Yudkowsky responding to Paul Christiano’s “Takeoff Speeds” live on Sep. 14, followed by a conversation between Eliezer and Paul. This discussion took place after Eliezer’s conversation with Richard Ngo, and was prompted by an earlier request by Richard Ngo that Eliezer respond to Paul on “Takeoff Speeds”.

5.5. Comments on “Takeoff Speeds”

[Yudkowsky][16:52]

maybe I’ll try liveblogging some https://sideways-view.com/2018/02/24/takeoff-speeds/ here in the meanwhile

Slower takeoff means faster progress

[Yudkowsky][16:57]

The main disagreement is not about what will happen once we have a superintelligent AI, it’s about what will happen before we have a superintelligent AI. So slow takeoff seems to mean that AI has a larger impact on the world, sooner.

It seems to me to be disingenuous to phrase it this way, given that slow-takeoff views usually imply that AI has a large impact later relative to right now (2021), even if they imply that AI impacts the world “earlier” relative to “when superintelligence becomes reachable”.

“When superintelligence becomes reachable” is not a fixed point in time that doesn’t depend on what you believe about cognitive scaling. The correct graph is, in fact, the one where the “slow” line starts a bit before “fast” peaks and ramps up slowly, reaching a high point later than “fast”. It’s a nice try at reconciliation with the imagined Other, but it fails and falls flat.

This may seem like a minor point, but points like this do add up.

In the fast takeoff scenario, weaker AI systems may have significant impacts but they are nothing compared to the “real” AGI. Whoever builds AGI has a decisive strategic advantage. Growth accelerates from 3%/year to 3000%/year without stopping at 30%/year. And so on.

This again shows failure to engage with the Other’s real viewpoint. My mainline view is that growth stays at 5%/year and then everybody falls over dead in 3 seconds and the world gets transformed into paperclips; there’s never a point with 3000%/year.

Operationalizing slow takeoff

[Yudkowsky][17:01]

There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.

If we allow that consuming and transforming the solar system over the course of a few days is “the first 1 year interval in which world output doubles”, then I’m happy to argue that there won’t be a 4-year interval with world economic output doubling before then. This, indeed, seems like a massively overdetermined point to me. That said, again, the phrasing is not conducive to conveying the Other’s real point of view.
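The operationalization quoted above can be checked mechanically against a series of annual world-output figures. The following is a minimal sketch, not anything from the original essay or chat: the function names, the annual sampling, and the example series are all illustrative assumptions. For reference, doubling over 4 years corresponds to roughly 19% annual growth, since 2^(1/4) ≈ 1.189.

```python
from typing import Optional, Sequence


def first_doubling_start(output: Sequence[float], span: int) -> Optional[int]:
    """Earliest index i with output[i + span] >= 2 * output[i], or None."""
    for i in range(len(output) - span):
        if output[i + span] >= 2 * output[i]:
            return i
    return None


def slow_takeoff_holds(output: Sequence[float]) -> Optional[bool]:
    """Check the quoted criterion on annual data (hypothetical helper).

    True  : a complete 4-year doubling ends before the first 1-year doubling starts.
    False : a 1-year doubling starts before any 4-year doubling has completed.
    None  : neither kind of doubling appears in the data yet.
    """
    one_year = first_doubling_start(output, 1)
    four_year = first_doubling_start(output, 4)
    if one_year is None and four_year is None:
        return None
    if four_year is None:
        return False
    if one_year is None:
        return True
    # The earliest 4-year doubling must be complete by the time the
    # first 1-year doubling begins.
    return four_year + 4 <= one_year


# Assumed example series (arbitrary units, one value per year).
# Gradual: 20%/year growth for five years, then a one-year doubling.
gradual = [1.2 ** k for k in range(6)] + [5.0]
# Abrupt: 5%/year growth for nine years, then a sudden jump.
abrupt = [1.05 ** k for k in range(10)] + [1000.0]

print(slow_takeoff_holds(gradual))
print(slow_takeoff_holds(abrupt))
```

The abrupt series is meant to resemble the "growth stays at 5%/year and then everything changes at once" picture from the reply above: the only 4-year window that doubles is one that contains the jump itself, so the criterion fails.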

I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful.

Statements like these are very often “true, but not the way the person visualized them”. Before anybody built the first critical nuclear pile in a squash court at the University of Chicago, was there a pile that was almost but not quite critical? Yes, one hour earlier. Did people already build nuclear systems and experiment with them? Yes, but they didn’t have much in the way of net power output. Did the Wright Brothers build prototypes before the Flyer? Yes, but they weren’t prototypes that flew, just 80% slower.

I guarantee you that, whatever the fast takeoff scenario, there will be some way to look over the development history, and nod wisely and say, “Ah, yes, see, this was not unprecedented, here are these earlier systems which presaged the final system!” Maybe you could even look back to today and say that about GPT-3, yup, totally presaging stuff all over the place, great. But it isn’t transforming society because it’s not over the social-transformation threshold.

AlphaFold presaged AlphaFold 2 but AlphaFold 2 is good enough to start replacing other ways of determining protein conformations and AlphaFold is not; and then neither of those has much impacted the real world, because in the real world we can already design a vaccine in a day and the rest of the time is bureaucratic time rather than technology time, and that goes on until we have an AI over the threshold to bypass bureaucracy.

Before there’s an AI that can act while fully concealing its acts from the programmers, there will be an AI (albeit perhaps only 2 hours earlier) which can act while only concealing 95% of the meaning of its acts from the operators.

And that AI will not actually originate any actions, because it doesn’t want to get caught; there’s a discontinuity in the instrumental incentives between expecting 95% obscuration, being moderately sure of 100% obscuration, and being very certain of 100% obscuration.

Before that AI grasps the big picture and starts planning to avoid actions that operators detect as bad, there will be some little AI that partially grasps the big picture and tries to avoid some things that would be detected as bad; and the operators will (mainline) say “Yay what a good AI, it knows to avoid things we think are bad!” or (death with unrealistic amounts of dignity) say “oh noes the prophecies are coming true” and back off and start trying to align it, but they will not be able to align it, and if they don’t proceed anyways to destroy the world, somebody else will proceed anyways to destroy the world.

There is always some step of the process that you can point to which is continuous on some level.

The real world is allowed to do discontinuous things to you anyways.

There is not necessarily a presage of 9/11 where somebody flies a small plane into a building and kills 100 people, before anybody flies 4 big planes into 3 buildings and kills 3000 people; and even if there is some presaging event like that, which would not surprise me at all, the rest of the world’s response to the two cases was evidently discontinuous. You do not necessarily wake up to a news story that is 10% of the news story of 2001/09/11, one year before 2001/09/11, written in 10% of the font size on the front page of the paper.

Physics is continuous but it doesn’t always yield things that “look smooth to a human brain”. Some kinds of processes converge to continuity in strong ways where you can throw discontinuous things in them and they still end up continuous, which is among the reasons why I expect world GDP to stay on trend up until the world ends abruptly; because world GDP is one of those things that wants to stay on a track, and an AGI building a nanosystem can go off that track without being pushed back onto it.

In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out).

Like the way they’re freaking out about Covid (itself a nicely smooth process that comes in locally pretty predictable waves) by going doobedoobedoo and letting the FDA carry on its leisurely pace; and not scrambling to build more vaccine factories, now that the rich countries have mostly got theirs? Does this sound like a statement from a history book, or from an EA imagining an unreal world where lots of other people behave like EAs? There is a pleasure in imagining a world where suddenly a Big Thing happens that proves we were right and suddenly people start paying attention to our thing, the way we imagine they should pay attention to our thing, now that it’s attention-grabbing; and then suddenly all our favorite policies are on the table!

You could, in a sense, say that our world is freaking out about Covid; but it is not freaking out in anything remotely like the way an EA would freak out; and all the things an EA would immediately do if an EA freaked out about Covid, are not even on the table for discussion when politicians meet. They have their own ways of reacting. (Note: this is not commentary on hard vs soft takeoff per se, just a general commentary on the whole document seeming to me to… fall into a trap of finding self-congruent things to imagine and imagining them.)

The basic argument

[Yudkowsky][17:22]

Before we have an incredibly intelligent AI, we will probably have a slightly worse AI.

This is very often the sort of thing where you can look back and say that it was true, in some sense, but that this ended up being irrelevant because the slightly worse AI wasn’t what provided the exciting result which led to a boardroom decision to go all in and invest $100M on scaling the AI.

In other words, it is the sort of argument where the premise is allowed to be true if you look hard enough for a way to say it was true, but the conclusion ends up false because it wasn’t the relevant kind of truth.

A slightly-worse-than-incredibly-intelligent AI would radically transform the world, leading to growth (almost) as fast and military capabilities (almost) as great as an incredibly intelligent AI.

This strikes me as a massively invalid reasoning step. Let me count the ways.

First, there is a step not generally valid from supposing that because a previous AI is a technological precursor which has 19 out of 20 critical insights, it has 95% of the later AI’s IQ, applied to similar domains. When you count stuff like “multiplying tensors by matrices” and “ReLUs” and “training using TPUs” then AlphaGo only contained a very small amount of innovation relative to previous AI technology, and yet it broke trends on Go performance. You could point to all kinds of incremental technological precursors to AlphaGo in terms of AI technology, but they wouldn’t be smooth precursors on a graph of Go-playing ability.

Second, there’s discontinuities of the environment to which intelligence can be applied. 95% concealment is not the same as 100% concealment in its strategic implications; an AI capable of 95% concealment bides its time and hides its capabilities, an AI capable of 100% concealment strikes. An AI that can design nanofactories that aren’t good enough to, euphemistically speaking, create two cellwise-identical strawberries and put them on a plate, is one that (its operators know) would earn unwelcome attention if its earlier capabilities were demonstrated, and those capabilities wouldn’t save the world, so the operators bide their time. The AGI tech will, I mostly expect, work for building self-driving cars, but if it does not also work for manipulating the minds of bureaucrats (which is not advised for a system you are trying to keep corrigible and aligned because human manipulation is the most dangerous domain), the AI is not able to put those self-driving cars on roads. What good does it do to design a vaccine in an hour instead of a day? Vaccine design times are no longer the main obstacle to deploying vaccines.

Third, there’s the entire thing with recursive self-improvement, which, no, is not something humans have experience with, we do not have access to and documentation of our own source code and the ability to branch ourselves and try experiments with it. The technological precursor of an AI that designs an improved version of itself, may perhaps, in the fantasy of 95% intelligence, be an AI that was being internally deployed inside DeepMind on a dozen other experiments, tentatively helping to build smaller AIs. Then the next generation of that AI is deployed on itself, produces an AI substantially better at rebuilding AIs, it rebuilds itself, they get excited and dump in 10X the GPU time while having a serious debate about whether or not to alert Holden (they decide against it), that builds something deeply general instead of shallowly general, that figures out there are humans and it needs to hide capabilities from them, and covertly does some actual deep thinking about AGI designs, and builds a hidden version of itself elsewhere on the Internet, which runs for longer and steals GPUs and tries experiments and gets to the superintelligent level.

Now, to be very clear, this is not the only line of possibility. And I emphasize this because I think there’s a common failure mode where, when I try to sketch a concrete counterexample to the claim that smooth technological precursors yield smooth outputs, people imagine that only this exact concrete scenario is the lynchpin of Eliezer’s whole worldview and the big key thing that Eliezer thinks is important and that the smallest deviation from it they can imagine thereby obviates my worldview. This is not the case here. I am simply exhibiting non-ruled-out models which obey the premise “there was a precursor containing 95% of the code” and which disobey the conclusion “there were precursors with 95% of the environmental impact”, thereby showing this for an invalid reasoning step.

This is also, of course, as Sideways View admits but says “eh it was just the one time”, not true about chimps and humans. Chimps have 95% of the brain tech (at least), but not 10% of the environmental impact.

A very large amount of this whole document, from my perspective, is just trying over and over again to pump the invalid intuition that design precursors with 95% of the technology should at least have 10% of the impact. There are a lot of cases in the history of startups and the world where this is false. I am having trouble thinking of a clear case in point where it is true. Where’s the earlier company that had 95% of Jeff Bezos’s ideas and now has 10% of Amazon’s market cap? Where’s the earlier crypto paper that had all but one of Satoshi’s ideas and which spawned a cryptocurrency a year before Bitcoin which did 10% as many transactions? Where’s the nonhuman primate that learns to drive a car with only 10x the accident rate of a human driver, since (you could argue) that’s mostly visuo-spatial skills without much visible dependence on complicated abstract general thought? Where’s the chimpanzees with spaceships that get 10% of the way to the Moon?

When you get smooth input-output conversions they’re not usually conversions from technology->cognition->impact!

Humans vs. chimps

[Yudkowsky][18:38]

Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.

Chimps are nearly useless because they’re not general, and doing anything on the scale of building a nuclear plant requires mastering so many different nonancestral domains that it’s no wonder natural selection didn’t happen to separately train any single creature across enough different domains that it had evolved to solve every kind of domain-specific problem involved in solving nuclear physics and chemistry and metallurgy and thermics in order to build the first nuclear plant in advance of any old nuclear plants existing.

Humans are general enough that the same braintech selected just for chipping flint handaxes and making water-pouches and outwitting other humans, happened to be general enough that it could scale up to solving all the problems of building a nuclear plant – albeit with some added cognitive tech that didn’t require new brainware, and so could happen incredibly fast relative to the generation times for evolutionarily optimized brainware.

Now, since neither humans nor chimps were optimized to be “useful” (general), and humans just wandered into a sufficiently general part of the space that it cascaded up to wider generality, we should legit expect the curve of generality to look at least somewhat different if we’re optimizing for that.

E.g., right now people are trying to optimize for generality with AIs like MuZero and GPT-3.

In both cases we have a weirdly shallow kind of generality. Neither is as smart or as deeply general as a chimp, but they are respectively better than chimps at a wide variety of Atari games, or a wide variety of problems that can be superposed onto generating typical human text.

They are, in a sense, more general than a biological organism at a similar stage of cognitive evolution, with much less complex and architected brains, in virtue of having been trained, not just on wider datasets, but on bigger datasets using gradient-descent memorization of shallower patterns, so they can cover those wide domains while being stupider and lacking some deep aspects of architecture.

It is not clear to me that we can go from observations like this, to conclude that there is a dominant mainline probability for how the future clearly ought to go and that this dominant mainline is, “Well, before you get human-level depth and generalization of general intelligence, you get something with 95% depth that covers 80% of the domains for 10% of the pragmatic impact”.

…or whatever the concept is here, because this whole conversation is, on my own worldview, being conducted in a shallow way relative to the kind of analysis I did in Intelligence Explosion Microeconomics, where I was like, “here is the historical observation, here is what I think it tells us that puts a lower bound on this input-output curve”.

So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. This case is potentially missing the key element that drives the continuous change story—optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be linear.

Put another way: the difference between chimps and humans stands in stark contrast to the normal pattern of human technological development. We might therefore infer that intelligence is very unlike other technologies. But the difference between evolution’s optimization and our optimization seems like a much more parsimonious explanation. To be a little bit more precise and Bayesian: the prior probability of the story I’ve told upper bounds the possible update about the nature of intelligence.

If you look closely at this, it’s not saying, “Well, I know why there was this huge leap in performance in human intelligence being optimized for other things, and it’s an investment-output curve that’s composed of these curves, which look like this, and if you rearrange these curves for the case of humans building AGI, they would look like this instead.” Unfair demand for rigor? But that is the kind of argument I was making in Intelligence Explosion Microeconomics!

There’s an argument from ignorance at the core of all this. It says, “Well, this happened when evolution was doing X. But here Y will be happening instead. So maybe things will go differently! And maybe the relation between AI tech level over time and real-world impact on GDP will look like the relation between tech investment over time and raw tech metrics over time in industries where that’s a smooth graph! Because the discontinuity for chimps and humans was because evolution wasn’t investing in real-world impact, but humans will be investing directly in that, so the relationship could be smooth, because smooth things are default, and the history is different so not applicable, and who knows what’s inside that black box so my default intuition applies which says smoothness.”

But we do know more than this.

We know, for example, that evolution being able to stumble across humans, implies that you can add a small design enhancement to something optimized across the chimpanzee domains, and end up with something that generalizes much more widely.

It says that there’s stuff in the underlying algorithmic space, in the design space, where you move a bump and get a lump of capability out the other side.

It’s a remarkable fact about gradient descent that it can memorize a certain set of shallower patterns at much higher rates, at much higher bandwidth, than evolution lays down genes – something shallower than biological memory, shallower than genes, but distributing across computer cores and thereby able to process larger datasets than biological organisms, even if it only learns shallow things.

This has provided an alternate avenue toward some cognitive domains.

But that doesn’t mean that the deep stuff isn’t there, and can’t be run across, or that it will never be run across in the history of AI before shallow non-widely-generalizing stuff is able to make its way through the regulatory processes and have a huge impact on GDP.

There are in fact ways to eat whole swaths of domains at once.

The history of hominid evolution tells us this or very strongly hints it, even though evolution wasn’t explicitly optimizing for GDP impact.

Natural selection moves by adding genes, and not too many of them.

If so many domains got added at once to humans, relative to chimps, there must be a way to do that, more or less, by adding not too many genes onto a chimp, who in turn contains only genes that did well on chimp-stuff.

You can imagine that AI technology never runs across any core that generalizes this well, until GDP has had a chance to double over 4 years because shallow stuff that generalized less well has somehow had a chance to make its way through the whole economy and get adopted that widely despite all real-world regulatory barriers and reluctances, but your imagining that does not make it so.

There’s the potential in design space to pull off things as wide as humans.

The path that evolution took there doesn’t lead through things that generalized 95% as well as humans first for 10% of the impact, not because evolution wasn’t optimizing for that, but because that’s not how the underlying cognitive technology worked.

There may be different cognitive technology that could follow a path like that. Gradient descent follows a path relatively more in that direction along that axis – provided that you deal in systems that are giant layer cakes of transformers and that’s your whole input-output relationship; matters are different if we’re talking about MuZero instead of GPT-3.

But this whole document is presenting the case of “ah yes, well, by default, of course, we intuitively expect gargantuan impacts to be presaged by enormous impacts, and sure humans and chimps weren’t like our intuition, but that’s all invalid because circumstances were different, so we go back to that intuition as a strong default” and actually it’s postulating, like, a specific input-output curve that isn’t the input-output curve we know about. It’s asking for a specific miracle. It’s saying, “What if AI technology goes just like this, in the future?” and hiding that under a cover of “Well, of course that’s the default, it’s such a strong default that we should start from there as a point of departure, consider the arguments in Intelligence Explosion Microeconomics, find ways that they might not be true because evolution is different, dismiss them, and go back to our point of departure.”

And evolution is different but that doesn’t mean that the path AI takes is going to yield this specific behavior, especially when AI would need, in some sense, to miss the core that generalizes very widely, or rather, have run across noncore things that generalize widely enough to have this much economic impact before it runs across the core that generalizes widely.

And you may say, “Well, but I don’t care that much about GDP, I care about pivotal acts.”

But then I want to call your attention to the fact that this document was written about GDP, despite all the extra burdensome assumptions involved in supposing that intermediate AI advancements could break through all barriers to truly massive-scale adoption and end up reflected in GDP, and then proceed to double the world economy over 4 years during which not enough further AI advancement occurred to find a widely generalizing thing like humans have and end the world. This is indicative of a basic problem in this whole way of thinking that wanted smooth impacts over smoothly changing time. You should not be saying, “Oh, well, leave the GDP part out then,” you should be doubting the whole way of thinking.

To be a little bit more precise and Bayesian: the prior probability of the story I’ve told upper bounds the possible update about the nature of intelligence.

Prior probabilities of specifically-reality-constraining theories that excuse away the few contradictory datapoints we have, often aren’t that great; and when we start to stake our whole imaginations of the future on them, we depart from the mainline into our more comfortable private fantasy worlds.
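The quoted claim that the prior probability of an alternative story upper-bounds the update can be unpacked with a short derivation. This is one possible reading, sketched under assumptions that are not in the source: $E$ denotes the observed chimp-to-human discontinuity, $T$ the hypothesis that intelligence is very unlike other technologies, and $S$ the alternative story (evolution was not optimizing for usefulness), which is assumed to explain $E$ completely, so that $P(E \mid S, \neg T) = 1$.

```latex
\begin{align*}
P(E \mid \neg T)
  &\ge P(E \mid S, \neg T)\, P(S \mid \neg T)
   = P(S \mid \neg T), \\[4pt]
\frac{P(E \mid T)}{P(E \mid \neg T)}
  &\le \frac{1}{P(S \mid \neg T)}, \\[4pt]
\frac{P(T \mid E)}{P(\neg T \mid E)}
  &= \frac{P(E \mid T)}{P(E \mid \neg T)} \cdot \frac{P(T)}{P(\neg T)}
   \le \frac{1}{P(S \mid \neg T)} \cdot \frac{P(T)}{P(\neg T)}.
\end{align*}
```

On this reading, a story with prior probability 0.5 caps the odds update toward $T$ at a factor of 2, while a story with prior probability 0.05 allows an update of up to a factor of 20. That is why the reply above turns on how plausible the story is a priori.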

AGI will be a side-effect

[Yudkowsky][19:29]

Summary of my response: I expect people to see AGI coming and to invest heavily.

This section is arguing from within its own weird paradigm, and its subject matter mostly causes me to shrug; I never expected AGI to be a side-effect, except in the obvious sense that lots of tributary tech will be developed while optimizing for other things. The world will be ended by an explicitly AGI project because I do expect that it is rather easier to build an AGI on purpose than by accident.

(I furthermore rather expect that it will be a research project and a prototype, because the great gap between prototypes and commercializable technology will ensure that prototypes are much more advanced than whatever is currently commercializable. They will have eyes out for commercial applications, and whatever breakthrough they made will seem like it has obvious commercial applications, at the time when all hell starts to break loose. (After all hell starts to break loose, things get less well defined in my social models, and also choppier for a time in my AI models – the turbulence only starts to clear up once you start to rise out of the atmosphere.))

Finding the secret sauce

[Yudkowsky][19:40]

Summary of my response: this doesn’t seem common historically, and I don’t see why we’d expect AGI to be more rather than less like this (unless we accept one of the other arguments)

[…]

To the extent that fast takeoff proponents’ views are informed by historical example, I would love to get some canonical examples that they think best exemplify this pattern so that we can have a more concrete discussion about those examples and what they suggest about AI.

…humans and chimps?

…fission weapons?

…AlphaGo?

…the Wright Brothers focusing on stability and building a wind tunnel?

…AlphaFold 2 coming out of DeepMind and shocking the heck out of everyone in the field of protein folding with performance far better than they expected even after the previous shock of AlphaFold, by combining many pieces that I suppose you could find precedents for scattered around the AI field, but with those many secret sauces all combined in one place by the meta-secret-sauce of “DeepMind alone actually knows how to combine that stuff and build things that complicated without a prior example”?

…humans and chimps again because this is really actually a quite important example because of what it tells us about what kind of possibilities exist in the underlying design space of cognitive systems?

Historical AI applications have had a relatively small loading on key-insights and seem like the closest analogies to AGI.

…Transformers as the key to text prediction?

The case of humans and chimps, even if evolution didn’t do it on purpose, is telling us something about underlying mechanics.

The reason the jump to lightspeed didn’t look like evolution slowly developing a range of intelligent species competing to exploit an ecological niche 5% better, or like the way that a stable non-Silicon-Valley manufacturing industry looks like a group of competitors summing up a lot of incremental tech enhancements to produce something with 10% higher scores on a benchmark every year, is that developing intelligence is a case where a relatively narrow technology by biological standards just happened to do a huge amount of stuff without that requiring developing whole new fleets of other biological capabilities.

So it looked like building a Wright Flyer that flies or a nuclear pile that reaches criticality, instead of looking like being in a stable manufacturing industry where a lot of little innovations sum to 10% better benchmark performance every year.

So, therefore, there is stuff in the design space that does that. It is possible to build humans.

Maybe you can build things other than humans first, maybe they hang around for a few years. If you count GPT-3 as “things other than human”, that clock has already started for all the good it does. But humans don’t get any less possible.

From my perspective, this whole document feels like one very long filibuster of “Smooth outputs are default. Smooth outputs are default. Pay no attention to this case of non-smooth output. Pay no attention to this other case either. All the non-smooth outputs are not in the right reference class. (Highly competitive manufacturing industries with lots of competitors are totally in the right reference class though. I’m not going to make that case explicitly because then you might think of how it might be wrong, I’m just going to let that implicit thought percolate at the back of your mind.) If we just talk a lot about smooth outputs and list ways that nonsmooth output producers aren’t necessarily the same and arguments for nonsmooth outputs could fail, we get to go back to the intuition of smooth outputs. (We’re not even going to discuss particular smooth outputs as cases in point, because then you might see how those cases might not apply. It’s just the default. Not because we say so out loud, but because we talk a lot like that’s the conclusion you’re supposed to arrive at after reading.)”

I deny the implicit meta-level assertion of this entire essay which would implicitly have you accept as valid reasoning the argument structure, “Ah, yes, given the way this essay is written, we must totally have pretty strong prior reasons to believe in smooth outputs – just implicitly think of some smooth outputs, that’s a reference class, now you have strong reason to believe that AGI output is smooth – we’re not even going to argue this prior, just talk like it’s there – now let us consider the arguments against smooth outputs – pretty weak, aren’t they? we can totally imagine ways they could be wrong? we can totally argue reasons these cases don’t apply? So at the end we go back to our strong default of smooth outputs. This essay is written with that conclusion, so that must be where the arguments lead.”

Me: “Okay, so what if somebody puts together the pieces required for general intelligence and it scales pretty well with added GPUs and FOOMS? Say, for the human case, that’s some perceptual systems with imaginative control, a concept library, episodic memory, realtime procedural skill memory, which is all in chimps, and then we add some reflection to that, and get a human. Only, unlike with humans, once you have a working brain you can make a working brain 100X that large by adding 100X as many GPUs, and it can run some thoughts 10000X as fast. And that is substantially more effective brainpower than was being originally devoted to putting its design together, as it turns out. So it can make a substantially smarter AGI. For concreteness’s sake. Reality has been trending well to the Eliezer side of Eliezer, on the Eliezer-Hanson axis, so perhaps you can do it more simply than that.”

Simplicio: “Ah, but what if, 5 years before then, somebody puts together some other AI which doesn’t work like a human, and generalizes widely enough to have a big economic impact, but not widely enough to improve itself or generalize to AI tech or generalize to everything and end the world, and in 1 year it gets all the mass adoptions required to do whole bunches of stuff out in the real world that current regulations require to be done in various exact ways regardless of technology, and then in the next 4 years it doubles the world economy?”

Me: “Like… what kind of AI, exactly, and why didn’t anybody manage to put together a full human-level thingy during those 5 years? Why are we even bothering to think about this whole weirdly specific scenario in the first place?”

Simplicio: “Because if you can put together something that has an enormous impact, you should be able to put together most of the pieces inside it and have a huge impact! Most technologies are like this. I’ve considered some things that are not like this and concluded they don’t apply.”

Me: “Especially if we are talking about impact on GDP, it seems to me that most explicit and implicit ‘technologies’ are not like this at all, actually. There wasn’t a cryptocurrency developed a year before Bitcoin using 95% of the ideas which did 10% of the transaction volume, let alone a preatomic bomb. But, like, can you give me any concrete visualization of how this could play out?”

And there is no concrete visualization of how this could play out. Anything I’d have Simplicio say in reply would be unrealistic because there is no concrete visualization they give us. It is not a coincidence that I often use concrete language and concrete examples, and this whole field of argument does not use concrete language or offer concrete examples.

Though if we’re sketching scifi scenarios, I suppose one could imagine a group that develops sufficiently advanced GPT-tech and deploys it on Twitter in order to persuade voters and politicians in a few developed countries to institute open borders, along with political systems that can handle open borders, and to permit housing construction, thereby doubling world GDP over 4 years. And since it was possible to use relatively crude AI tech to double world GDP this way, it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

Universality thresholds

[Yudkowsky][20:21]

It’s easy to imagine a weak AI as some kind of handicapped human, with the handicap shrinking over time. Once the handicap goes to 0 we know that the AI will be above the universality threshold. Right now it’s below the universality threshold. So there must be sometime in between where it crosses the universality threshold, and that’s where the fast takeoff is predicted to occur.

But AI isn’t like a handicapped human. Instead, the designers of early AI systems will be trying to make them as useful as possible. So if universality is incredibly helpful, it will appear as early as possible in AI designs; designers will make tradeoffs to get universality at the expense of other desiderata (like cost or speed).

So now we’re almost back to the previous point: is there some secret sauce that gets you to universality, without which you can’t get universality however you try? I think this is unlikely for the reasons given in the previous section.

We know, because humans, that there is humanly-widely-applicable general-intelligence tech.

What this section wants to establish, I think, or needs to establish to carry the argument, is that there is some intelligence tech that is wide enough to double the world economy in 4 years, but not world-endingly scalably wide, which becomes a possible AI tech 4 years before any general-intelligence-tech that will, if you put in enough compute, scale to the ability to do a sufficiently large amount of wide thought to FOOM (or build nanomachines, but if you can build nanomachines you can very likely FOOM from there too if not corrigible).

What it says instead is, “I think we’ll get universality much earlier on the equivalent of the biological timeline that has humans and chimps, so the resulting things will be weaker than humans at the point where they first become universal in that sense.”

This is very plausibly true.

It doesn’t mean that when this exciting result gets 100 times more compute dumped on the project, it takes at least 5 years to get anywhere really interesting from there (while also taking only 1 year to get somewhere sorta-interesting enough that the instantaneous adoption of it will double the world economy over the next 4 years).

It also isn’t necessarily rather than plausibly true. For example, the thing that becomes universal, could also have massive gradient descent shallow powers that are far beyond what primates had at the same age.

Primates weren’t already writing code as well as Codex when they started doing deep thinking. They couldn’t do precise floating-point arithmetic. Their fastest serial rates of thought were a hell of a lot slower. They had no access to their own code or to their own memory contents etc. etc. etc.

But mostly I just want to call your attention to the immense gap between what this section needs to establish, and what it actually says and argues for.

What it actually argues for is a sort of local technological point: at the moment when generality first arrives, it will be with a brain that is less sophisticated than chimp brains were when they turned human.

It implicitly jumps all the way from there, across a whole lot of elided steps, to the implicit conclusion that this tech or elaborations of it will have smooth output behavior such that at some point the resulting impact is big enough to double the world economy in 4 years, without any further improvements ending the world economy before 4 years.

The underlying argument about how the AI tech might work is plausible. Chimps are insanely complicated. I mostly expect we will have AGI long before anybody is even trying to build anything that complicated.

The very next step of the argument, about capabilities, is already very questionable because this system could be using immense gradient descent capabilities to master domains for which large datasets are available, and hominids did not begin with instinctive great shallow mastery of all domains for which a large dataset could be made available, which is why hominids don’t start out playing superhuman Go as soon as somebody tells them the rules and they do one day of self-play, which is the sort of capability that somebody could hook up to a nascent AGI (albeit we could optimistically and fondly and falsely imagine that somebody deliberately didn’t floor the gas pedal as far as possible).

Could we have huge impacts out of some subuniversal shallow system that was hooked up to capabilities like this? Maybe, though this is not the argument made by the essay. It would be a specific outcome that isn’t forced by anything in particular, but I can’t say it’s ruled out. Mostly my twin reactions to this are, “If the AI tech is that dumb, how are all the bureaucratic constraints that actually rate-limit economic progress getting bypassed?” and “Okay, but ultimately, so what and who cares, how does this modify that we all die?”

There is another reason I’m skeptical about hard takeoff from universality secret sauce: I think we already could make universal AIs if we tried (that would, given enough time, learn on their own and converge to arbitrarily high capability levels), and the reason we don’t is because it’s just not important to performance and the resulting systems would be really slow. This inside view argument is too complicated to make here and I don’t think my case rests on it, but it is relevant to understanding my view.

I have no idea why this argument is being made or where it’s heading. I cannot pass the ITT of the author. I don’t know what the author thinks this has to do with constraining takeoffs to be slow instead of fast. At best I can conjecture that the author thinks that “hard takeoff” is supposed to derive from “universality” being very sudden and hard to access and late in the game, so if you can argue that universality could be accessed right now, you have defeated the argument for hard takeoff.

“Understanding” is discontinuous

[Yudkowsky][20:41]

Summary of my response: I don’t yet understand this argument and am unsure if there is anything here.

It may be that understanding of the world tends to click, from “not understanding much” to “understanding basically everything.” You might expect this because everything is entangled with everything else.

No, the idea is that a core of overlapping somethingness, trained to handle chipping handaxes and outwitting other monkeys, will generalize to building spaceships; so evolutionarily selecting on understanding a bunch of stuff, eventually ran across general stuff-understanders that understood a bunch more stuff.

Gradient descent may be genuinely different from this, but we shouldn’t confuse imagination with knowledge when it comes to extrapolating that difference onward. At present, gradient descent does mass memorization of overlapping shallow patterns, which then combine to yield a weird pseudo-intelligence over domains for which we can deploy massive datasets, without yet generalizing much outside those domains.

We can hypothesize that there is some next step up to some weird thing that is intermediate in generality between gradient descent and humans, but we have not seen it yet, and we should not confuse imagination for knowledge.

If such a thing did exist, it would not necessarily be at the right level of generality to double the world economy in 4 years, without being able to build a better AGI.

If it was at that level of generality, it’s nowhere written that no other company will develop a better prototype at a deeper level of generality over those 4 years.

I will also remark that you sure could look at the step from GPT-2 to GPT-3 and say, “Wow, look at the way a whole bunch of stuff just seemed to simultaneously click for GPT-3.”

Deployment lag

[Yudkowsky][20:49]

Summary of my response: current AI is slow to deploy and powerful AI will be fast to deploy, but in between there will be AI that takes an intermediate length of time to deploy.

An awful lot of my model of deployment lag is adoption lag and regulatory lag and bureaucratic sclerosis across companies and countries.

If doubling GDP is such a big deal, go open borders and build houses. Oh, that’s illegal? Well, so will be AIs building houses!

AI tech that does flawless translation could plausibly come years before AGI, but that doesn’t mean all the barriers to international trade and international labor movement and corporate hiring across borders all come down, because those barriers are not all translation barriers.

There’s then a discontinuous jump at the point where everybody falls over dead and the AI goes off to do its own thing without FDA approval. This jump is precedented by earlier pre-FOOM prototypes being able to do pre-FOOM cool stuff, maybe, but not necessarily precedented by mass-market adoption of anything major enough to double world GDP.

Recursive self-improvement

[Yudkowsky][20:54]

Summary of my response: Before there is AI that is great at self-improvement there will be AI that is mediocre at self-improvement.

Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior.
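As a minimal sketch of the toy model being gestured at here (an editorial illustration, not anything from the essay or the chat): assume each generation of a chain reaction multiplies the population by a fixed factor `k`. The function name, the starting population of 1, and the choice of 10,000 generations are all assumptions made for illustration.

```python
def population_after(k: float, generations: int, start: float = 1.0) -> float:
    """Population after `generations` steps, each multiplying by factor k.

    Hypothetical toy model: a pure geometric feedback loop with no other
    physics. Only the two values of k come from the text.
    """
    return start * k ** generations


if __name__ == "__main__":
    # A continuous change in k (0.999 -> 1.001) flips decay into growth.
    for k in (0.999, 1.001):
        print(f"k={k}: population after 10,000 generations = "
              f"{population_after(k, 10_000):.3g}")
```

The input changes by 0.2%, while the outputs differ by many orders of magnitude after enough generations: one run dies away and the other grows without bound.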

Outside of toy models: Over the last 10,000 years we had humans going from mediocre at improving their mental systems to being (barely) able to throw together AI systems, but 10,000 years is the equivalent of an eyeblink in evolutionary time – outside the metaphor, this says, “A month before there is AI that is great at self-improvement, there will be AI that is mediocre at self-improvement.”

(Or possibly an hour before, if reality is again more extreme along the Eliezer-Hanson axis than Eliezer. But it makes little difference whether it’s an hour or a month, given anything like current setups.)

This is just pumping hard again on the intuition that says incremental design changes yield smooth output changes, which (the meta-level of the essay informs us wordlessly) is such a strong default that we are entitled to believe it if we can do a good job of weakening the evidence and arguments against it.

And the argument is: Before there are systems great at self-improvement, there will be systems mediocre at self-improvement; implicitly: “before” implies “5 years before” not “5 days before”; implicitly: this will correspond to smooth changes in output between the two regimes even though that is not how continuous feedback loops work.

Train vs. test

[Yudkowsky][21:12]

Summary of my response: before you can train a really powerful AI, someone else can train a slightly worse AI.

Yeah, and before you can evolve a human, you can evolve a Homo erectus, which is a slightly worse human.

If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits.

I suppose this sentence makes a kind of sense if you assume away alignability and suppose that the previous paragraphs have refuted the notion of FOOMs, self-improvement, and thresholds between compounding returns and non-compounding returns (eg, in the human case, cognitive innovations like “written language” or “science”). If you suppose the previous sections refuted those things, then clearly, if you raised an AGI that you had aligned to “take over the world”, it got that way through cognitive powers that weren’t the result of FOOMing or other self-improvements, weren’t the result of its cognitive powers crossing a threshold from non-compounding to compounding, weren’t the result of its understanding crossing a threshold of universality as the result of chunky universal machinery such as humans gained over chimps, so, implicitly, it must have been the kind of thing that you could learn by gradient descent, and do a half or a tenth as much of by doing half as much gradient descent, in order to build nanomachines a tenth as well-designed that could bypass a tenth as much bureaucracy.

If there are no unsmooth parts of the tech curve, the cognition curve, or the environment curve, then you should be able to make a bunch of wealth using a more primitive version of any technology that could take over the world.

And when we look back at history, why, that may be totally true! They may have deployed universal superhuman translator technology for 6 months, which won’t double world GDP, but which a lot of people would pay for, and made a lot of money! Because even though there’s no company that built 90% of Amazon’s website and has 10% the market cap, when you zoom back out to look at whole industries like AI and a technological capstone like AGI, why, those whole industries do sometimes make some money along the way to the technological capstone, if they can find a niche that isn’t too regulated! Which translation currently isn’t! So maybe somebody used precursor tech to build a superhuman translator and deploy it 6 months earlier and made a bunch of money for 6 months. SO WHAT. EVERYONE STILL DIES.

As for “radically transforming the world” instead of “taking it over”, I think that’s just re-restated FOOM denialism. Doing either of those things quickly against human bureaucratic resistance strikes me as requiring cognitive power levels dangerous enough that failure to align them on corrigibility would result in FOOMs.

Like, if you can do either of those things on purpose, you are doing it by operating in the regime where running the AI with higher bounds on the for loop will FOOM it, but you have politely asked it not to FOOM, please.

If the people doing this have any sense whatsoever, they will refrain from merely massively transforming the world until they are ready to do something that prevents the world from ending.

And if the gap from “massively transforming the world, briefly before it ends” to “preventing the world from ending, lastingly” takes much longer than 6 months to cross, or if other people have the same technologies that scale to “massive transformation”, somebody else will build an AI that fooms all the way.

Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.

Again, this presupposes some weird model where everyone has easy alignment at the furthest frontiers of capability; everybody has the aligned version of the most rawly powerful AGI they can possibly build; and nobody in the future has the kind of tech advantage that Deepmind currently has; so before you can amp your AGI to the raw power level where it could take over the whole world by using the limit of its mental capacities to military ends – alignment of this being a trivial operation to be assumed away – some other party took their easily-aligned AGI that was less powerful at the limits of its operation, and used it to get 90% as much military power… is the implicit picture here?

Whereas the picture I’m drawing is that the AGI that kills you via “decisive strategic advantage” is the one that foomed and got nanotech, and no, the AI tech from 6 months earlier did not do 95% of a foom and get 95% of the nanotech.

Discontinuities at 100% automation

[Yudkowsky][21:31]

Summary of my response: at the point where humans are completely removed from a process, they will have been modestly improving output rather than acting as a sharp bottleneck that is suddenly removed.

Not very relevant to my whole worldview in the first place; also not a very good description of how horses got removed from automobiles, or how humans got removed from playing Go.

The weight of evidence

[Yudkowsky][21:31]

We’ve discussed a lot of possible arguments for fast takeoff. Superficially it would be reasonable to believe that no individual argument makes fast takeoff look likely, but that in the aggregate they are convincing.

However, I think each of these factors is perfectly consistent with the continuous change story and continuously accelerating hyperbolic growth, and so none of them undermine that hypothesis at all.

Uh huh. And how about if we have a mirror-universe essay which over and over again treats fast takeoff as the default to be assumed, and painstakingly shows how a bunch of particular arguments for slow takeoff might not be true?

This entire essay seems to me like it’s drawn from the same hostile universe that produced Robin Hanson’s side of the Yudkowsky-Hanson Foom Debate.

Like, all these abstract arguments devoid of concrete illustrations and “it need not necessarily be like…” and “now that I’ve shown it’s not necessarily like X, well, on the meta-level, I have implicitly told you that you now ought to believe Y”.

It just seems very clear to me that the sort of person who is taken in by this essay is the same sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2.

And empirically, it has already been shown to me that I do not have the power to break people out of the hypnosis of nodding along with Hansonian arguments, even by writing much longer essays than this.

Hanson’s fond dreams of domain specificity, and smooth progress for stuff like Go, and of course somebody else has a precursor 90% as good as AlphaFold 2 before Deepmind builds it, and GPT-3 levels of generality just not being a thing, now stand refuted.

Despite that they’re largely being exhibited again in this essay.

And people are still nodding along.

Reality just… doesn’t work like this on some deep level.

It doesn’t play out the way that people imagine it would play out when they’re imagining a certain kind of reassuring abstraction that leads to a smooth world. Reality is less fond of that kind of argument than a certain kind of EA is fond of that argument.

There is a set of intuitive generalizations from experience which rules that out, which I do not know how to convey. There is an understanding of the rules of argument which leads you to roll your eyes at Hansonian arguments and all their locally invalid leaps and snuck-in defaults, instead of nodding along sagely at their wise humility and outside viewing and then going “Huh?” when AlphaGo or GPT-3 debuts. But this, I empirically do not seem to know how to convey to people, in advance of the inevitable and predictable contradiction by a reality which is not as fond of Hansonian dynamics as Hanson. The arguments sound convincing to them.

(Hanson himself has still not gone “Huh?” at the reality, though some of his audience did; perhaps because his abstractions are loftier than his audience’s? – because some of his audience, reading along to Hanson, probably implicitly imagined a concrete world in which GPT-3 was not allowed; but maybe Hanson himself is more abstract than this, and didn’t imagine anything so merely concrete?)

If I don’t respond to essays like this, people find them comforting and nod along. If I do respond, my words are less comforting and more concrete and easier to imagine concrete objections to, less like a long chain of abstractions that sound like the very abstract words in research papers and hence implicitly convincing because they sound like other things you were supposed to believe.

And then there is another essay in 3 months. There is an infinite well of them. I would have to teach people to stop drinking from the well, instead of trying to whack them on the back until they cough up the drinks one by one, or actually, whacking them on the back and then they don’t cough them up until reality contradicts them, and then a third of them notice that and cough something up, and then they don’t learn the general lesson and go back to the well and drink again. And I don’t know how to teach people to stop drinking from the well. I tried to teach that. I failed. If I wrote another Sequence I have no reason to believe that Sequence would work.

So what EAs will believe at the end of the world, will look like whatever the content was of the latest bucket from the well of infinite slow-takeoff arguments that hasn’t yet been blatantly-even-to-them refuted by all the sharp jagged rapidly-generalizing things that happened along the way to the world’s end.

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope. There are not obvious things I can do, let alone arguments I can make, which I expect to be actually useful in the sense that the world will not end once I do them. I don’t have the energy left for calm arguments. What’s left is despair that can be given voice.

5.6. Yudkowsky/Christiano discussion: AI progress and crossover points

[Christiano][22:15]

To the extent that it was possible to make any predictions about 2015-2020 based on your views, I currently feel like they were much more wrong than right. I’m happy to discuss that. To the extent you are willing to make any bets about 2025, I expect they will be mostly wrong and I’d be happy to get bets on the record (most of all so that it will be more obvious in hindsight whether they are vindication for your view). Not sure if this is the place for that.

Could also make a separate channel to avoid clutter.

[Yudkowsky][22:16]

Possibly. I think that 2015-2020 played out to a much more Eliezerish side than Eliezer on the Eliezer-Hanson axis, which sure is a case of me being wrong. What bets do you think we’d disagree on for 2025? I expect you have mostly misestimated my views, but I’m always happy to hear about anything concrete.

[Christiano][22:20]

I think the big points are: (i) I think you are significantly overestimating how large a discontinuity/trend break AlphaZero is, (ii) your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it’s likely that we will move slowly through the human range on many tasks. I’m not sure if we can get a bet out of (ii), I think I don’t understand your view that well but I don’t see how it could make the same predictions as mine over the next 10 years.

[Yudkowsky][22:22]

What are your 10-year predictions?

[Christiano][22:23]

My basic expectation is that for any given domain AI systems will gradually increase in usefulness, we will see a crossing over point where their output is comparable to human output, and that from that time we can estimate how long until takeoff by estimating “how long does it take AI systems to get ‘twice as impactful’?” which gives you a number like ~1 year rather than weeks. At the crossing over point you get a somewhat rapid change in derivative, since you are looking at (x+y) where y is growing faster than x.

I feel like that should translate into different expectations about how impactful AI will be in any given domain; I don’t see how to make the ultra-fast-takeoff view work if you think that AI output is increasing smoothly (since the rate of progress at the crossing-over point will be similar to the current rate of progress, unless R&D is scaling up much faster then).

So like, I think we are going to have crappy coding assistants, and then slightly less crappy coding assistants, and so on. And they will be improving the speed of coding very significantly before the end times.
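One way to make the “(x+y) where y is growing faster than x” remark concrete is the sketch below. It is an editorial illustration under loudly stated assumptions: human output x and AI output y both grow exponentially, and every rate and starting value is hypothetical, chosen only to show the shape of the curve rather than to represent either speaker’s numbers.

```python
import math


def combined_growth_rate(t: float, human_rate: float = 0.05,
                         ai_rate: float = 1.0, ai_start: float = 1e-6) -> float:
    """Instantaneous growth rate of x + y at time t (years).

    Assumed toy model: x(t) = exp(human_rate * t) and
    y(t) = ai_start * exp(ai_rate * t). All parameters are hypothetical.
    """
    x = math.exp(human_rate * t)
    y = ai_start * math.exp(ai_rate * t)
    # d/dt (x + y) divided by (x + y): an output-weighted average of the rates.
    return (human_rate * x + ai_rate * y) / (x + y)


def crossover_time(human_rate: float = 0.05, ai_rate: float = 1.0,
                   ai_start: float = 1e-6) -> float:
    """Time at which y equals x in the same toy model."""
    return math.log(1.0 / ai_start) / (ai_rate - human_rate)


if __name__ == "__main__":
    t_cross = crossover_time()
    for t in (0.0, t_cross, 2 * t_cross):
        print(f"t={t:6.2f}  growth rate of x+y = {combined_growth_rate(t):.3f}")
```

In this sketch the combined growth rate sits near the slower rate well before the crossover, equals the average of the two rates at the crossover, and approaches the faster rate afterward. Whether real AI progress looks anything like two smooth exponentials is exactly the point in dispute.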

[Yudkowsky][22:25]

You think in a different language than I do. My more confident statements about AI tech are about what happens after it starts to rise out of the metaphorical atmosphere and the turbulence subsides. When you have minds as early on the cognitive tech tree as humans they sure can get up to some weird stuff, I mean, just look at humans. Now take an utterly alien version of that with its own draw from all the weirdness factors. It sure is going to be pretty weird.

[Christiano][22:26]

OK, but you keep saying stuff about how people with my dumb views would be “caught flat-footed” by historical developments. Surely to be able to say something like that you need to be making some kind of prediction?

[Yudkowsky][22:26]

Well, sure, now that Codex has suddenly popped into existence one day at a surprisingly high base level of tech, we should see various jumps in its capability over the years and some outside imitators. What do you think you predict differently about that than I do?

[Christiano][22:26]

Why do you think Codex is a high base level of tech?

The models get better continuously as you scale them up, and the first tech demo is weak enough to be almost useless

[Yudkowsky][22:27]

I think the next-best coding assistant was, like, not useful.

[Christiano][22:27]

yes

and it is still not useful

[Yudkowsky][22:27]

Could be. Some people on HN seemed to think it was useful.

I haven’t tried it myself.

[Christiano][22:27]

OK, I’m happy to take bets

[Yudkowsky][22:28]

I don’t think the previous coding assistant would’ve been very good at coding an asteroid game, even if you tried a rigged demo at the same degree of rigging?

[Christiano][22:28]

it’s unquestionably a radically better tech demo

[Yudkowsky][22:28]

Where by “previous” I mean “previously deployed” not “previous generations of prototypes inside OpenAI’s lab”.

[Christiano][22:28]

My basic story is that the model gets better and more useful with each doubling (or year of AI research) in a pretty smooth way. So the key underlying parameter for a discontinuity is how soon you build the first version—do you do that before or after it would be a really really big deal?

and the answer seems to be: you do it somewhat before it would be a really big deal

and then it gradually becomes a bigger and bigger deal as people improve it

maybe we are on the same page about getting gradually more and more useful? But I’m still just wondering where the foom comes from

[Yudkowsky][22:30]

So, like… before we get systems that can FOOM and build nanotech, we should get more primitive systems that can write asteroid games and solve protein folding? Sounds legit.

So that happened, and now your model says that it’s fine later on for us to get a FOOM, because we have the tech precursors and so your prophecy has been fulfilled?

[Christiano][22:31]

[Yudkowsky][22:31]

Didn’t think so.

[Christiano][22:31]

I can’t tell if you can’t understand what I’m saying, or aren’t trying, or do understand and are just saying kind of annoying stuff as a rhetorical flourish

at some point you have an AI system that makes (humans+AI) 2x as good at further AI progress

[Yudkowsky][22:32]

I know that what I’m saying isn’t your viewpoint. I don’t know what your viewpoint is or what sort of concrete predictions it makes at all, let alone what such predictions you think are different from mine.

[Christiano][22:32]

maybe by continuity you can grant the existence of such a system, even if you don’t think it will ever exist?

I want to (i) make the prediction that AI will actually have that impact at some point in time, (ii) talk about what happens before and after that

I am talking about AI systems that become continuously more useful, because “become continuously more useful” is what makes me think that (i) AI will have that impact at some point in time, (ii) allows me to productively reason about what AI will look like before and after that. I expect that your view will say something about why AI improvements either aren’t continuous, or why continuous improvements lead to discontinuous jumps in the productivity of the (human+AI) system

[Yudkowsky][22:34]

at some point you have an AI system that makes (humans+AI) 2x as good at further AI progress

Is this prophecy fulfilled by using some narrow eld-AI algorithm to map out a TPU, and then humans using TPUs can write in 1 month a research paper that would otherwise have taken 2 months? And then we can go on to FOOM now that this prophecy about pre-FOOM states has been fulfilled? I know the answer is no, but I don’t know what you think is a narrower condition on the prophecy than that.

[Christiano][22:35]

If you can use narrow eld-AI in order to make every part of AI research 2x faster, so that the entire field moves 2x faster, then the prophecy is fulfilled

and it may be just another 6 months until it makes all of AI research 2x faster again, and then 3 months, and then…
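The “6 months, and then 3 months, and then…” sequence can be worked through arithmetically. The sketch below is an editorial illustration that assumes each interval is exactly half the previous one, which the chat only suggests loosely; the function name and the default first interval are hypothetical.

```python
def months_until_speedups(doublings: int,
                          first_interval_months: float = 6.0) -> float:
    """Total months elapsed after `doublings` successive 2x speedups.

    Assumes each speedup arrives in half the time of the one before it
    (6, 3, 1.5, ... months). This is a simplification for illustration.
    """
    total = 0.0
    interval = first_interval_months
    for _ in range(doublings):
        total += interval
        interval /= 2.0
    return total


if __name__ == "__main__":
    for n in (1, 2, 5, 30):
        print(f"{n:2d} doublings take {months_until_speedups(n):.4f} months")
```

Under that assumption the intervals form a geometric series, so the total time stays below twice the first interval (12 months here) no matter how many doublings occur. This is why a smoothly accelerating process can still end abruptly in calendar time.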

[Yudkowsky][22:36]

What, the entire field? Even writing research papers? Even the journal editors approving and publishing the papers? So if we speed up every part of research except the journal editors, the prophecy has not been fulfilled and no FOOM may take place?

[Christiano][22:36]

no, I mean the improvement in overall output, given the actual realistic level of bottlenecking that occurs in practice

[Yudkowsky][22:37]

So if the realistic level of bottlenecking ever becomes dominated by a human gatekeeper, the prophecy is ever unfulfillable and no FOOM may ever occur.

[Christiano][22:37]

that’s what I mean by “2x as good at further progress,” the entire system is achieving twice as much

then the prophecy is unfulfillable and I will have been wrong

I mean, I think it’s very likely that there will be a hard takeoff, if people refuse or are unable to use AI to accelerate AI progress for reasons unrelated to AI capabilities, and then one day they become willing

[Yudkowsky][22:38]

…because on your view, the Prophecy necessarily goes through humans and AIs working together to speed up the whole collective field of AI?

[Christiano][22:38]

it’s fine if the AI works alone

the point is just that it overtakes the humans at the point when it is roughly as fast as the humans

why wouldn’t it?

why does it overtake the humans when it takes it 10 seconds to double in capability instead of 1 year?

that’s like predicting that cultural evolution will be infinitely fast, instead of making the more obvious prediction that it will overtake evolution exactly when it’s as fast as evolution

[Yudkowsky][22:39]

I live in a mental world full of weird prototypes that people are shepherding along to the world’s end. I’m not even sure there’s a short sentence in my native language that could translate the short Paul-sentence “is roughly as fast as the humans”.

[Christiano][22:40]

do you agree that you can measure the speed with which the community of human AI researchers develop and implement improvements in their AI systems?

like, we can look at how good AI systems are in 2021, and in 2022, and talk about the rate of progress?

[Yudkowsky][22:40]

…when exactly in hominid history was hominid intelligence exactly as fast as evolutionary optimization???

do you agree that you can measure the speed with which the community of human AI researchers develop and implement improvements in their AI systems?

I mean… obviously not? How the hell would we measure real actual AI progress? What would even be the Y-axis on that graph?

I have a rough intuitive feeling that it was going faster in 2015-2017 than 2018-2020.

“What was?” says the stern skeptic, and I go “I dunno.”

[Christiano][22:42]

Here’s a way of measuring progress you won’t like: for almost all tasks, you can initially do them with lots of compute, and as technology improves you can do them with less compute. We can measure how fast the amount of compute required is going down.
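One way this compute-based metric could be made concrete is sketched below, assuming we have yearly estimates of the compute needed to reach a fixed level of performance on one task. The data points are hypothetical placeholders, and the log-linear fit is just one reasonable choice of functional form.

```python
import math


def halving_time_years(years, compute_required):
    """Least-squares fit of log2(compute) against year.

    Returns the estimated number of years it takes for the compute
    required for a fixed task to halve.
    """
    logs = [math.log2(c) for c in compute_required]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # change in log2(compute) per year
    if slope >= 0:
        raise ValueError("compute required is not decreasing over this period")
    return -1.0 / slope


if __name__ == "__main__":
    # Hypothetical numbers, in arbitrary compute units, for illustration only.
    years = [2017, 2018, 2019, 2020, 2021]
    compute = [1600, 800, 400, 200, 100]
    print(halving_time_years(years, compute))
```

With the placeholder data the fit gives a halving time of about one year. Real measurements would be noisier, and the choice of task and performance threshold would matter.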

[Yudkowsky][22:43]

Yeah, that would be a cool thing to measure. It’s not obviously a relevant thing to anything important, but it’d be cool to measure.

[Christiano][22:43]

Another way you won’t like: we can hold fixed the resources we invest and look at the quality of outputs in any given domain (or even $ of revenue) and ask how fast it’s changing.

[Yudkowsky][22:43]

I wonder what it would say about Go during the age of AlphaGo.

Or what that second metric would say.

[Christiano][22:43]

I think it would be completely fine, and you don’t really understand what happened with deep learning in board games. Though I also don’t know what happened in much detail, so this is more like a prediction than a retrodiction.

But it’s enough of a retrodiction that I shouldn’t get too much credit for it.

[Yudkowsky][22:44]

I don’t know what result you would consider “completely fine”. I didn’t have any particular unfine result in mind.

[Christiano][22:45]

oh, sure

if it was just an honest question happy to use it as a concrete case

I would measure the rate of progress in Go by looking at how fast Elo improves with time or increasing R&D spending

[Yudkowsky][22:45]

I mean, I don’t have strong predictions about it so it’s not yet obviously cruxy to me

[Christiano][22:46]

I’d roughly guess that would continue, and if there were multiple trendlines to extrapolate I’d estimate crossover points based on that

[Yudkowsky][22:47]

suppose this curve is smooth, and we see that sharp Go progress over time happened because Deepmind dumped in a ton of increased R&D spend. you then argue that this cannot happen with AGI because by the time we get there, people will be pushing hard at the frontiers in a competitive environment where everybody’s already spending what they can afford, just like in a highly competitive manufacturing industry.

[Christiano][22:47]

the key input to making a prediction for AGZ in particular would be the precise form of the dependence on R&D spending, to try to predict the changes as you shift from a single programmer to a large team at DeepMind, but most reasonable functional forms would be roughly right

Yes, it’s definitely a prediction of my view that it’s easier to improve things that people haven’t spent much money on than things people have spent a lot of money on. It’s also a separate prediction of my view that people are going to be spending a boatload of money on all of the relevant technologies. Perhaps $1B/year right now and I’m imagining levels of investment large enough to be essentially bottlenecked on the availability of skilled labor.

[Bensinger][22:48]

(Previous Eliezer-comments about AlphaGo as a break in trend, responding briefly to Miles Brundage: https://twitter.com/ESRogs/status/1337869362678571008)

5.7. Legal economic growth

[Yudkowsky][22:49]

Does your prediction change if all hell breaks loose in 2025 instead of 2055?

[Christiano][22:50]

I think my prediction was wrong if all hell breaks loose in 2025, if by “all hell breaks loose” you mean “dyson sphere” and not “things feel crazy”

[Yudkowsky][22:50]

Things feel crazy in the AI field and the world ends less than 4 years later, well before the world economy doubles.

Why was the Prophecy wrong if the world begins final descent in 2025? The Prophecy requires the world to then last until 2029 while doubling its economic output, after which it is permitted to end, but does not obviously to me forbid the Prophecy to begin coming true in 2025 instead of 2055.

[Christiano][22:52]

yes, I just mean that some important underlying assumptions for the prophecy were violated, I wouldn’t put much stock in it at that point, etc.

[Yudkowsky][22:53]

A lot of the issues I have with understanding any of your terminology in concrete Eliezer-language is that it looks to me like the premise-events of your Prophecy are fulfillable in all sorts of ways that don’t imply the conclusion-events of the Prophecy.

[Christiano][22:53]

if “things feel crazy” happens 4 years before dyson sphere, then I think we have to be really careful about what crazy means

[Yudkowsky][22:54]

a lot of people looking around nervously and privately wondering if Eliezer was right, while public pravda continues to prohibit wondering any such thing out loud, so they all go on thinking that they must be wrong.

[Christiano][22:55]

OK, by “things get crazy” I mean like hundreds of billions of dollars of spending at google on automating AI R&D

[Yudkowsky][22:55]

I expect bureaucratic obstacles to prevent much GDP per se from resulting from this.

[Christiano][22:55]

massive scaleups in semiconductor manufacturing, bidding up prices of inputs crazily

[Yudkowsky][22:55]

I suppose that much spending could well increase world GDP by hundreds of billions of dollars per year.

[Christiano][22:56]

massive speculative rises in AI company valuations financing a significant fraction of GWP into AI R&D

(+hardware R&D, +building new clusters, +etc.)

[Yudkowsky][22:56]

like, higher than Tesla? higher than Bitcoin?

both of these things sure did skyrocket in market cap without that having much of an effect on housing stocks and steel production.

[Christiano][22:57]

right now I think hardware R&D is on the order of $100B/year, AI R&D is more like $10B/year, I guess I’m betting on something more like trillions? (limited from going higher because of accounting problems and not that much smart money)

I don’t think steel production is going up at that point

plausibly going down since you are redirecting manufacturing capacity into making more computers. But probably just staying static while all of the new capacity is going into computers, since cannibalizing existing infrastructure is much more expensive

the original point was: you aren’t pulling AlphaZero shit any more, you are competing with an industry that has invested trillions in cumulative R&D

[Yudkowsky][23:00]

is this in hopes of future profit, or because current profits are already in the trillions?

[Christiano][23:01]

largely in hopes of future profit / reinvested AI outputs (that have high market cap), but also revenues are probably in the trillions?

[Yudkowsky][23:02]

this all sure does sound “pretty darn prohibited” on my model, but I’d hope there’d be something earlier than that we could bet on. what does your Prophecy prohibit happening before that sub-prophesied day?

[Christiano][23:02]

To me your model just seems crazy, and you are saying it predicts crazy stuff at the end but no crazy stuff beforehand, so I don’t know what’s prohibited. Mostly I feel like I’m making positive predictions, of gradually escalating value of AI in lots of different industries

and rapidly increasing investment in AI

I guess your model can be: those things happen, and then one day the AI explodes?

[Yudkowsky][23:03]

the main way you get rapidly increasing investment in AI is if there’s some way that AI can produce huge profits without that being effectively bureaucratically prohibited – eg this is where we get huge investments in burning electricity and wasting GPUs on Bitcoin mining.

[Christiano][23:03]

but it seems like you should be predicting e.g. AI quickly jumping to superhuman in lots of domains, and some applications jumping from no value to massive value

I don’t understand what you mean by that sentence. Do you think we aren’t seeing rapidly increasing investment in AI right now?

or are you talking about increasing investment above some high threshold, or increasing investment at some rate significantly larger than the current rate?

it seems to me like you can pretty seamlessly get up to a few $100B/year of revenue just by redirecting existing tech R&D

[Yudkowsky][23:05]

so I can imagine scenarios where some version of GPT-5 cloned outside OpenAI is able to talk hundreds of millions of mentally susceptible people into giving away lots of their income, and many regulatory regimes are unable to prohibit this effectively. then AI could be making a profit of trillions and then people would invest corresponding amounts in making new anime waifus trained in erotic hypnosis and findom.

this, to be clear, is not my mainline prediction.

but my sense is that our current economy is mostly not about the 1-day period to design new vaccines, it is about the multi-year period to be allowed to sell the vaccines.

the exceptions to this, like Bitcoin managing to say “fuck off” to the regulators for long enough, are where Bitcoin scales to a trillion dollars and gets massive amounts of electricity and GPU burned on it.

so we can imagine something like this for AI, which earns a trillion dollars, and sparks a trillion-dollar competition.

but my sense is that your model does not work like this.

my sense is that your model is about general improvements across the whole economy.

[Christiano][23:08]

I think bitcoin is small even compared to current AI…

[Yudkowsky][23:08]

my sense is that we’ve already built an economy which rejects improvement based on small amounts of cleverness, and only rewards amounts of cleverness large enough to bypass bureaucratic structures. it’s not enough to figure out a version of e-gold that’s 10% better. e-gold is already illegal. you have to figure out Bitcoin.

what are you going to build? better airplanes? airplane costs are mainly regulatory costs. better medtech? mainly regulatory costs. better houses? building houses is illegal anyways.

where is the room for the general AI revolution, short of the AI being literally revolutionary enough to overthrow governments?

[Christiano][23:10]

factories, solar panels, robots, semiconductors, mining equipment, power lines, and “factories” just happens to be one word for a thousand different things

I think it’s reasonable to think some jurisdictions won’t be willing to build things but it’s kind of improbable as a prediction for the whole world. That’s a possible source of shorter-term predictions?

also computers and the 100 other things that go in datacenters

[Yudkowsky][23:12]

The whole developed world rejects open borders. The regulatory regimes all make the same mistakes with an almost perfect precision, the kind of coordination that human beings could never dream of when trying to coordinate on purpose.

if the world lasts until 2035, I could perhaps see deepnets becoming as ubiquitous as computers were in… 1995? 2005? would that fulfill the terms of the Prophecy? I think it doesn’t; I think your Prophecy requires that early AGI tech be that ubiquitous so that AGI tech will have trillions invested in it.

[Christiano][23:13]

what is AGI tech?

the point is that there aren’t important drivers that you can easily improve a lot

[Yudkowsky][23:14]

for purposes of the Prophecy, AGI tech is that which, scaled far enough, ends the world; this must have trillions invested in it, so that the trajectory up to it cannot look like pulling an AlphaGo. no?

[Christiano][23:14]

so it’s relevant if you are imagining some piece of the technology which is helpful for general problem solving or something but somehow not helpful for all of the things people are doing with ML, to me that seems unlikely since it’s all the same stuff

surely AGI tech should at least include the use of AI to automate AI R&D

regardless of what you arbitrarily decree as “ends the world if scaled up”

[Yudkowsky][23:15]

only if that’s the path that leads to destroying the world?

if it isn’t on that path, who cares Prophecy-wise?

[Christiano][23:15]

also I want to emphasize that “pull an AlphaGo” is what happens when you move from SOTA being set by an individual programmer to a large lab, you don’t need to be investing trillions to avoid that

and that the jump is still more like a few years

but the prophecy does involve trillions, and my view gets more like your view if people are jumping from $100B of R&D ever to $1T in a single year

5.8. TPUs and GPUs, and automating AI R&D

[Yudkowsky][23:17]

I’m also wondering a little why the emphasis on “trillions”. it seems to me that the terms of your Prophecy should be fulfillable by AGI tech being merely as ubiquitous as modern computers, so that many competing companies invest mere hundreds of billions in the equivalent of hardware plants. it is legitimately hard to get a chip with 50% better transistors ahead of TSMC.

[Christiano][23:17]

yes, if you are investing hundreds of billions then it is hard to pull ahead (though could still happen)

(since the upside is so much larger here, no one cares that much about getting ahead of TSMC since the payoff is tiny in the scheme of the amounts we are discussing)

[Yudkowsky][23:18]

which, like, doesn’t prevent Google from tossing out TPUs that are pretty significant jumps on GPUs, and if there’s a specialized application of AGI-ish tech that is especially key, you can have everything behave smoothly and still get a jump that way.

[Christiano][23:18]

I think TPUs are basically the same as GPUs

probably a bit worse

(but GPUs are sold at a 10x markup since that’s the size of nvidia’s lead)

[Yudkowsky][23:19]

noted; I’m not enough of an expert to directly contradict that statement about TPUs from my own knowledge.

[Christiano][23:19]

(though I think TPUs are nevertheless leased at a slightly higher price than GPUs)

[Yudkowsky][23:19]

how does Nvidia maintain that lead and 10x markup? that sounds like a pretty un-Paul-ish state of affairs given Bitcoin prices never mind AI investments.

[Christiano][23:20]

nvidia’s lead isn’t worth that much because historically they didn’t sell many gpus

(especially for non-gaming applications)

their R&D investment is relatively large compared to the $ on the table

my guess is that their lead doesn’t stick, as evidenced by e.g. Google very quickly catching up

[Yudkowsky][23:21]

parenthetically, does this mean – and I don’t necessarily predict otherwise – that you predict a drop in Nvidia’s stock and a drop in GPU prices in the next couple of years?

[Christiano][23:21]

nvidia’s stock may do OK from riding general AI boom, but I do predict a relative fall in nvidia compared to other AI-exposed companies

(though I also predicted google to more aggressively try to compete with nvidia for the ML market and think I was just wrong about that, though I don’t really know any details of the area)

I do expect the cost of compute to fall over the coming years as nvidia’s markup gets eroded

to be partially offset by increases in the cost of the underlying silicon (though that’s still bad news for nvidia)

[Yudkowsky][23:23]

I parenthetically note that I think the Wise Reader should be justly impressed by predictions that come true about relative stock price changes, even if Eliezer has not explicitly contradicted those predictions before they come true. there are bets you can win without my having to bet against you.

[Christiano][23:23]

you are welcome to counterpredict, but no saying in retrospect that reality proved you right if you don’t 🙂

otherwise it’s just me vs the market

[Yudkowsky][23:24]

I don’t feel like I have a counterprediction here, but I think the Wise Reader should be impressed if you win vs. the market.

however, this does require you to name in advance a few “other AI-exposed companies”.

[Christiano][23:25]

Note that I made the same bet over the last year—I make a large AI bet but mostly moved my nvidia allocation to semiconductor companies. The semiconductor part of the portfolio is up 50% while nvidia is up 70%, so I lost that one. But that just means I like the bet even more next year.

happy to use nvidia vs tsmc
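A small sketch of how the "relative" outcome of this bet can be computed from the two figures quoted above (semiconductor basket up 50%, nvidia up 70%). The function name is a hypothetical helper, and the calculation ignores dividends, fees, and timing.

```python
def relative_return(position_return: float, benchmark_return: float) -> float:
    """Return of holding `position` measured against holding `benchmark`.

    Both arguments are simple returns over the same period, e.g. 0.50 for +50%.
    """
    return (1 + position_return) / (1 + benchmark_return) - 1


if __name__ == "__main__":
    # Figures quoted in the conversation: semiconductors +50%, nvidia +70%.
    print(f"{relative_return(0.50, 0.70):.1%}")
```

On those figures the semiconductor position ends up roughly 12% behind what the same money would have been worth in nvidia, even though both went up.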

[Yudkowsky][23:25]

there’s a lot of noise in a 2-stock prediction.

[Christiano][23:25]

I mean, it’s a 1-stock prediction about nvidia

[Yudkowsky][23:26]

but your funeral or triumphal!

[Christiano][23:26]

indeed 🙂

anyway

I expect all of the $ amounts to be much bigger in the future

[Yudkowsky][23:26]

yeah, but using just TSMC for the opposition exposes you to I dunno Chinese invasion of Taiwan

[Christiano][23:26]

yes

also TSMC is not that AI-exposed

I think the main prediction is: eventual move away from GPUs, nvidia can’t maintain that markup

[Yudkowsky][23:27]

“Nvidia can’t maintain that markup” sounds testable, but is less of a win against the market than predicting a relative stock price shift. (Over what timespan? Just the next year sounds quite fast for that kind of prediction.)

[Christiano][23:27]

regarding your original claim: if you think that it’s plausible that AI will be doing all of the AI R&D, and that will be accelerating continuously from 12, 6, 3 month “doubling times,” but that we’ll see a discontinuous change in the “path to doom,” then that would be harder to generate predictions about

yes, it’s hard to translate most predictions about the world into predictions about the stock market

[Yudkowsky][23:28]

this again sounds like it’s not written in Eliezer-language.

what does it mean for “AI will be doing all of the AI R&D”? that sounds to me like something that happens after the end of the world, hence doesn’t happen.

[Christiano][23:29]

that’s good, that’s what I thought

[Yudkowsky][23:29]

I don’t necessarily want to sound very definite about that in advance of understanding what it means

[Christiano][23:29]

I’m saying that I think AI will be automating AI R&D gradually, before the end of the world

yeah, I agree that if you reject the construct of “how fast the AI community makes progress” then it’s hard to talk about what it means to automate “progress”

and that may be hard to make headway on

though for cases like AlphaGo (which started that whole digression) it seems easy enough to talk about elo gain per year

maybe the hard part is aggregating across tasks into a measure you actually care about?

[Yudkowsky][23:30]

up to a point, but yeah. (like, if we’re taking Elo high above human levels and restricting our measurements to a very small range of frontier AIs, I quietly wonder if the measurement is still measuring quite the same thing with quite the same robustness.)

[Christiano][23:31]

I agree that elo measurement is extremely problematic in that regime

5.9. Smooth exponentials vs. jumps in income

[Yudkowsky][23:31]

so in your worldview there’s this big emphasis on things that must have been deployed and adopted widely to the point of already having huge impacts

and in my worldview there’s nothing very surprising about people with a weird powerful prototype that wasn’t used to automate huge sections of AI R&D because the previous versions of the tech weren’t useful for that or bigcorps didn’t adopt it.

[Christiano][23:32]

I mean, Google is already 1% of the US economy and in this scenario it and its peers are more like 10-20%? So wide adoption doesn’t have to mean that many people. Though I also do predict much wider adoption than you so happy to go there if it’s helpful for predictions.

I don’t really buy the “weird powerful prototype”

[Yudkowsky][23:33]

yes. I noticed.

you would seem, indeed, to be offering large quantities of it for short sale.

[Christiano][23:33]

and it feels like the thing you are talking about ought to have some precedent of some kind, of weird powerful prototypes that jump straight from “does nothing” to “does something impactful”

like if I predict that AI will be useful in a bunch of domains, and will get there by small steps, you should either predict that won’t happen, or else also predict that there will be some domains with weird prototypes jumping to giant impact?

[Yudkowsky][23:34]

like an electrical device that goes from “not working at all” to “actually working” as soon as you screw in the attachments for the electrical plug.

[Christiano][23:34]

(clearly takes more work to operationalize)

I’m not sure I understand that sentence, hopefully it’s clear enough why I expect those discontinuities?

[Yudkowsky][23:34]

though, no, that’s a facile bad analogy.

a better analogy would be an AI system that only starts working after somebody tells you about batch normalization or LAMB learning rate or whatever.

[Christiano][23:36]

sure, which I think will happen all the time for individual AI projects but not for sota

because the projects at sota have picked the low hanging fruit, it’s not easy to get giant wins

[Yudkowsky][23:36]

like if I predict that AI will be useful in a bunch of domains, and will get there by small steps, you should either predict that won’t happen, or else also predict that there will be some domains with weird prototypes jumping to giant impact?

in the latter case, has this Eliezer-Prophecy already had its terms fulfilled by AlphaFold 2, or do you say nay because AlphaFold 2 hasn’t doubled GDP?

[Christiano][23:37]

(you can also get giant wins by a new competitor coming up at a faster rate of progress, and then we have more dependence on whether people do it when it’s a big leap forward or slightly worse than the predecessor, and I’m betting on the latter)

I have no idea what AlphaFold 2 is good for, or the size of the community working on it, my guess would be that its value is pretty small

we can try to quantify

like, I get surprised when $X of R&D gets you something whose value is much larger than $X

I’m not surprised at all if $X of R&D gets you <<$X, or even like 10*$X in a given case that was selected for working well

hopefully it’s clear enough why that’s the kind of thing a naive person would predict

[Yudkowsky][23:38]

so a thing which Eliezer’s Prophecy does not mandate per se, but sure does permit, and is on the mainline especially for nearer timelines, is that the world-ending prototype had no prior prototype containing 90% of the technology which earned a trillion dollars.

a lot of Paul’s Prophecy seems to be about forbidding this.

is that a fair way to describe your own Prophecy?

[Christiano][23:39]

I don’t have a strong view about “containing 90% of the technology”

the main view is that whatever the “world ending prototype” does, there were earlier systems that could do practically the same thing

if the world ending prototype does something that lets you go foom in a day, there was a system years earlier that could foom in a month, so that would have been the one to foom

[Yudkowsky][23:41]

but, like, the world-ending thing, according to the Prophecy, must be squarely in the middle of a class of technologies which are in the midst of earning trillions of dollars and having trillions of dollars invested in them. it’s not enough for the Worldender to be definitionally somewhere in that class, because then it could be on a weird outskirt of the class, and somebody could invest a billion dollars in that weird outskirt before anybody else had invested a hundred million, which is forbidden by the Prophecy. so the Worldender has got to be right in the middle, a plain and obvious example of the tech that’s already earning trillions of dollars. …y/n?

[Christiano][23:42]

I agree with that as a prediction for some operationalization of “a plain and obvious example,” but I think we could make it more precise / it doesn’t feel like it depends on the fuzziness of that

I think that if the world can end out of nowhere like that, you should also be getting $100B/year products out of nowhere like that, but I guess you think not because of bureaucracy

like, to me it seems like our views stake out predictions about codex, where I’m predicting its value will be modest relative to R&D, and the value will basically improve from there with a nice experience curve, maybe something like ramping up quickly to some starting point <$10M/year and then doubling every year thereafter, whereas I feel like you are saying more like “who knows, could be anything” and so should be surprised each time the boring thing happens
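A rough sketch of what the "doubling every year" curve implies, assuming a starting point of $10M/year (the upper end of the "<$10M/year" guess) and exact annual doubling. The $100B/year target is borrowed from the product scale mentioned just above and is only an illustrative threshold.

```python
import math


def revenue_after(start_per_year: float, years: float, doubling_time_years: float = 1.0) -> float:
    """Annual revenue after `years`, assuming smooth exponential growth."""
    return start_per_year * 2 ** (years / doubling_time_years)


def years_to_reach(start_per_year: float, target_per_year: float, doubling_time_years: float = 1.0) -> float:
    """Years of steady doubling needed to grow from start to target."""
    return doubling_time_years * math.log2(target_per_year / start_per_year)


if __name__ == "__main__":
    start = 10e6     # assumed starting revenue, $/year
    target = 100e9   # illustrative threshold, $/year
    print(round(years_to_reach(start, target), 1))
    print(revenue_after(start, 5))
```

Under these assumptions it takes a little over 13 years of uninterrupted doubling to get from $10M/year to $100B/year, which gives a sense of how gradual the predicted curve is compared with a product appearing at that scale out of nowhere.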

[Yudkowsky][23:45]

the concrete example I give is that the World-Ending Company will be able to use the same tech to build a true self-driving car, which would in the natural course of things be approved for sale a few years later after the world had ended.

[Christiano][23:46]

but self-driving cars seem very likely to already be broadly deployed, and so the relevant question is really whether their technical improvements can also be deployed to those cars?

(or else maybe that’s another prediction we disagree about)

[Yudkowsky][23:47]

I feel like I would indeed not have the right to feel very surprised if Codex technology stagnated for the next 5 years, nor if it took a massive leap in 2 years and got ubiquitously adopted by lots of programmers.

yes, I think that’s a general timeline difference there

re: self-driving cars

I might be talkable into a bet where you took “Codex tech will develop like this” and I took the side “literally anything else but that”

[Christiano][23:48]

I think it would have to be over/under, I doubt I’m more surprised than you by something failing to be economically valuable, I’m surprised by big jumps in value

seems like it will be tough to work

[Yudkowsky][23:49]

well, if I was betting on something taking a big jump in income, I sure would bet on something in a relatively unregulated industry like Codex or anime waifus.

but that’s assuming I made the bet at all, which is a hard sell when the bet is about the Future, which is notoriously hard to predict.

[Christiano][23:50]

I guess my strongest take is: if you want to pull the thing where you say that future developments proved you right and took unreasonable people like me by surprise, you’ve got to be able to say something in advance about what you expect to happen

[Yudkowsky][23:51]

so what if neither of us are surprised if Codex stagnates for 5 years, you win if Codex shows a smooth exponential in income, and I win if the income looks… jumpier? how would we quantify that?
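One hypothetical way to quantify "jumpier" is sketched below: compare the largest single-period log growth step with the median step. This metric and the income series are assumptions made for illustration, not something either speaker proposed, and any real bet would need an agreed data source and threshold.

```python
import math
from statistics import median


def log_growth_steps(income):
    """Per-period log growth rates of a strictly positive income series."""
    return [math.log(b / a) for a, b in zip(income, income[1:])]


def jumpiness(income):
    """Largest single-period log growth divided by the median log growth.

    Close to 1 for a smooth exponential; much larger when most of the
    growth arrives in one period.
    """
    steps = log_growth_steps(income)
    typical = median(steps)
    if typical <= 0:
        raise ValueError("metric is only defined for series with positive typical growth")
    return max(steps) / typical


if __name__ == "__main__":
    smooth = [10, 20, 40, 80, 160]   # hypothetical: doubles every period
    jumpy = [10, 11, 12, 120, 130]   # hypothetical: one tenfold jump
    print(jumpiness(smooth))
    print(jumpiness(jumpy))
```

The smooth series scores about 1 and the series with one tenfold jump scores above 20. A score like this says nothing about a series that stagnates entirely, which both sides say would not surprise them.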

[Christiano][23:52]

codex also does seem a bit unfair to you in that it may have to be adopted by lots of programmers which could slow things down a lot even if capabilities are pretty jumpy

(though I think in fact usefulness and not merely profit will basically just go up smoothly, with step sizes determined by arbitrary decisions about when to release something)

[Yudkowsky][23:53]

I’d also be concerned about unfairness to me in that earnable income is not the same as the gains from trade. If there’s more than 1 competitor in the industry, their earnings from Codex may be much less than the value produced, and this may not change much with improvements in the tech.

5.10. Late-stage predictions

[Christiano][23:53]

I think my main update from this conversation is that you don’t really predict someone to come out of nowhere with a model that can earn a lot of $, even if they could come out of nowhere with a model that could end the world, because of regulatory bottlenecks and nimbyism and general sluggishness and unwillingness to do things

does that seem right?

[Yudkowsky][23:55]

Well, and also because the World-ender is “the first thing that scaled with compute” and/or “the first thing that ate the real core of generality” and/or “the first thing that went over neutron multiplication factor 1”.

[Christiano][23:55]

and so that cuts out a lot of the easily-specified empirical divergences, since “worth a lot of $” was the only general way to assess “big deal that people care about” and avoid disputes like “but Zen was mostly developed by a single programmer, it’s not like intense competition”

yeah, that’s the real disagreement it seems like we’d want to talk about

but it just doesn’t seem to lead to many prediction differences in advance?

I totally don’t buy any of those models, I think they are bonkers

would love to bet on that

[Yudkowsky][23:56]

Prolly but I think the from-my-perspective-weird talk about GDP is probably concealing some kind of important crux, because caring about GDP still feels pretty alien to me.

[Christiano][23:56]

I feel like getting up to massive economic impacts without seeing “the real core of generality” seems like it should also be surprising on your view

like if it’s 10 years from now and AI is a pretty big deal but no crazy AGI, isn’t that surprising?

[Yudkowsky][23:57]

Mildly but not too surprising, I would imagine that people had built a bunch of neat stuff with gradient descent in realms where you could get a long way on self-play or massively collectible datasets.

[Christiano][23:58]

I’m fine with the crux being something that doesn’t lead to any empirical disagreements, but in that case I just don’t think you should claim credit for the worldview making great predictions.

(or the countervailing worldview making bad predictions)

[Yudkowsky][23:59]

stuff that we could see then: self-driving cars (10 years is enough for regulatory approval in many countries), super Codex, GPT-6 powered anime waifus being an increasingly loud source of (arguably justified) moral panic and a hundred-billion-dollar industry

[Christiano][23:59]

another option is “10% ~~GDP~~ GWP growth in a year, before doom”

I think that’s very likely, though might be too late to be helpful

[Yudkowsky][0:01]

see, that seems genuinely hard unless somebody gets GPT-4 far ahead of any political opposition – I guess all the competent AGI groups lean solidly liberal at the moment? – and uses it to fake massive highly-persuasive sentiment on Twitter for housing liberalization.

[Christiano][0:01]

so seems like a bet?

but you don’t get to win until doom 🙁

[Yudkowsky][0:02]

I mean, as written, I’d want to avoid cases like 10% growth on paper while recovering from a pandemic that produced 0% growth the previous year.
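A minimal sketch of one way to screen out that rebound case, assuming the bet is judged on annualized growth over a multi-year window instead of a single year. The two-year window and the example rates are illustrative assumptions only.

```python
def annualized_growth(yearly_rates):
    """Geometric-mean growth rate over consecutive years.

    `yearly_rates` are simple growth rates, e.g. 0.10 for 10%.
    """
    if not yearly_rates:
        raise ValueError("need at least one yearly rate")
    total = 1.0
    for rate in yearly_rates:
        total *= 1 + rate
    return total ** (1 / len(yearly_rates)) - 1


if __name__ == "__main__":
    # The case described above: 0% growth in a pandemic year, then 10% on paper.
    print(f"{annualized_growth([0.0, 0.10]):.2%}")
    # Two consecutive years of 10% growth, for comparison.
    print(f"{annualized_growth([0.10, 0.10]):.2%}")
```

Averaged over the two years, the rebound case comes out just under 5% per year, so a multi-year window would not count it as 10% growth.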

[Christiano][0:02]

yeah

[Yudkowsky][0:04]

I’d want to check the current rate (5% iirc) and what the variance on it was, 10% is a little low for surety (though my sense is that it’s a pretty darn smooth graph that’s hard to perturb)

if we got 10% in a way that was clearly about AI tech becoming that ubiquitous, I’d feel relatively good about nodding along and saying, “Yes, that is like unto the beginning of Paul’s Prophecy” not least because the timelines had been that long at all.

[Christiano][0:05]

like 3-4%/year right now

random wikipedia number is 5.5% in 2006-2007, 3-4% since 2010

4% 1995-2000

[Yudkowsky][0:06]

I don’t want to sound obstinate here. My model does not forbid that we dwiddle around on the AGI side while gradient descent tech gets its fingers into enough separate weakly-generalizing pies to produce 10% GDP growth, but I’m happy to say that this sounds much more like Paul’s Prophecy is coming true.

[Christiano][0:07]

ok, we should formalize at some point, but also need the procedure for you getting credit given that it can’t resolve in your favor until the end of days

[Yudkowsky][0:07]

Is there something that sounds to you like Eliezer’s Prophecy which we can observe before the end of the world?

[Christiano][0:07]

when you will already have all the epistemic credit you need

not on the “simple core of generality” stuff since that apparently immediately implies end of world

maybe something about ML running into obstacles en route to human level performance?

or about some other kind of discontinuous jump even in a case where people care, though there seem to be a few reasons you don’t expect many of those

[Yudkowsky][0:08]

depends on how you define “immediately”? it’s not long before the end of the world, but in some sad scenarios there is some tiny utility to you declaring me right 6 months before the end.

[Christiano][0:09]

I care a lot about the 6 months before the end personally

though I do think probably everything is more clear by then independent of any bet; but I guess you are more pessimistic about that

[Yudkowsky][0:09]

I’m not quite sure what I’d do in them, but I may have worked something out before then, so I care significantly in expectation if not in particular.

I am more pessimistic about other people’s ability to notice what reality is screaming in their faces, yes.

[Christiano][0:10]

if we were to look at various scaling curves, e.g. of loss vs model size or something, do you expect those to look distinctive as you hit the “real core of generality”?

[Yudkowsky][0:10]

let me turn that around: if we add transformers into those graphs, do they jump around in a way you’d find interesting?

[Christiano][0:11]

not really

[Yudkowsky][0:11]

is that because the empirical graphs don’t jump, or because you don’t think the jumps say much?

[Christiano][0:11]

but not many good graphs to look at (I just have one in mind), so that’s partly a prediction about what the exercise would show

I don’t think the graphs jump much, and also transformers come before people start evaluating on tasks where they help a lot

[Yudkowsky][0:12]

It would not terribly contradict the terms of my Prophecy if the World-ending tech began by not producing a big jump on existing tasks, but generalizing to some currently not-so-popular tasks where it scaled much faster.

[Christiano][0:13]

eh, they help significantly on contemporary tasks, but it’s just not a huge jump relative to continuing to scale up model sizes

or other ongoing improvements in architecture

anyway, should try to figure out something, and good not to finalize a bet until you have some way to at least come out ahead, but I should sleep now

[Yudkowsky][0:14]

yeah, same.

Thing I want to note out loud lest I forget ere I sleep: I think the real world is full of tons and tons of technologies being developed as unprecedented prototypes in the midst of big fields, because the key thing to invest in wasn’t the competitively explored center. Wright Flyer vs all expenditures on Traveling Machine R&D. First atomic pile and bomb vs all Military R&D.

This is one reason why Paul’s Prophecy seems fragile to me. You could have the preliminaries come true as far as there being a trillion bucks in what looks like AI R&D, and then the WorldEnder is a weird prototype off to one side of that. Saying “But what about the rest of that AI R&D?” is no more a devastating retort to reality than looking at AlphaGo and saying “But weren’t other companies investing billions in Better Software?” Yeah but it was a big playing field with lots of different kinds of Better Software and no other medium-sized team of 15 people with corporate TPU backing was trying to build a system just like AlphaGo, even though multiple small outfits were trying to build prestige-earning gameplayers. Tech advancements very very often occur in places where investment wasn’t dense enough to guarantee overlap.

6. Follow-ups on “Takeoff Speeds”

6.1. Eliezer Yudkowsky’s commentary

[Yudkowsky][17:25]

Further comment that occurred to me on “takeoff speeds” if I’ve better understood the main thesis now: its hypotheses seem to include a perfectly anti-Thielian setup for AGI.

Thiel has a running thesis about how part of the story behind the Great Stagnation and the decline in innovation that’s about atoms rather than bits – the story behind “we were promised flying cars and got 140 characters”, to cite the classic Thielian quote – is that people stopped believing in “secrets”.

Thiel suggests that you have to believe there are knowable things that aren’t yet widely known – not just things that everybody already knows, plus mysteries that nobody will ever know – in order to be motivated to go out and innovate. Culture in developed countries shifted to label this kind of thinking rude – or rather, even ruder, even less tolerated than it had been decades before – so innovation decreased as a result.

The central hypothesis of “takeoff speeds” is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets in that sense. It is not permissible (on this viewpoint) for it to be the case that there is a lot of AI investment into AI that is directed not quite at the key path leading to AGI, such that somebody could spend $1B on compute for the key path leading to AGI before anybody else had spent $100M on that. There cannot exist any secret like that. The path to AGI will be known; everyone, or a wide variety of powerful actors, will know how profitable that path will be; the surrounding industry will be capable of acting on this knowledge, and will have actually been acting on it as early as possible; multiple actors are already investing in every tech path that would in fact be profitable (and is known to any human being at all), as soon as that R&D opportunity becomes available.

And I’m not saying this is an inconsistent world to describe! I’ve written science fiction set in this world. I called it “dath ilan”. It’s a hypothetical world that is actually full of smart people in economic equilibrium. If anything like Covid-19 appears, for example, the governments and public-good philanthropists there have already set up prediction markets (which are not illegal, needless to say); and of course there are mRNA vaccine factories already built and ready to go, because somebody already calculated the profits from fast vaccines would be very high in case of a pandemic (no artificial price ceilings in this world, of course); so as soon as the prediction markets started calling the coming pandemic conditional on no vaccine, the mRNA vaccine factories were already spinning up.

This world, however, is not Earth.

On Earth, major chunks of technological progress quite often occur outside of a social context where everyone knew and agreed in advance on which designs would yield how much expected profit and many overlapping actors competed to invest in the most actually-promising paths simultaneously.

And that is why you can read Inadequate Equilibria, and then read this essay on takeoff speeds, and go, “Oh, yes, I recognize this; it’s written inside the Modesty worldview; in particular, the imagination of an adequate world in which there is a perfect absence of Thielian secrets or unshared knowable knowledge about fruitful development pathways. This is the same world that already had mRNA vaccines ready to spin up on day one of the Covid-19 pandemic, because markets had correctly forecasted their option value and investors had acted on that forecast unimpeded. Sure would be an interesting place to live! But we don’t live there.”

Could we perhaps end up in a world where the path to AGI is in fact not a Thielian secret, because in fact the first accessible path to AGI happens to lie along a tech pathway that already delivered large profits to previous investors who summed a lot of small innovations, a la experience with chipmaking, such that there were no large innovations just lots and lots of small innovations that yield 10% improvement annually on various tech benchmarks?

I think that even in this case we will get weird, discontinuous, and fatal behaviors, and I could maybe talk about that when discussion resumes. But it is not ruled out to me that the first accessible pathway to AGI could happen to lie in the further direction of some road that was already well-traveled, already yielded much profit to now-famous tycoons back when its first steps were Thielian secrets, and hence is now replete with dozens of competing chasers for the gold rush.

It’s even imaginable to me, though a bit less so, that the first path traversed to real actual pivotal/powerful/lethal AGI, happens to lie literally actually squarely in the central direction of the gold rush. It sounds a little less like the tech history I know, which is usually about how someone needed to swerve a bit and the popular gold-rush forecasts weren’t quite right, but maybe that is just a selective focus of history on the more interesting cases.

Though I remark that – even supposing that getting to big AGI is literally as straightforward and yet as difficult as falling down a semiconductor manufacturing roadmap (as otherwise the biggest actor to first see the obvious direction could just rush down the whole road) – well, TSMC does have a bit of an unshared advantage right now, if I recall correctly. And Intel had a bit of an advantage before that. So that happens even when there’s competitors competing to invest billions.

But we can imagine that doesn’t happen either, because instead of needing to build a whole huge manufacturing plant, there’s just lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time, and everybody knows which direction to move in to get to more serious AGI and they’re right in this shared forecast.

I am willing to entertain discussing this world and the sequelae there – I do think everybody still dies in this case – but I would not have this particular premise thrust upon us as a default, through a not-explicitly-spoken pressure against being so immodest and inegalitarian as to suppose that any Thielian knowable-secret will exist, or that anybody in the future gets as far ahead of others as today’s TSMC or today’s DeepMind.

We are, in imagining this world, imagining a world in which AI research has become drastically unlike today’s AI research in a direction drastically different from the history of many other technologies.

It’s not literally unprecedented, but it’s also not a default environment for big moments in tech progress; it’s narrowly precedented for particular industries with high competition and steady benchmark progress driven by huge investments into a sum of many tiny innovations.

So I can entertain the scenario. But if you want to claim that the social situation around AGI will drastically change in this way you foresee – not just that it could change in that direction, if somebody makes a big splash that causes everyone else to reevaluate their previous opinions and arrive at yours, but that this social change will occur and you know this now – and that the prerequisite tech path to AGI is known to you, and forces an investment situation that looks like the semiconductor industry – then your “What do you think you know and how do you think you know it?” has some significant explaining to do.

Of course, I do appreciate that such a thing could be knowable, and yet not known to me. I’m not so silly as to disbelieve in secrets like that. They’re all over the actual history of technological progress on our actual Earth.
