Yudkowsky on AGI risk on the Bankless podcast

Eliezer gave a very frank overview of his take on AI two weeks ago on the cryptocurrency show Bankless.

I’ve posted a transcript of the show and a follow-up Q&A below.

Thanks to Andrea_Miotti, remember, and vonk for help posting transcripts.

Intro

Eliezer Yudkowsky: [clip] I think that we are hearing the last winds start to blow, the fabric of reality start to fray. This thing alone cannot end the world, but I think that probably some of the vast quantities of money being blindly and helplessly piled into here are going to end up actually accomplishing something.

Ryan Sean Adams: Welcome to Bankless, where we explore the frontier of internet money and internet finance. This is how to get started, how to get better, how to front run the opportunity. This is Ryan Sean Adams. I’m here with David Hoffman, and we’re here to help you become more bankless.

Okay, guys, we wanted to do an episode on AI at Bankless, but I feel like David…

David: Got what we asked for.

Ryan: We accidentally waded into the deep end of the pool here. And I think before we get into this episode, it probably warrants a few comments. I’m going to say a few things I’d like to hear from you too. But one thing I want to tell the listener is, don’t listen to this episode if you’re not ready for an existential crisis. Okay? I’m kind of serious about this. I’m leaving this episode shaken. And I don’t say that lightly. In fact, David, I think you and I will have some things to discuss in the debrief as far as how this impacted you. But this was an impactful one. It sort of hit me during the recording, and I didn’t know fully how to react. I honestly am coming out of this episode wanting to refute some of the claims made in this episode by our guest, Eliezer Yudkowsky, who makes the claim that humanity is on the cusp of developing an AI that’s going to destroy us, and that there’s really not much we can do to stop it.

David: There’s no way around it, yeah.

Ryan: I have a lot of respect for this guest. Let me say that. So it’s not as if I have some sort of big-brained technical disagreement here. In fact, I don’t even know enough to fully disagree with anything he’s saying. But the conclusion is so dire and so existentially heavy that I’m worried about it impacting you, listener, if we don’t give you this warning going in.

I also feel like, David, as interviewers, maybe we could have done a better job. I’ll say this on behalf of myself. Sometimes I peppered him with a lot of questions in one fell swoop, and he was probably only ready to synthesize one at a time.

I also feel like we got caught flat-footed at times. I wasn’t expecting his answers to be so frank and so dire, David. It was just bereft of hope.

And I appreciated very much the honesty, as we always do on Bankless. But I appreciated it almost in the way that a patient might appreciate the honesty of their doctor telling them that their illness is terminal. Like, it’s still really heavy news, isn’t it?

So that is the context going into this episode. I will say one thing. In good news, for our failings as interviewers in this episode, they might be remedied because at the end of this episode, after we finished with hitting the record button to stop recording, Eliezer said he’d be willing to provide an additional Q&A episode with the Bankless community. So if you guys have questions, and if there’s sufficient interest for Eliezer to answer, tweet at us to express that interest. Hit us in Discord. Get those messages over to us and let us know if you have some follow-up questions.

He said if there’s enough interest in the crypto community, he’d be willing to come on and do another episode with follow-up Q&A. Maybe even a Vitalik and Eliezer episode is in store. That’s a possibility that we threw to him. We’ve not talked to Vitalik about that either, but I just feel a little overwhelmed by the subject matter here. And that is the basis, the preamble through which we are introducing this episode.

David, there’s a few benefits and takeaways I want to get into. But before I do, can you comment or reflect on that preamble? What are your thoughts going into this one?

David: Yeah, we approached the end of our agenda—for every Bankless podcast, there’s an equivalent agenda that runs alongside of it. But once we got to this crux of this conversation, it was not possible to proceed in that agenda, because… what was the point?

Ryan: Nothing else mattered.

David: And nothing else really matters, which also just relates to the subject matter at hand. And so as we proceed, you’ll see us kind of circle back to the same inevitable conclusion over and over and over again, which ultimately is kind of the punchline of the content.

I’m of a specific disposition where stuff like this, I kind of am like, “Oh, whatever, okay”, just go about my life. Other people are of different dispositions and take these things more heavily. So Ryan’s warning at the beginning is if you are a type of person to take existential crises directly to the face, perhaps consider doing something else instead of listening to this episode.

Ryan: I think that is good counsel.

So, a few things if you’re looking for an outline of the agenda. We start by talking about ChatGPT. Is this a new era of artificial intelligence? Got to begin the conversation there.

Number two, we talk about what an artificial superintelligence might look like. How smart exactly is it? What types of things could it do that humans cannot do?

Number three, we talk about why an AI superintelligence will almost certainly spell the end of humanity and why it’ll be really hard, if not impossible, according to our guest, to stop this from happening.

And number four, we talk about if there is absolutely anything we can do about all of this. We are heading careening maybe towards the abyss. Can we divert direction and not go off the cliff? That is the question we ask Eliezer.

David, I think you and I have a lot to talk about during the debrief. All right, guys, the debrief is an episode that we record right after the episode. It’s available for all Bankless citizens. We call this the Bankless Premium Feed. You can access that now to get our raw and unfiltered thoughts on the episode. And I think it’s going to be pretty raw this time around, David.

David: I didn’t expect this to hit you so hard.

Ryan: Oh, I’m dealing with it right now.

David: Really?

Ryan: And this is not too long after the episode. So, yeah, I don’t know how I’m going to feel tomorrow, but I definitely want to talk to you about this. And maybe have you give me some counseling. (laughs)

David: I’ll put my psych hat on, yeah.

Ryan: Please! I’m going to need some help.

ChatGPT

Ryan: Bankless Nation, we are super excited to introduce you to our next guest. Eliezer Yudkowsky is a decision theorist. He’s an AI researcher. He’s the seeder of the Less Wrong community blog, a fantastic blog by the way. There’s so many other things that he’s also done. I can’t fit this in the short bio that we have to introduce you to Eliezer.

But most relevant probably to this conversation is he’s working at the Machine Intelligence Research Institute to ensure that when we do make general artificial intelligence, it doesn’t come kill us all. Or at least it doesn’t come ban cryptocurrency, because that would be a poor outcome as well.

Eliezer: (laughs)

Ryan: Eliezer, it’s great to have you on Bankless. How are you doing?

Eliezer: Within one standard deviation of my own peculiar little mean.

Ryan: (laughs) Fantastic. You know, we want to start this conversation with something that jumped onto the scene for a lot of mainstream folks quite recently, and that is ChatGPT. So apparently over 100 million or so have logged on to ChatGPT quite recently. I’ve been playing with it myself. I found it very friendly, very useful. It even wrote me a sweet poem that I thought was very heartfelt and almost human-like.

I know that you have major concerns around AI safety, and we’re going to get into those concerns. But can you tell us in the context of something like a ChatGPT, is this something we should be worried about? That this is going to turn evil and enslave the human race? How worried should we be about ChatGPT and BARD and the new AI that’s entered the scene recently?

Eliezer: ChatGPT itself? Zero. It’s not smart enough to do anything really wrong. Or really right either, for that matter.

Ryan: And what gives you the confidence to say that? How do you know this?

Eliezer: Excellent question. So, every now and then, somebody figures out how to put a new prompt into ChatGPT. You know, one time somebody found that one of the earlier generations of the technology would sound smarter if you first told it it was Eliezer Yudkowsky. There’s other prompts too, but that one’s one of my favorites. So there’s untapped potential in there that people hadn’t figured out how to prompt yet.

But when people figure it out, it moves ahead sufficiently short distances that I do feel fairly confident that there is not so much untapped potential in there that it is going to take over the world. It’s, like, making small movements, and to take over the world it would need a very large movement. There’s places where it falls down on predicting the next line that a human would say in its shoes that seem indicative of “probably that capability just is not in the giant inscrutable matrices, or it would be using it to predict the next line”, which is very heavily what it was optimized for. So there’s going to be some untapped potential in there. But I do feel quite confident that the upper range of that untapped potential is insufficient to outsmart all the living humans and implement the scenario that I’m worried about.

Ryan: Even so, though, is ChatGPT a big leap forward in the journey towards AI in your mind? Or is this fairly incremental, it’s just (for whatever reason) caught mainstream attention?

Eliezer: GPT-3 was a big leap forward. There’s rumors about GPT-4, which, who knows? ChatGPT is a commercialization of the actual AI-in-the-lab giant leap forward. If you had never heard of GPT-3 or GPT-2 or the whole range of text transformers before ChatGPT suddenly entered into your life, then that whole thing is a giant leap forward. But it’s a giant leap forward based on a technology that was published in, if I recall correctly, 2018.

David: I think that what’s going around in everyone’s minds right now—and the Bankless listenership (and crypto people at large) are largely futurists, so everyone (I think) listening understands that in the future, there will be sentient AIs perhaps around us, at least by the time that we all move on from this world.

So we all know that this future of AI is coming towards us. And when we see something like ChatGPT, everyone’s like, “Oh, is this the moment in which our world starts to become integrated with AI?” And so, Eliezer, you’ve been tapped into the world of AI. Are we onto something here? Or is this just another fad that we will internalize and then move on from? And then the real moment of generalized AI is actually much further out than we’re initially giving credit for. Like, where are we in this timeline?

Eliezer: Predictions are hard, especially about the future. I sure hope that this is where it saturates — this or the next generation, it goes only this far, it goes no further. It doesn’t get used to make more steel or build better power plants, first because that’s illegal, and second because the large language model technologies’ basic vulnerability is that it’s not reliable. It’s good for applications where it works 80% of the time, but not where it needs to work 99.999% of the time. This class of technology can’t drive a car because it will sometimes crash the car.

So I hope it saturates there. I hope they can’t fix it. I hope we get, like, a 10-year AI winter after this.

This is not what I actually predict. I think that we are hearing the last winds start to blow, the fabric of reality start to fray. This thing alone cannot end the world. But I think that probably some of the vast quantities of money being blindly and helplessly piled into here are going to end up actually accomplishing something.

Not most of the money—that just never happens in any field of human endeavor. But 1% of $10 billion is still a lot of money to actually accomplish something.

AGI

Ryan: So listeners, I think you’ve heard Eliezer’s thesis on this, which is pretty dim with respect to AI alignment—and we’ll get into what we mean by AI alignment—and very worried about AI-safety-related issues.

But I think for a lot of people to even worry about AI safety and for us to even have that conversation, I think they have to have some sort of grasp of what AGI looks like. I understand that to mean “artificial general intelligence” and this idea of a super-intelligence.

Can you tell us: if there was a superintelligence on the scene, what would it look like? I mean, is this going to look like a big chat box on the internet that we can all type things into? It’s like an oracle-type thing? Or is it like some sort of a robot that is going to be constructed in a secret government lab? Is this, like, something somebody could accidentally create in a dorm room? What are we even looking for when we talk about the term “AGI” and “superintelligence”?

Eliezer: First of all, I’d say those are pretty distinct concepts. ChatGPT shows a very wide range of generality compared to the previous generations of AI. Not very wide generality compared to GPT-3—not literally the lab research that got commercialized, that’s the same generation. But compared to stuff from 2018 or even 2020, ChatGPT is better at a much wider range of things without having been explicitly programmed by humans to be able to do those things.

To imitate a human as best it can, it has to capture all of the things that humans can think about that it can, which is not all the things. It’s still not very good at long multiplication (unless you give it the right instructions, in which case suddenly it can do it).

It’s significantly more general than the previous generation of artificial minds. Humans were significantly more general than the previous generation of chimpanzees, or rather Australopithecus or last common ancestor.

Humans are not fully general. If humans were fully general, we’d be as good at coding as we are at football, throwing things, or running. Some of us are okay at programming, but we’re not spec’d for it. We’re not fully general minds.

You can imagine something that’s more general than a human, and if it runs into something unfamiliar, it’s like, okay, let me just go reprogram myself a bit and then I’ll be as adapted to this thing as I am to anything else.

So ChatGPT is less general than a human, but it’s genuinely ambiguous, I think, whether it’s more or less general than (say) our cousins, the chimpanzees. Or if you don’t believe it’s as general as a chimpanzee, a dolphin or a cat.

Ryan: So this idea of general intelligence is sort of a range of things that it can actually do, a range of ways it can apply itself?

Eliezer: How wide is it? How much reprogramming does it need? How much retraining does it need to make it do a new thing?

Bees build hives, beavers build dams, a human will look at a beehive and imagine a honeycomb-shaped dam. That’s, like, humans alone in the animal kingdom. But that doesn’t mean that we are general intelligences, it means we’re significantly more generally applicable intelligences than chimpanzees.

It’s not like we’re all that narrow. We can walk on the moon. We can walk on the moon because there’s aspects of our intelligence that are made in full generality for universes that contain simplicities, regularities, things that recur over and over again. We understand that if steel is hard on Earth, it may stay hard on the moon. And because of that, we can build rockets, walk on the moon, breathe amid the vacuum.

Chimpanzees cannot do that, but that doesn’t mean that humans are the most general possible things. The thing that is more general than us, that figures that stuff out faster, is the thing to be scared of if the purposes to which it turns its intelligence are not ones that we would recognize as nice things, even in the most cosmopolitan and embracing senses of what’s worth doing.

Efficiency

Ryan: And you said this idea of a general intelligence is different than the concept of superintelligence, which I also brought into that first part of the question. How is superintelligence different than general intelligence?

Eliezer: Well, because ChatGPT has a little bit of general intelligence. Humans have more general intelligence. A superintelligence is something that can beat any human and the entire human civilization at all the cognitive tasks. I don’t know if the efficient market hypothesis is something where I can rely on the entire…

Ryan: We’re all crypto investors here. We understand the efficient market hypothesis for sure.

Eliezer: So the efficient market hypothesis is of course not generally true. It’s not true that literally all the market prices are smarter than you. It’s not true that all the prices on earth are smarter than you. Even the most arrogant person who is at all calibrated, however, still thinks that the efficient market hypothesis is true relative to them 99.99999% of the time. They only think that they know better about one in a million prices.

They might be important prices. The price of Bitcoin is an important price. It’s not just a random price. But if the efficient market hypothesis was only true to you 90% of the time, you could just pick out the 10% of the remaining prices and double your money every day on the stock market. And nobody can do that. Literally nobody can do that.

So this property of relative efficiency that the market has to you, that the price’s estimate of the future price already has all the information you have—not all the information that exists in principle, maybe not all the information that the best equity could, but it’s efficient relative to you.

For you, if you pick out a random price, like the price of Microsoft stock, something where you’ve got no special advantage, that estimate of its price a week later is efficient relative to you. You can’t do better than that price.

We have much less experience with the notion of instrumental efficiency, efficiency in choosing actions, because actions are harder to aggregate estimates about than prices. So you have to look at, say, AlphaZero playing chess—or just, you know, whatever the latest Stockfish number is, an advanced chess engine.

When it makes a chess move, you can’t do better than that chess move. It may not be the optimal chess move, but if you pick a different chess move, you’ll do worse. That you’d call a kind of efficiency of action. Given its goal of winning the game, once you know its move—unless you consult some more powerful AI than Stockfish—you can’t figure out a better move than that.

A superintelligence is like that with respect to everything, with respect to all of humanity. It is relatively efficient to humanity. It has the best estimates—not perfect estimates, but the best estimates—and its estimates contain all the information that you’ve got about it. Its actions are the most efficient actions for accomplishing its goals. If you think you see a better way to accomplish its goals, you’re mistaken.
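[Editor’s illustration, not part of the conversation: the remark above about doubling your money every day on the stock market can be made concrete with a little arithmetic. The sketch below is a minimal, assumption-laden example: the function name `days_to_exceed`, the starting stake, and the target figure are all hypothetical and chosen only to show how quickly daily doubling compounds, which is why no real trader can sustain it.]

```python
def days_to_exceed(start: float, target: float, daily_factor: float = 2.0) -> int:
    """Count the days of compounding needed for `start` to reach `target`.

    `daily_factor` is the multiple applied once per day (2.0 means doubling).
    """
    days = 0
    value = start
    while value < target:
        value *= daily_factor
        days += 1
    return days


# Hypothetical figures, for illustration only (assumptions, not from the episode).
START_STAKE = 1_000.0   # a modest starting stake, in dollars
TARGET = 100e12         # one hundred trillion dollars, an arbitrary very large sum

if __name__ == "__main__":
    print(days_to_exceed(START_STAKE, TARGET))
```

[Under these made-up figures, the loop reports 37 days: a trader who could reliably double a $1,000 stake daily would pass one hundred trillion dollars in just over five weeks.]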

Ryan: So you’re saying [if something is a] superintelligence, we’d have to imagine something that knows all of the chess moves in advance. But here we’re not talking about chess, we’re talking about everything. It knows all of the moves that we would make and the most optimum pattern, including moves that we would not even know how to make, and it knows these things in advance.

I mean, how would human beings sort of experience such a superintelligence? I think we still have a very hard time imagining something smarter than us, just because we’ve never experienced anything like it before.

Of course, we all know somebody who’s genius-level IQ, maybe quite a bit smarter than us, but we’ve never encountered something like what you’re describing, some sort of mind that is superintelligent.

What sort of things would it be doing that humans couldn’t? How would we experience this in the world?

Eliezer: I mean, we do have some tiny bit of experience with it. We have experience with chess engines, where we just can’t figure out better moves than they make. We have experience with market prices, where even though your uncle has this really long, elaborate story about Microsoft stock, you just know he’s wrong. Why is he wrong? Because if he was correct, it would already be incorporated into the stock price.

And especially because the market’s efficiency is not perfect, like that whole downward swing and then upward move in COVID. I have friends who made more money off that than I did, but I still managed to buy back into the broader stock market on the exact day of the low—basically coincidence. So the markets aren’t perfectly efficient, but they’re efficient almost everywhere.

And that sense of deference, that sense that your weird uncle can’t possibly be right because the hedge funds would know it—you know, unless he’s talking about COVID, in which case maybe he is right if you have the right choice of weird uncle! I have weird friends who are maybe better at calling these things than your weird uncle. So among humans, it’s subtle.

And then with superintelligence, it’s not subtle, just massive advantage. But not perfect. It’s not that it knows every possible move you make before you make it. It’s that it’s got a good probability distribution about that. And it has figured out all the good moves you could make and figured out how to reply to those.

And I mean, in practice, what’s that like? Well, unless it’s limited, narrow superintelligence, I think you mostly don’t get to observe it because you are dead, unfortunately.

Ryan: What? (laughs)

Eliezer: Like, Stockfish makes strictly better chess moves than you, but it’s playing on a very narrow board. And the fact that it’s better than you at chess doesn’t mean it’s better than you at everything. And I think that the actual catastrophe scenario for AI looks like big advancement in a research lab, maybe driven by them getting a giant venture capital investment and being able to spend 10 times as much on GPUs as they did before, maybe driven by a new algorithmic advance like transformers, maybe driven by hammering out some tweaks in last year’s algorithmic advance that gets the thing to finally work efficiently. And the AI there goes over a critical threshold, which most obviously could be like, “can write the next AI”.

That’s so obvious that science fiction writers figured it out almost before there were computers, possibly even before there were computers. I’m not sure what the exact dates here are. But if it’s better than you at everything, it’s better than you at building AIs. That snowballs. It gets an immense technological advantage. If it’s smart, it doesn’t announce itself. It doesn’t tell you that there’s a fight going on. It emails out some instructions to one of those labs that’ll synthesize DNA and synthesize proteins from the DNA and get some proteins mailed to a hapless human somewhere who gets paid a bunch of money to mix together some stuff they got in the mail in a vial. Like, smart people will not do this for any sum of money. Many people are not smart. Builds the ribosome, but the ribosome that builds things out of covalently bonded diamondoid instead of proteins folding up and held together by van der Waals forces, builds tiny diamondoid bacteria. The diamondoid bacteria replicate using atmospheric carbon, hydrogen, oxygen, nitrogen, and sunlight. And a couple of days later, everybody on earth falls over dead in the same second.

That’s the disaster scenario if it’s as smart as I am. If it’s smarter, it might think of a better way to do things. But it can at least think of that if it’s relatively efficient compared to humanity because I’m in humanity and I thought of it.

Ryan: This is—I’ve got a million questions, but I’m gonna let David go first.

David: Yeah. So we sped run the introduction of a number of different concepts, which I want to go back and take our time to really dive into.

There’s the AI alignment problem. There’s AI escape velocity. There is the question of what happens when AIs are so incredibly intelligent that humans are to AIs what ants are to us.

And so I want to kind of go back and tackle these, Eliezer, one by one.

We started this conversation talking about ChatGPT, and everyone’s up in arms about ChatGPT. And you’re saying like, yes, it’s a great step forward in the generalizability of some of the technologies that we have in the AI world. All of a sudden ChatGPT becomes immensely more useful and it’s really stoking the imaginations of people today.

But what you’re saying is it’s not the thing that’s actually going to be the thing to reach escape velocity and create superintelligent AIs that perhaps might be able to enslave us. But my question to you is, how do we know when that—

Eliezer: Not enslave. They don’t enslave you, but sorry, go on.

David: Yeah, sorry.

Ryan: Murder, David. Kill all of us. Eliezer was very clear on that.

David: So if it’s not ChatGPT, how close are we? Because there’s this unknown event horizon where you kind of alluded to it, where we make this AI that we train it to create a smarter AI and that smart AI is so incredibly smart that it hits escape velocity and all of a sudden these dominoes fall. How close are we to that point? And are we even capable of answering that question?

Eliezer: How the heck would I know?

Ryan: Well, when you were talking, Eliezer, if we had already crossed that event horizon, a smart AI wouldn’t necessarily broadcast that to the world. I mean, it’s possible we’ve already crossed that event horizon, is it not?

Eliezer: I mean, it’s theoretically possible, but seems very unlikely. Somebody would need inside their lab an AI that was much more advanced than the public AI technology. And as far as I currently know, the best labs and the best people are throwing their ideas to the world! Like, they don’t care.

And there’s probably some secret government labs with secret government AI researchers. My pretty strong guess is that they don’t have the best people and that those labs could not create ChatGPT on their own because ChatGPT took a whole bunch of fine twiddling and tuning and visible access to giant GPU farms and that they don’t have the people who know how to do the twiddling and tuning. This is just a guess.

AI Alignment

David: Could you walk us through—one of the big things that you spend a lot of time on is this thing called the AI alignment problem. Some people are not convinced that when we create AI, that AI won’t really just be fundamentally aligned with humans. I don’t believe that you fall into that camp. I think you fall into the camp of when we do create this superintelligent, generalized AI, we are going to have a hard time aligning with it in terms of our morality and our ethics.

Can you walk us through a little bit of that thought process? Why do you feel disaligned?

Ryan: The dumb way to ask that question too is like, Eliezer, why do you think that the AI automatically hates us? Why is it going to—

Eliezer: It doesn’t hate you.

Ryan: Why does it want to kill us all?

Eliezer: The AI doesn’t hate you, neither does it love you, and you’re made of atoms that it can use for something else.

David: It’s indifferent to you.

Eliezer: It’s got something that it actually does care about, which makes no mention of you. And you are made of atoms that it can use for something else. That’s all there is to it in the end.

The reason you’re not in its utility function is that the programmers did not know how to do that. The people who built the AI, or the people who built the AI that built the AI that built the AI, did not have the technical knowledge that nobody on earth has at the moment as far as I know, whereby you can do that thing and you can control in detail what that thing ends up caring about.

David: So this feels like humanity is hurtling itself towards what we’re calling, again, an event horizon where there’s this AI escape velocity, and there’s nothing on the other side. As in, we do not know what happens past that point as it relates to having some sort of superintelligent AI and how it might be able to manipulate the world. Would you agree with that?

Eliezer: No.

Again, the Stockfish chess-playing analogy. You cannot predict exactly what move it would make, because in order to predict exactly what move it would make, you would have to be at least that good at chess, and it’s better than you.

This is true even if it’s just a little better than you. Stockfish is actually enormously better than you, to the point that once it tells you the move, you can’t figure out a better move without consulting a different AI. But even if it was just a bit better than you, then you’re in the same position.

This kind of disparity also exists between humans. If you ask me, where will Garry Kasparov move on this chessboard? I’m like, I don’t know, maybe here. Then if Garry Kasparov moves somewhere else, it doesn’t mean that he’s wrong, it means that I’m wrong. If I could predict exactly where Garry Kasparov would move on a chessboard, I’d be Garry Kasparov. I’d be at least that good at chess. Possibly better. I could also be able to predict him, but also see an even better move than that.

That’s an irreducible source of uncertainty with respect to superintelligence, or anything that’s smarter than you. If you could predict exactly what it would do, you’d be that smart yourself. It doesn’t mean you can predict no facts about it.

With Stockfish in particular, I can predict it’s going to win the game. I know what it’s optimizing for. I know where it’s trying to steer the board. I can’t predict exactly what the board will end up looking like after Stockfish has finished winning its game against me. I can predict it will be in the class of states that are winning positions for black or white or whichever color Stockfish picked, because, you know, it wins either way.

And that’s similarly where I’m getting the prediction about everybody being dead, because if everybody were alive, then there’d be some state that the superintelligence preferred to that state, which is all of the atoms making up these people and their farms are being used for something else that it values more.

So if you postulate that everybody’s still alive, I’m like, okay, well, why is it you’re postulating that Stockfish made a stupid chess move and ended up with a non-winning board position? That’s where that class of predictions comes from.

Ryan: Can you reinforce this argument, though, a little bit? So, why is it that an AI can’t be nice, sort of like a gentle parent to us, rather than sort of a murderer looking to deconstruct our atoms and apply for use somewhere else?

What are its goals? And why can’t they be aligned to at least some of our goals? Or maybe, why can’t it get into a status which is somewhat like us and the ants, which is largely we just ignore them unless they interfere in our business and come in our house and raid our cereal boxes?

Eliezer: There’s a bunch of different questions there. So first of all, the space of minds is very wide. Imagine this giant sphere and all the humans are in this one tiny corner of the sphere. We’re all basically the same make and model of car, running the same brand of engine. We’re just all painted slightly different colors.

Somewhere in that mind space, there’s things that are as nice as humans. There’s things that are nicer than humans. There are things that are trustworthy and nice and kind in ways that no human can ever be. And there’s even things that are so nice that they can understand the concept of leaving you alone and doing your own stuff sometimes instead of hanging around trying to be obsessively nice to you every minute and all the other famous disaster scenarios from ancient science fiction (“With Folded Hands” by Jack Williamson is the one I’m quoting there.)

We don’t know how to reach into mind design space and pluck out an AI like that. It’s not that they don’t exist in principle. It’s that we don’t know how to do it. And I’ll hand back the conversational ball now and figure out, like, which next question do you want to go down there?

Ryan: Well, I mean, why? Why is it so difficult to align an AI with even our basic notions of morality?

Eliezer: I mean, I wouldn’t say that it’s difficult to align an AI with our basic notions of morality. I’d say that it’s difficult to align an AI on a task like “take this strawberry, and make me another strawberry that’s identical to this strawberry down to the cellular level, but not necessarily the atomic level”. So it looks the same under like a standard optical microscope, but maybe not a scanning electron microscope. Do that. Don’t destroy the world as a side effect.

Now, this does intrinsically take a powerful AI. There’s no way you can make it easy to align by making it stupid. To build something that’s cellular identical to a strawberry—I mean, mostly I think the way that you do this is with very primitive nanotechnology, but we could also do it using very advanced biotechnology. And these are not technologies that we already have. So it’s got to be something smart enough to develop new technology.

Never mind all the subtleties of morality. I think we don’t have the technology to align an AI to the point where we can say, “Build me a copy of the strawberry and don’t destroy the world.”

Why do I think that? Well, case in point, look at natural selection building humans. Natural selection mutates the humans a bit, runs another generation. The fittest ones reproduce more, their genes become more prevalent in the next generation. Natural selection hasn’t really had very much time to do this to modern humans at all, but you know, the hominid line, the mammalian line, go back a few million generations. And this is an example of an optimization process building an intelligence.

And natural selection asked us for only one thing: “Make more copies of your DNA. Make your alleles more relatively prevalent in the gene pool.” Maximize your inclusive reproductive fitness—not just your own reproductive fitness, but your two brothers or eight cousins, as the joke goes, because they’ve got on average one copy of your genes. This is all we were optimized for, for millions of generations, creating humans from scratch, from the first accidentally self-replicating molecule.
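The arithmetic behind the "two brothers or eight cousins" joke can be sketched as follows. This is a minimal illustration, assuming the standard coefficients of relatedness (1/2 for full siblings, 1/8 for first cousins); the function and dictionary names are hypothetical and not from the episode.

```python
from fractions import Fraction

# Assumed coefficients of relatedness: the expected fraction of your
# genes that a relative of each kind shares with you by descent.
RELATEDNESS = {
    "full sibling": Fraction(1, 2),
    "first cousin": Fraction(1, 8),
}


def expected_genome_copies(kind: str, count: int) -> Fraction:
    """Expected number of 'copies of you' carried by `count` relatives."""
    return RELATEDNESS[kind] * count


brothers = expected_genome_copies("full sibling", 2)
cousins = expected_genome_copies("first cousin", 8)
print(brothers, cousins)
```

Under these assumptions both totals come to one full copy, which is why the two groups count as equivalent in the joke.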

Internally, psychologically, inside our minds, we do not know what genes are. We do not know what DNA is. We do not know what alleles are. We have no concept of inclusive genetic fitness until our scientists figure out what that even is. We don’t know what we were being optimized for. For a long time, many humans thought they’d been created by God!

When you use the hill-climbing paradigm and optimize for one single extremely pure thing, this is how much of it gets inside.

In the ancestral environment, in the exact distribution that we were originally optimized for, humans did tend to end up using their intelligence to try to reproduce more. Put them into a different environment, and all the little bits and pieces and fragments of optimizing for fitness that were in us now do totally different stuff. We have sex, but we wear condoms.

If natural selection had been a foresightful, intelligent kind of engineer that was able to engineer things successfully, it would have built us to be revolted by the thought of condoms. Men would be lined up and fighting for the right to donate to sperm banks. And in our natural environment, the little drives that got into us happened to lead to more reproduction, but distributional shift: run the humans out of their distribution over which they were optimized, and you get totally different results.

And gradient descent would by default do—not quite the same thing, it’s going to do a weirder thing because natural selection has a much narrower information bottleneck. In one sense, you could say that natural selection was at an advantage because it finds simpler solutions. You could imagine some hopeful engineer who just built intelligences using gradient descent and found out that they end up wanting these thousands and millions of little tiny things, none of which were exactly what the engineer wanted, and being like, well, let’s try natural selection instead. It’s got a much sharper information bottleneck. It’ll find the simple specification of what I want.

But we actually get there as humans. And then, gradient descent, probably may be even worse.

But more importantly, I’m just pointing out that there is no physical law, computational law, mathematical/logical law, saying when you optimize using hill-climbing on a very simple, very sharp criterion, you get a general intelligence that wants that thing.

Ryan: So just like natural selection, our tools are too blunt in order to get to that level of granularity to program in some sort of morality into these superintelligent systems?

Eliezer: Or build me a copy of a strawberry without destroying the world. Yeah. The tools are too blunt.

David: So I just want to make sure I’m following with what you were saying. I think the conclusion that you left me with is that my brain, which I consider to be at least decently smart, is actually a byproduct, an accidental byproduct of this desire to reproduce. And it’s actually just like a tool that I have, and just like conscious thought is a tool, which is a useful tool as a means to that end.

And so if we’re applying this to AI and AI’s desire to achieve some certain goal, what’s the parallel there?

Eliezer: I mean, every organ in your body is a reproductive organ. If it didn’t help you reproduce, you would not have an organ like that. Your brain is no exception. This is merely conventional science and merely the conventional understanding of the world. I’m not saying anything here that ought to be at all controversial. I’m sure it’s controversial somewhere, but within a pre-filtered audience, it should not be at all controversial. And this is, like, the obvious thing to expect to happen with AI, because why wouldn’t it? What new law of existence has been invoked, whereby this time we optimize for a thing and we get a thing that wants exactly what we optimized for on the outside?

AI Goals

Ryan: So what are the types of goals an AI might want to pursue? What types of utility functions is it going to want to pursue off the bat? Is it just those it’s been programmed with, like make an identical strawberry?

Eliezer: Well, the whole thing I’m saying is that we do not know how to get goals into a system. We can cause them to do a thing inside a distribution they were optimized over using gradient descent. But if you shift them outside of that distribution, I expect other weird things start happening. When they reflect on themselves, other weird things start happening.

What kind of utility functions are in there? I mean, darned if I know. I think you’d have a pretty hard time calling the shape of humans in advance by looking at natural selection, the thing that natural selection was optimizing for, if you’d never seen a human or anything like a human.

If we optimize them from the outside to predict the next line of human text, like GPT-3—I don’t actually think this line of technology leads to the end of the world, but maybe it does, in like GPT-7—there’s probably a bunch of stuff in there too that desires to accurately model things like humans under a wide range of circumstances, but it’s not exactly humans, because: ice cream.

Ice cream didn’t exist in the natural environment, the ancestral environment, the environment of evolutionary adaptedness. There was nothing with that much sugar, salt, fat combined together as ice cream. We are not built to want ice cream. We were built to want strawberries, honey, a gazelle that you killed and cooked and had some fat in it and was therefore nourishing and gave you the all-important calories you need to survive, salt, so you didn’t sweat too much and run out of salt. We evolved to want those things, but then ice cream comes along and it fits those taste buds better than anything that existed in the environment that we were optimized over.
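The ice cream point can be sketched as a toy model: a "taste" score that tracks nourishment well on the foods it was shaped around, and then picks something else once a new food appears. Every number below is made up purely for illustration, and the names (`ANCESTRAL_FOODS`, `taste`, `favorite`, `most_nourishing`) are hypothetical.

```python
# Made-up scores on an arbitrary scale; not real nutritional data.
ANCESTRAL_FOODS = {
    "strawberries": {"sugar": 3, "fat": 0, "salt": 0, "nourishment": 3},
    "honey": {"sugar": 6, "fat": 0, "salt": 0, "nourishment": 5},
    "gazelle": {"sugar": 0, "fat": 5, "salt": 2, "nourishment": 8},
}
ICE_CREAM = {"sugar": 7, "fat": 6, "salt": 1, "nourishment": 4}


def taste(food: dict) -> int:
    """The inner drive: it only sees sugar, fat, and salt."""
    return food["sugar"] + food["fat"] + food["salt"]


def favorite(foods: dict) -> str:
    """The food the taste drive prefers most."""
    return max(foods, key=lambda name: taste(foods[name]))


def most_nourishing(foods: dict) -> str:
    """The food the outer criterion would have picked."""
    return max(foods, key=lambda name: foods[name]["nourishment"])


modern_foods = {**ANCESTRAL_FOODS, "ice cream": ICE_CREAM}

# In the original distribution, the drive and the criterion agree.
print(favorite(ANCESTRAL_FOODS), most_nourishing(ANCESTRAL_FOODS))
# After the shift, the drive picks the new food; the criterion does not.
print(favorite(modern_foods), most_nourishing(modern_foods))
```

The drive itself never changes between the two cases; only the set of available options does.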

So, a very primitive, very basic, very unreliable wild guess, but at least an informed kind of wild guess: Maybe if you train a thing really hard to predict humans, then among the things that it likes are tiny little pseudo things that meet the definition of “human” but weren’t in its training data and that are much easier to predict, or where the problem of predicting them can be solved in a more satisfying way, where “satisfying” is not like human satisfaction, but some other criterion of “thoughts like this are tasty because they help you predict the humans from the training data”. (shrugs)

Consensus

David: Eliezer, when we talk about all of these ideas about the ways that AI thought will be fundamentally not able to be understood by the ways that humans think, and then all of a sudden we see this rotation by venture capitalists by just pouring money into AI, do alarm bells go off in your head? Like, hey guys, you haven’t thought deeply about these subject matters yet? Does the immense amount of capital going into AI investments scare you?

Eliezer: I mean, alarm bells went off for me in 2015, which is when it became obvious that this is how it was going to go down. I sure am now seeing the realization of that stuff I felt alarmed about back then.

Ryan: Eliezer, is this view that AI is incredibly dangerous and that AGI is going to eventually end humanity and that we’re just careening toward a precipice, would you say this is the consensus view now, or are you still somewhat of an outlier? And why aren’t other smart people in this field as alarmed as you? Can you steel-man their arguments?

Eliezer: You’re asking, again, several questions sequentially there. Is it the consensus view? No. Do I think that the people in the wider scientific field who dispute this point of view—do I think they understand it? Do I think they’ve done anything like an impressive job of arguing against it at all? No.

If you look at the famous prestigious scientists who sometimes make a little fun of this view in passing, they’re making up arguments rather than deeply considering things that are held to any standard of rigor, and people outside their own fields are able to validly shoot them down.

I have no idea how to pronounce his last name. François Chollet said something about, I forget his exact words, but it was something like, I never hear any good arguments for stuff. I was like, okay, here’s some good arguments for stuff. You can read the reply from Yudkowsky to Chollet and Google that, and that’ll give you some idea of what the eminent voices versus the reply to the eminent voices sound like. And Scott Aaronson, who at the time was off on complexity theory, he was like, “That’s not how no free lunch theorems work”, correctly.

I think the state of affairs is we have eminent scientific voices making fun of this possibility, but not engaging with the arguments for it.

Now, if you step away from the eminent scientific voices, you can find people who are more familiar with all the arguments and disagree with me. And I think they lack security mindset. I think that they’re engaging in the sort of blind optimism that many, many scientific fields throughout history have engaged in, where when you’re approaching something for the first time, you don’t know why it will be hard, and you imagine easy ways to do things. And the way that this is supposed to naturally play out over the history of a scientific field is that you run out and you try to do the things and they don’t work, and you go back and you try to do other clever things and they don’t work either, and you learn some pessimism and you start to understand the reasons why the problem is hard.

The field of artificial intelligence itself recapitulated this very common ontogeny of a scientific field, where initially we had people getting together at the Dartmouth conference. I forget what their exact famous phrasing was, but it’s something like, “We are wanting to address the problem of getting AIs to, you know, like understand language, improve themselves”, and I forget even what else was there. A list of what now sound like grand challenges. “And we think we can make substantial progress on this using 10 researchers for two months.” And I think that at the core is what’s going on.

They have not run into the actual problems of alignment. They aren’t trying to get ahead of the game. They’re not trying to panic early. They’re waiting for reality to hit them over the head and turn them into grizzled old cynics of their scientific field who understand the reasons why things are hard. They’re content with the predictable life cycle of starting out as bright-eyed youngsters, waiting for reality to hit them over the head with the news. And if it wasn’t going to kill everybody the first time that they’re really wrong, it’d be fine! You know, this is how science works! If we got unlimited free retries and 50 years to solve everything, it’d be okay. We could figure out how to align AI in 50 years given unlimited retries.

You know, the first team in with the bright-eyed optimists would destroy the world and people would go, oh, well, you know, it’s not that easy. They would try something else clever. That would destroy the world. People would go like, oh, well, you know, maybe this field is actually hard. Maybe this is actually one of the thorny things like computer security or something. And so what exactly went wrong last time? Why didn’t these hopeful ideas play out? Oh, like you optimize for one thing on the outside and you get a different thing on the inside. Wow. That’s really basic. All right. Can we even do this using gradient descent? Can you even build this thing out of giant inscrutable matrices of floating point numbers that nobody understands at all? You know, maybe we need different methodology. And 50 years later, you’d have an aligned AGI.

If we got unlimited free retries without destroying the world, it’d be, you know, it’d play out the same way that ChatGPT played out. It’s, you know, from 1956 or 1955 or whatever it was to 2023. So, you know, about 70 years, give or take a few. And, you know, just like we can do the stuff that they wanted to do in the summer of 1955, you know, 70 years later, you’d have your aligned AGI.

Problem is that the world got destroyed in the meanwhile. And that’s why, you know, that’s the problem there.

God Mode and Aliens

David: So this feels like a gigantic Don’t Look Up scenario. If you’re familiar with that movie, it’s a movie about this asteroid hurtling to Earth, but it becomes popular and in vogue to not look up and not notice it. And Eliezer, you’re the guy who’s saying like, hey, there’s an asteroid. We have to do something about it. And if we don’t, it’s going to come destroy us.

If you had God mode over the progress of AI research and just innovation and development, what choices would you make that humans are not currently making today?

Eliezer: I mean, I could say something like shut down all the large GPU clusters. How long do I have God mode? Do I get to like stick around for seventy years?

David: You have God mode for the 2020 decade.

Eliezer: For the 2020 decade. All right. That does make it pretty hard to do things.

I think I shut down all the GPU clusters and get all of the famous scientists and brilliant, talented youngsters—the vast, vast majority of whom are not going to be productive and where government bureaucrats are not going to be able to tell who’s actually being helpful or not, but, you know—put them all on a large island, and try to figure out some system for filtering the stuff through to me to give thumbs up or thumbs down on that is going to work better than scientific bureaucrats producing entire nonsense.

Because, you know, the trouble is—the reason why scientific fields have to go through this long process to produce the cynical oldsters who know that everything is difficult. It’s not that the youngsters are stupid. You know, sometimes youngsters are fairly smart. You know, Marvin Minsky, John McCarthy back in 1955, they weren’t idiots. You know, privileged to have met both of them. They didn’t strike me as idiots. They were very old, and they still weren’t idiots. But, you know, it’s hard to see what’s coming in advance of experimental evidence hitting you over the head with it.

And if I only have the decade of the 2020s to run all the researchers on this giant island somewhere, it’s really not a lot of time. Mostly what you’ve got to do is invent some entirely new AI paradigm that isn’t the giant inscrutable matrices of floating point numbers on gradient descent. Because I’m not really seeing what you can do that’s clever with that, that doesn’t kill you and that you know doesn’t kill you and doesn’t kill you the very first time you try to do something clever like that.

You know, I’m sure there’s a way to do it. And if you got to try over and over again, you could find it.

Ryan: Eliezer, do you think every intelligent civilization has to deal with this exact problem that humanity is dealing with now? Of how do we solve this problem of aligning with an advanced general intelligence?

Eliezer: I expect that’s much easier for some alien species than others. Like, there are alien species who might arrive at “this problem” in an entirely different way. Maybe instead of having two entirely different information processing systems, the DNA and the neurons, they’ve only got one system. They can trade memories around heritably by swapping blood sexually. Maybe the way in which they “confront this problem” is that very early in their evolutionary history, they have the equivalent of the DNA that stores memories and processes, computes memories, and they swap around a bunch of it, and it adds up to something that reflects on itself and makes itself coherent, and then you’ve got a superintelligence before they have invented computers. And maybe that thing wasn’t aligned, but how do you even align it when you’re in that kind of situation? It’d be a very different angle on the problem.

Ryan: Do you think every advanced civilization is on the trajectory to creating a superintelligence at some point in its history?

Eliezer: Maybe there’s ones in universes with alternate physics where you just can’t do that. Their universe’s computational physics just doesn’t support that much computation. Maybe they never get there. Maybe their lifespans are long enough and their star lifespans short enough that they never get to the point of a technological civilization before their star does the equivalent of expanding or exploding or going out and their planet ends.

“Every alien species” covers a lot of territory, especially if you talk about alien species and universes with physics different from this one.

Ryan: Well, talking about our present universe, I’m curious if you’ve been confronted with the question of, well, then why haven’t we seen some sort of superintelligence in our universe when we look out at the stars? Sort of the Fermi paradox type of question. Do you have any explanation for that?

Eliezer: Oh, well, supposing that they got killed by their own AIs doesn’t help at all with that because then we’d see the AIs.

Ryan: And do you think that’s what happens? Yeah, it doesn’t help with that. We would see evidence of AIs, wouldn’t we?

Eliezer: Yeah.

Ryan: Yes. So why don’t we?

Eliezer: I mean, the same reason we don’t see evidence of the alien civilizations not with AIs.

And that reason is, although it doesn’t really have much to do with the whole AI thesis one way or another, because they’re too far away—or so says Robin Hanson, using a very clever argument about the apparent difficulty of hard steps in humanity’s evolutionary history to further induce the rough gap between the hard steps. … And, you know, I can’t really do justice to this. If you look up grabby aliens, you can…

Ryan: Grabby aliens?

David: I remember this.

Eliezer: Grabby aliens. You can find Robin Hanson’s very clever argument for how far away the aliens are…

Ryan: There’s an entire website, Bankless listeners, there’s an entire website called grabbyaliens.com you can go look at.

Eliezer: Yeah. And that contains by far the best answer I’ve seen to:

“Where are they?” (Answer: too far away for us to see, even if they’re traveling here at nearly light speed.)
“How far away are they?”
“And how do we know that?”

(laughs) But, yeah.

Ryan: This is amazing.

Eliezer: There is not a very good way to simplify the argument, any more than there is to simplify the notion of zero-knowledge proofs. It’s not that difficult, but it’s just very not easy to simplify. But if you have a bunch of locks that are all of different difficulties, and a limited time in which to solve all the locks, such that anybody who gets through all the locks must have gotten through them by luck, all the locks will take around the same amount of time to solve, even if they’re all of very different difficulties. And that’s the core of Robin Hanson’s argument for how far away the aliens are, and how do we know that? (shrugs)
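The locks claim can be checked with a small simulation. This is a rough sketch under stated assumptions, not Robin Hanson's actual model: each lock's solve time is drawn from an exponential distribution, both locks are far harder than the deadline allows, and only the runs that beat the deadline are kept. The mean times, deadline, trial count, and function names are all hypothetical.

```python
import random


def simulate_locks(mean_times, deadline, trials, seed=0):
    """Return per-lock solve times for the runs that finish before the deadline."""
    rng = random.Random(seed)
    winners = []
    for _ in range(trials):
        # expovariate takes a rate, so a mean time m corresponds to rate 1/m.
        times = [rng.expovariate(1.0 / m) for m in mean_times]
        if sum(times) <= deadline:
            winners.append(times)
    return winners


def column_means(rows):
    """Average time spent on each lock across the given runs."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Assumed difficulties: one lock is ten times harder than the other,
# and both usually take far longer than the deadline.
MEAN_TIMES = [5.0, 50.0]
DEADLINE = 1.0

winners = simulate_locks(MEAN_TIMES, DEADLINE, trials=400_000)
conditional_means = column_means(winners)
print(len(winners), conditional_means)
```

Although the locks differ tenfold in difficulty, among the lucky runs that finish in time the average time spent on each lock should come out close to the same value, roughly a third of the deadline in this two-lock setup, with the remaining third left over as slack.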

Good Outcomes

Ryan: Eliezer, I know you’re very skeptical that there will be a good outcome when we produce an artificial general intelligence. And I said when, not if, because I believe that’s your thesis as well, of course. But is there the possibility of a good outcome? I know you are working on AI alignment problems, which leads me to believe that you have greater than zero amount of hope for this project. Is there the possibility of a good outcome? What would that look like, and how do we go about achieving it?

Eliezer: It looks like me being wrong. I basically don’t see on-model hopeful outcomes at this point. We have not done those things that it would take to earn a good outcome, and this is not a case where you get a good outcome by accident.

If you have a bunch of people putting together a new operating system, and they’ve heard about computer security, but they’re skeptical that it’s really that hard, the chance of them producing a secure operating system is effectively zero.

That’s basically the situation I see ourselves in with respect to AI alignment. I have to be wrong about something—which I certainly am. I have to be wrong about something in a way that makes the problem easier rather than harder for those people who don’t think that alignment’s going to be all that hard.

If you’re building a rocket for the first time ever, and you’re wrong about something, it’s not surprising if you’re wrong about something. It’s surprising if the thing that you’re wrong about causes the rocket to go twice as high on half the fuel you thought was required and be much easier to steer than you were afraid of.

Ryan: So, are you…

David: Where the alternative was, “If you’re wrong about something, the rocket blows up.”

Eliezer: Yeah. And then the rocket ignites the atmosphere, is the problem there.

Or rather: a bunch of rockets blow up, a bunch of rockets go places… The analogy I usually use for this is, very early on in the Manhattan Project, they were worried about “What if the nuclear weapons can ignite fusion in the nitrogen in the atmosphere?” And they ran some calculations and decided that it was incredibly unlikely from multiple angles, so they went ahead, and were correct. We’re still here. I’m not going to say that it was luck, because the calculations were actually pretty solid.

An AI is like that, but instead of needing to refine plutonium, you can make nuclear weapons out of a billion tons of laundry detergent. The stuff to make them is fairly widespread. It’s not a tightly controlled substance. And they spit out gold up until they get large enough, and then they ignite the atmosphere, and you can’t calculate how large is large enough. And a bunch of the CEOs running these projects are making fun of the idea that it’ll ignite the atmosphere.

It’s not a very helpful situation.

David: So the economic incentive to produce this AI—one of the things why ChatGPT has sparked the imaginations of so many people is that everyone can imagine products. Products are being imagined left and right about what you can do with something like ChatGPT. There’s this meme at this point of people leaving to go start their ChatGPT startup.

The metaphor is that what you’re saying is that there’s this generally available resource spread all around the world, which is ChatGPT, and everyone’s hammering it in order to make it spit out gold. But you’re saying if we do that too much, all of a sudden the system will ignite the whole entire sky, and then we will all…

Eliezer: Well, no. You can run ChatGPT any number of times without igniting the atmosphere. That’s about what research labs at Google and Microsoft—counting DeepMind as part of Google and counting OpenAI as part of Microsoft—that’s about what the research labs are doing, bringing more metaphorical plutonium together than ever before. Not about how many times you run the things that have been built and not destroyed the world yet.

You can do any amount of stuff with ChatGPT and not destroy the world. It’s not that smart. It doesn’t get smarter every time you run it.

Ryan’s Childhood Questions

Ryan: Can I ask some questions that the 10-year-old in me wants to really ask about this? I’m asking these questions because I think a lot of listeners might be thinking them too, so knock off some of these easy answers for me.

If we create some sort of unaligned, let’s call it “bad” AI, why can’t we just create a whole bunch of good AIs to go fight the bad AIs and solve the problem that way? Can there not be some sort of counterbalance in terms of aligned human AIs and evil AIs, and there be some sort of battle of the artificial minds here?

Eliezer: Nobody knows how to create any good AIs at all. The problem isn’t that we have 20 good AIs and then somebody finally builds an evil AI. The problem is that the first very powerful AI is evil, nobody knows how to make it good, and then it kills everybody before anybody can make it good.

Ryan: So there is no known way to make a friendly, human-aligned AI whatsoever, and you don’t know of a good way to go about thinking through that problem and designing one. Neither does anyone else, is what you’re telling us.

Eliezer: I have some idea of what I would do if there were more time. Back in the day, we had more time. Humanity squandered it. I’m not sure there’s enough time left now. I have some idea of what I would do if I were in a 25-year-old body and had $10 billion.

Ryan: That would be the island scenario of “You’re God for 10 years and you get all the researchers on an island and go really hammer for 10 years at this problem”?

Eliezer: If I have buy-in from a major government that can run actual security precautions and more than just $10 billion, then you could run a whole Manhattan Project about it, sure.

Ryan: This is another question that the 10-year-old in me wants to know. Why is it that, Eliezer, people listening to this episode or people listening to the concerns or reading the concerns that you’ve written down and published, why can’t everyone get on board who’s building an AI and just all agree to be very, very careful? Is that not a sustainable game-theoretic position to have? Is this a coordination problem, more of a social problem than anything else? Or, like, why can’t that happen?

I mean, we have so far not destroyed the world with nuclear weapons, and we’ve had them since the 1940s.

Eliezer: Yeah, this is harder than nuclear weapons. This is a lot harder than nuclear weapons.

Ryan: Why is this harder? And why can’t we just coordinate to just all agree internationally that we’re going to be very careful, put restrictions on this, put regulations on it, do something like that?

Eliezer: Current heads of major labs seem to me to be openly contemptuous of these issues. That’s where we’re starting from. The politicians do not understand it.

There are distortions of these ideas that are going to sound more appealing to them than “everybody suddenly falls over dead”, which is a thing that I think actually happens. “Everybody falls over dead” just doesn’t inspire the monkey political parts of our brain somehow. Because it’s not like, “Oh no, what if terrorists get the AI first?” It’s like, it doesn’t matter who gets it first. Everybody falls over dead.

And yeah, so you’re describing a world coordinating on something that is relatively hard to coordinate. So, could we, if we tried starting today, prevent anyone from getting a billion pounds of laundry detergent in one place worldwide, control the manufacturing of laundry detergent, only have it manufactured in particular places, not concentrate lots of it together, enforce it on every country?

Y’know, if it was legible, if it was clear that a billion pounds of laundry detergent in one place would end the world, if you could calculate that, if all the scientists calculated it, arrived at the same answer, and told the politicians, then maybe, maybe humanity would survive, even though smaller amounts of laundry detergent spit out gold.

The threshold can’t be calculated. I don’t know how you’d convince the politicians. We definitely don’t seem to have had much luck convincing those CEOs whose job depends on them not caring, to care. Caring is easy to fake. It’s easy to hire a bunch of people to be your “AI safety team” and redefine “AI safety” as having the AI not say naughty words. Or, you know, I’m speaking somewhat metaphorically here for reasons.

But, you know, it’s the basic problem that we have is like trying to build a secure OS before we run up against a really smart attacker. And there’s all kinds of, like, fake security. “It’s got a password file! This system is secure! It only lets you in if you type a password!” And if you never go up against a really smart attacker, if you never go far out of distribution against a powerful optimization process looking for holes, you know, then how does a bureaucracy come to know that what they’re doing is not the level of computer security that they need? The way you’re supposed to find this out, the way that scientific fields historically find this out, the way that fields of computer science historically find this out, the way that crypto found this out back in the early days, is by having the disaster happen!

And we’re not even that good at learning from relatively minor disasters! You know, like, COVID swept the world. Did the FDA or the CDC learn anything about “Don’t tell hospitals that they’re not allowed to use their own tests to detect the coming plague”? Are we installing UV-C lights in public spaces or in ventilation systems to prevent the next respiratory pandemic? You know, we lost a million people and we sure did not learn very much as far as I can tell for next time.

We could have an AI disaster that kills a hundred thousand people—how do you even do that? Robotic cars crashing into each other? Have a bunch of robotic cars crashing into each other! It’s not going to look like that was the fault of artificial general intelligence because they’re not going to put AGIs in charge of cars. They’re going to pass a bunch of regulations that’s going to affect the entire AGI disaster or not at all.

What does the winning world even look like here? How in real life did we get from where we are now to this worldwide ban, including against North Korea and, you know, some rogue nation whose dictator doesn’t believe in all this nonsense and just wants the gold that these AIs spit out? How did we get there from here? How do we get to the point where the United States and China signed a treaty whereby they would both use nuclear weapons against Russia if Russia built a GPU cluster that was too large? How did we get there from here?

David: Correct me if I’m wrong, but this seems to be kind of just like a topic of despair? I’m talking to you now and hearing your thought process about, like, there is no known solution and the trajectory’s not great. Do you think all hope is lost here?

Eliezer: I’ll keep on fighting until the end, which I wouldn’t do if I had literally zero hope. I could still be wrong about something in a way that makes this problem somehow much easier than it currently looks. I think that’s how you go down fighting with dignity.

Ryan: “Go down fighting with dignity.” That’s the stage you think we’re at.

I want to just double-click on what you were just saying. Part of the case that you’re making is humanity won’t even see this coming. So it’s not like a coordination problem like global warming where every couple of decades we see the world go up by a couple of degrees, things get hotter, and we start to see these effects over time. The characteristics or the advent of an AGI in your mind is going to happen incredibly quickly, and in such a way that we won’t even see the disaster until it’s imminent, until it’s upon us…?

Eliezer: I mean, if you want some kind of, like, formal phrasing, then I think that superintelligence will kill everyone before non-superintelligent AIs have killed one million people. I don’t know if that’s the phrasing you’re looking for there.

Ryan: I think that’s a fairly precise definition, and why? What goes into that line of thought?

Eliezer: I think that the current systems are actually very weak. I don’t know, maybe I could use the analogy of Go, where you had systems that were finally competitive with the pros, where “pro” is like the set of ranks in Go, and then a year later, they were challenging the world champion and winning. And then another year, they threw out all the complexities and the training from human databases of Go games and built a new system, AlphaGo Zero, that trained itself from scratch. No looking at the human playbooks, no special-purpose code, just a general purpose game-player being specialized to Go, more or less.

And, three days—there’s a quote from Gwern about this, which I forget exactly, but it was something like, “We know how long AlphaGo Zero, or AlphaZero (two different systems), was equivalent to a human Go player. And it was, like, 30 minutes on the following floor of such-and-such DeepMind building.”

Maybe the first system doesn’t improve that quickly, and they build another system that does—And all of that with AlphaGo over the course of years, going from “it takes a long time to train” to “it trains very quickly and without looking at the human playbook”, that’s not with an artificial intelligence system that improves itself, or even that gets smarter as you run it, the way that human beings (not just as you evolve them, but as you run them over the course of their own lifetimes) improve.

So if the first system doesn’t improve fast enough to kill everyone very quickly, they will build one that’s meant to spit out more gold than that.

And there could be weird things that happen before the end. I did not see ChatGPT coming, I did not see Stable Diffusion coming, I did not expect that we would have AIs smoking humans in rap battles before the end of the world. Ones that are clearly much dumber than us.

Ryan: It’s kind of a nice send-off, I guess, in some ways.

Trying to Resist

Ryan: So you said that your hope is not zero, and you are planning to fight to the end. What does that look like for you? I know you’re working at MIRI, which is the Machine Intelligence Research Institute. This is a non-profit that I believe that you’ve set up to work on these AI alignment and safety issues. What are you doing there? What are you spending your time on? How do we actually fight until the end? If you do think that an end is coming, how do we try to resist?

Eliezer: I’m actually on something of a sabbatical right now, which is why I have time for podcasts. It’s a sabbatical from, you know, like, been doing this 20 years. It became clear we were all going to die. I felt kind of burned out, taking some time to rest at the moment. When I dive back into the pool, I don’t know, maybe I will go off to Conjecture or Anthropic or one of the smaller concerns like Redwood Research—Redwood Research being the only ones I really trust at this point, but they’re tiny—and try to figure out if I can see anything clever to do with the giant inscrutable matrices of floating point numbers.

Maybe I just write, continue to try to explain in advance to people why this problem is hard instead of as easy and cheerful as the current people who think they’re pessimists think it will be. I might not be working all that hard compared to how I used to work. I’m older than I was. My body is not in the greatest of health these days. Going down fighting doesn’t necessarily imply that I have the stamina to fight all that hard. I wish I had prettier things to say to you here, but I do not.

Ryan: No, this is… We intended to save probably the last part of this episode to talk about crypto, the metaverse, and AI and how this all intersects. But I gotta say, at this point in the episode, it all kind of feels pointless to go down that track.

We were going to ask questions like, well, in crypto, should we be worried about building sort of a property rights system, an economic system, a programmable money system for the AIs to sort of use against us later on? But it sounds like the easy answer from you to those questions would be, yeah, absolutely. And by the way, none of that matters regardless. You could do whatever you’d like with crypto. This is going to be the inevitable outcome no matter what.

Let me ask you, what would you say to somebody listening who maybe has been sobered up by this conversation? If a version of you in your 20s does have the stamina to continue this battle and to actually fight on behalf of humanity against this existential threat, where would you advise them to spend their time? Is this a technical issue? Is this a social issue? Is it a combination of both? Should they educate? Should they spend time in the lab? What should a person listening to this episode do with these types of dire straits?

Eliezer: I don’t have really good answers. It depends on what your talents are. If you’ve got the very deep version of the security mindset, the part where you don’t just put a password on your system so that nobody can walk in and directly misuse it, but the kind where you don’t just encrypt the password file even though nobody’s supposed to have access to the password file in the first place, and that’s already an authorized user, but the part where you hash the passwords and salt the hashes. If you’re the kind of person who can think of that from scratch, maybe try your hand at alignment.

If you can think of an alternative to the giant inscrutable matrices, then, you know, don’t tell the world about that. I’m not quite sure where you go from there, but maybe you work with Redwood Research or something.

A whole lot of this problem is that even if you do build an AI that’s limited in some way, somebody else steals it, copies it, runs it themselves, and takes the bounds off the for loops and the world ends.

So there’s that. You think you can do something clever with the giant inscrutable matrices? You’re probably wrong. If you have the talent to try to figure out why you’re wrong in advance of being hit over the head with it, and not in a way where you just make random far-fetched stuff up as the reason why it won’t work, but where you can actually keep looking for the reason why it won’t work…

We have people in crypto[graphy] who are good at breaking things, and they’re the reason why anything is not on fire. Some of them might go into breaking AI systems instead, because that’s where you learn anything.

You know: Any fool can build a crypto[graphy] system that they think will work. Breaking existing cryptographical systems is how we learn who the real experts are. So maybe the people finding weird stuff to do with AIs, maybe those people will come up with some truth about these systems that makes them easier to align than I suspect.

How do I put it… The saner outfits do have uses for money. They don’t really have scalable uses for money, but they do burn any money literally at all. Like, if you gave MIRI a billion dollars, I would not know how to…

Well, at a billion dollars, I might try to bribe people to move out of AI development, that gets broadcast to the whole world, and move to the equivalent of an island somewhere—not even to make any kind of critical discovery, but just to remove them from the system. If I had a billion dollars.

If I just have another $50 million, I’m not quite sure what to do with that, but if you donate that to MIRI, then you at least have the assurance that we will not randomly spray money on looking like we’re doing stuff and we’ll reserve it, as we are doing with the last giant crypto donation somebody gave us until we can figure out something to do with it that is actually helpful. And MIRI has that property. I would say probably Redwood Research has that property.

Yeah. I realize I’m sounding sort of disorganized here, and that’s because I don’t really have a good organized answer to how in general somebody goes down fighting with dignity.

MIRI and Education

Ryan: I know a lot of people in crypto. They are not as in touch with artificial intelligence, obviously, as you are, and the AI safety issues and the existential threat that you’ve presented in this episode. They do care a lot and see coordination problems throughout society as an issue. Many have also generated wealth from crypto, and care very much about humanity not ending. What sort of things has MIRI, the organization I was talking about earlier, done with funds that you’ve received from crypto donors and elsewhere? And what sort of things might an organization like that pursue to try to stave this off?

Eliezer: I mean, I think mostly we’ve pursued a lot of lines of research that haven’t really panned out, which is a respectable thing to do. We did not know in advance that those lines of research would fail to pan out. If you’re doing research that you know will work, you’re probably not really doing any research. You’re just doing a pretense of research that you can show off to a funding agency.

We try to be real. We did things where we didn’t know the answer in advance. They didn’t work, but that was where the hope lay, I think. But, you know, having a research organization that keeps it real that way, that’s not an easy thing to do. And if you don’t have this very deep form of the security mindset, you will end up producing fake research and doing more harm than good, so I would not tell all the successful cryptocurrency people to run off and start their own research outfits.

Redwood Research—I’m not sure if they can scale using more money, but you can give people more money and wait for them to figure out how to scale it later if they’re the kind who won’t just run off and spend it, which is what MIRI aspires to be.

Ryan: And you don’t think the education path is a useful path? Just educating the world?

Eliezer: I mean, I would give myself and MIRI credit for why the world isn’t just walking blindly into the whirling razor blades here, but it’s not clear to me how far education scales apart from that. You can get more people aware that we’re walking directly into the whirling razor blades, because even if only 10% of the people can get it, that can still be a bunch of people. But then what do they do? I don’t know. Maybe they’ll be able to do something later.

Can you get all the people? Can you get all the politicians? Can you get the people whose job incentives are against them admitting this to be a problem? I have various friends who report, like, “Ah yes, if you talk to researchers at OpenAI in private, they are very worried and say that they cannot be that worried in public.”

How Long Do We Have?

Ryan: This is all a giant Moloch trap, is sort of what you’re telling us. I feel like this is the part of the conversation where we’ve gotten to the end and the doctor has said that we have some sort of terminal illness. And at the end of the conversation, I think the patient, David and I, have to ask the question, “Okay, doc, how long do we have?” Seriously, what are we talking about here if you turn out to be correct? Are we talking about years? Are we talking about decades? What’s your idea here?

David: What are you preparing for, yeah?

Eliezer: How the hell would I know? Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer. How on earth would I know?

It could be three years. It could be 15 years. We could get that AI winter I was hoping for, and it could be 16 years. I’m not really seeing 50 without some kind of giant civilizational catastrophe. And to be clear, whatever civilization arises after that would probably, I’m guessing, end up stuck in just the same trap we are.

Ryan: I think the other thing that the patient might do at the end of a conversation like this is to also consult with other doctors. I’m kind of curious who we should talk to on this quest. Who are some people that if people in crypto want to hear more about this or learn more about this, or even we ourselves as podcasters and educators want to pursue this topic, who are the other individuals in the AI alignment and safety space you might recommend for us to have a conversation with?

Eliezer: Well, the person who actually holds a coherent technical view, who disagrees with me, is named Paul Christiano. He does not write Harry Potter fan fiction, and I expect him to have a harder time explaining himself in concrete terms. But that is the main technical voice of opposition. If you talk to other people in the effective altruism or AI alignment communities who disagree with this view, they are probably to some extent repeating back their misunderstandings of Paul Christiano’s views.

You could try Ajeya Cotra, who’s worked pretty directly with Paul Christiano and I think sometimes aspires to explain these things that Paul is not the best at explaining. I’ll throw out Kelsey Piper as somebody who would be good at explaining—like, would not claim to be a technical person on these issues, but is good at explaining the part that she does know.

Who else disagrees with me? I’m sure Robin Hanson would be happy to come on… well, I’m not sure he’d be happy to come on this podcast, but Robin Hanson disagrees with me, and I kind of feel like the famous argument we had back in the early 2010s, late 2000s about how this would all play out—I basically feel like this was the Yudkowsky position, this is the Hanson position, and then reality was over here, well to the Yudkowsky side of the Yudkowsky position in the Yudkowsky-Hanson debate. But Robin Hanson does not feel that way, and would probably be happy to expound on that at length.

I don’t know. It’s not hard to find opposing viewpoints. The ones that’ll stand up to a few solid minutes of cross-examination from somebody who knows which parts to cross-examine, that’s the hard part.

Bearish Hope

Ryan: You know, I’ve read a lot of your writings and listened to you on previous podcasts. One was in 2018 on the Sam Harris podcast. This conversation feels to me like the most dire you’ve ever seemed on this topic. And maybe that’s not true. Maybe you’ve sort of always been this way, but it seems like the direction of your hope that we solve this issue has declined. I’m wondering if you feel like that’s the case, and if you could sort of summarize your take on all of this as we close out this episode and offer, I guess, any concluding thoughts here.

Eliezer: I mean, I don’t know if you’ve got a time limit on this episode? Or is it just as long as it runs?

Ryan: It’s as long as it needs to be, and I feel like this is a pretty important topic. So you answer this however you want.

Eliezer: Alright. Well, there was a conference one time on “What are we going to do about looming risk of AI disaster?”, and Elon Musk attended that conference. And I was like: Maybe this is it. Maybe this is when the powerful people notice, and it’s one of the relatively more technical powerful people who could be noticing this. And maybe this is where humanity finally turns and starts… not quite fighting back, because there isn’t an external enemy here, but conducting itself with… I don’t know. Acting like it cares, maybe?

And what came out of that conference, well, was OpenAI, which was fairly nearly the worst possible way of doing anything. This is not a problem of “Oh no, what if secret elites get AI?” It’s that nobody knows how to build the thing. If we do have an alignment technique, it’s going to involve running the AI with a bunch of careful bounds on it where you don’t just throw all the cognitive power you have at something. You have limits on the for loops.

And whatever it is that could possibly save the world, like go out and turn all the GPUs and the server clusters into Rubik’s cubes or something else that prevents the world from ending when somebody else builds another AI a few weeks later—anything that could do that is an artifact where somebody else could take it and take the bounds off the for loops and use it to destroy the world.

So let’s open up everything! Let’s accelerate everything! It was like GPT-3’s version, though GPT-3 didn’t exist back then—but it was like ChatGPT’s blind version of throwing the ideals at a place where they were exactly the wrong ideals to solve the problem.

And the problem is that demon summoning is easy and angel summoning is much harder. Open sourcing all the demon summoning circles is not the correct solution. And I’m using Elon Musk’s own terminology here. He talked about AI as “summoning the demon”, which, not accurate, but—and then the solution was to put a demon summoning circle in every household.

And, why? Because his friends were calling him a Luddite once he’d expressed any concern about AI at all. So he picked a road that sounded like “openness” and “accelerating technology”! So his friends would stop calling him “Luddite”.

It was very much the worst—you know, maybe not the literal, actual worst possible strategy, but so very far pessimal.

And that was it.

That was like… that was me in 2015 going like, “Oh. So this is what humanity will elect to do. We will not rise above. We will not have more grace, not even here at the very end.”

So that is, you know, that is when I did my crying late at night and then picked myself up and fought and fought and fought until I had run out all the avenues that I seem to have the capabilities to do. There’s, like, more things, but they require scaling my efforts in a way that I’ve never been able to make them scale. And all of it’s pretty far-fetched at this point anyways.

So, you know, that—so what’s, you know, what’s changed over the years? Well, first of all, I ran out some remaining avenues of hope. And second, things got to be such a disaster, such a visible disaster, the AIs got powerful enough and it became clear enough that, you know, we do not know how to align these things, that I could actually say what I’ve been thinking for a while and not just have people go completely, like, “What are you saying about all this?”

You know, now the stuff that was obvious back in 2015 is, you know, starting to become visible in the distance to others and not just completely invisible. That’s what changed over time.

The End Goal

Ryan: What kind of… What do you hope people hear out of this episode and out of your comments? Eliezer in 2023, who is sort of running on the last fumes of, of hope. Yeah, what do you, what do you want people to get out of this episode? What are you planning to do?

Eliezer: I don’t have concrete hopes here. You know, when everything is in ruins, you might as well speak the truth, right? Maybe somebody hears it, somebody figures out something I didn’t think of.

I mostly expect that this does more harm than good in the modal universe, because a bunch of people are like, “Oh, I have this brilliant, clever idea,” which is, you know, something that I was arguing against in 2003 or whatever, but you know, maybe somebody out there with the proper level of pessimism hears and thinks of something I didn’t think of.

I suspect that if there’s hope at all, it comes from a technical solution, because the difference between technical problems and political problems is at least the technical problems have solutions in principle. At least the technical problems are solvable. We’re not on course to solve this one, but I think anybody who’s hoping for a political solution has frankly not understood the technical problem.

They do not understand what it looks like to try to solve the political problem to such a degree that the world is not controlled by AI because they don’t understand how easy it is to destroy the world with AI, given that the clock keeps ticking forward.

They’re thinking that they just have to stop some bad actor, and that’s why they think there’s a political solution.

But yeah, I don’t have concrete hopes. I didn’t come on this episode out of any concrete hope.

I have no takeaways except, like, don’t make this thing worse.

Don’t, like, go off and accelerate AI more. Don’t—if you have a brilliant solution to alignment, don’t be like, “Ah yes, I have solved the whole problem. We just use the following clever trick.”

You know, “Don’t make things worse” isn’t very much of a message, especially when you’re pointing people at the field at all. But I have no winning strategy. Might as well go on this podcast as an experiment and say what I think and see what happens. And probably no good ever comes of it, but you might as well go down fighting, right?

If there’s a world that survives, maybe it’s a world that survives because of a bright idea somebody had after listening to this podcast—that was brighter, to be clear, than the usual run of bright ideas that don’t work.

Ryan: Eliezer, I want to thank you for coming on and talking to us today. I do.

I don’t know if, by the way, you’ve seen that movie that David was referencing earlier, the movie Don’t Look Up, but I sort of feel like that news anchor, who’s talking to the scientist—is it Leonardo DiCaprio, David? And, uh, the scientist is talking about kind of dire straits for the world. And the news anchor just really doesn’t know what to do. I’m almost at a loss for words at this point.

David: I’ve had nothing for a while now.

Ryan: But one thing I can say is I appreciate your honesty. I appreciate that you’ve given this a lot of time and given this a lot of thought. Everyone, anyone who has heard you speak or read anything you’ve written knows that you care deeply about this issue and have given it a tremendous amount of your life force, in trying to educate people about it.

And, um, thanks for taking the time to do that again today. I’ll—I guess I’ll just let the audience digest this episode in the best way they know how. But, um, I want to reflect everybody in crypto and everybody listening to Bankless—their thanks for you coming on and explaining.

Eliezer: Thanks for having me. We’ll see what comes of it.

Ryan: Action items for you, Bankless nation. We always end with some action items. Not really sure where to refer folks to today, but one thing I know we can refer folks to is MIRI, which is the Machine Intelligence Research Institute that Eliezer has been talking about through the episode. That is at intelligence.org, I believe. And some people in crypto have donated funds to this in the past. Vitalik Buterin is one of them. You can take a look at what they’re doing as well. That might be an action item for the end of this episode.

Um, got to end with risks and disclaimers—man, this seems very trite, but our legal experts have asked us to say these at the end of every episode. “Crypto is risky. You could lose everything…”

Eliezer: (laughs)

David: Apparently not as risky as AI, though.

Ryan: —But we’re headed west! This is the frontier. It’s not for everyone, but we’re glad you’re with us on the Bankless journey. Thanks a lot.

Eliezer: And we are grateful for the crypto community’s support. Like, it was possible to end with even less grace than this.

Ryan: Wow. (laughs)

Eliezer: And you made a difference.

Ryan: We appreciate you.

Eliezer: You really made a difference.

Ryan: Thank you.

Q&A

Ryan: [… Y]ou gave up this quote, from I think someone who’s an executive director at MIRI: “We’ve given up hope, but not the fight.”

Can you reflect on that for a bit? So it’s still possible to fight this, even if we’ve given up hope? And even if you’ve given up hope? Do you have any takes on this?

Eliezer: I mean, what else is there to do? You don’t have good ideas. So you take your mediocre ideas, and your not-so-great ideas, and you pursue those until the world ends. Like, what’s supposed to be better than that?

Ryan: We had some really interesting conversation flow out of this episode, Eliezer, as you can imagine. And David and I want to relay some questions that the community had for you, and thank you for being gracious enough to help with those questions in today’s Twitter Spaces.

I’ll read something from Luke ethwalker. “Eliezer has one pretty flawed point in his reasoning. He assumes that AI would have no need or use for humans because we have atoms that could be used for better things. But how could an AI use these atoms without an agent operating on its behalf in the physical world? Even in his doomsday scenario, the AI relied on humans to create the global, perfect killing virus. That’s a pretty huge hole in his argument, in my opinion.”

What’s your take on this? That maybe AIs will dominate the digital landscape but because humans have a physical manifestation, we can still kind of beat the superintelligent AI in our physical world?

Eliezer: If you were an alien civilization of a billion John von Neumanns, thinking at 10,000 times human speed, and you start out connected to the internet, you would want to not be just stuck on the internet, you would want to build that physical presence. You would not be content solely with working through human hands, despite the many humans who’d be lined up, cheerful to help you, you know. Bing already has its partisans. (laughs)

You wouldn’t be content with that, because the humans are very slow, glacially slow. You would like fast infrastructure in the real world, reliable infrastructure. And how do you build that, is then the question, and a whole lot of advanced analysis has been done on this question. I would point people again to Eric Drexler’s Nanosystems.

And, sure, if you literally start out connected to the internet, then probably the fastest way — maybe not the only way, but it’s, you know, an easy way — is to get humans to do things. And then humans do those things. And then you have the desktop — not quite desktop, but you have the nanofactories, and then you don’t need the humans anymore. And this need not be advertised to the world at large while it is happening.

David: So I can understand that perspective, like in the future, we will have better 3D printers — distant in the future, we will have ways where the internet can manifest in the physical world. But I think this argument does ride on a future state with technology that we don’t have today. Like, I don’t think if I was the internet — and that kind of is this problem, right? Like, this superintelligent AI just becomes the internet because it’s embedded in the internet. If I was the internet, how would I get myself to manifest in real life?

And now, I am not an expert on the current state of robotics, or what robotics are connected to the internet. But I don’t think we have too strong of tools today to start to create in the real world manifestations of an internet-based AI. So like, would you say that this part of this problem definitely depends on some innovation, at like the robotics level?

Eliezer: No, it depends on the AI being smart. It doesn’t depend on the humans having this technology; it depends on the AI being able to invent the technology.

This is, like, the central problem: the thing is smarter. Not in the way that the average listener to this podcast probably has an above average IQ, but in the way that humans are smarter than chimpanzees.

What does that let humans do? Does it let humans be, like, really clever in how they play around with the stuff that’s on the ancestral savanna? Make clever use of grass, clever use of trees?

The humans invent technology. They build the technology. The technology is not there until the humans invent it, the humans conceive it.

The problem is, humans are not the upper bound. We don’t have the best possible brains for that kind of problem. So the existing internet is more than connected enough to people and devices, that you could build better technology than that if you had invented the technology because you were thinking much, much faster and better than a human does.

Ryan: Eliezer, this is a question from stirs, a Bankless Nation listener. He wants to ask the question about your explanation of why the AI will undoubtedly kill us. That seems to be your conclusion, and I’m wondering if you could kind of reinforce that claim. Like, for instance — and this is something David and I discussed after the episode, when we were debriefing on this — why exactly wouldn’t an AI, or couldn’t an AI just blast off of the Earth and go somewhere more interesting, and leave us alone? Like, why does it have to take our atoms and reassemble them? Why can’t it just, you know, set phasers to ignore?

Eliezer: It could if it wanted to. But if it doesn’t want to, there is some initial early advantage. You get to colonize the universe slightly earlier if you consume all of the readily accessible energy on the Earth’s surface as part of your blasting off of the Earth process.

It would only need to care for us by a very tiny fraction to spare us, this I agree. Caring a very tiny fraction is basically the same problem as 100% caring. It’s like, well, could you have a computer system that is usually like the Disk Operating System, but a tiny fraction of the time it’s Windows 11? Writing that is just as difficult as writing Windows 11. We still have to write all the Windows 11 software. Getting it to care a tiny little bit is the same problem as getting it to care 100%.

Ryan: So Eliezer, is this similar to the relationship that humans have with other animals, planet Earth? I would say largely we really don’t… I mean, obviously, there’s no animal Bill of Rights. Animals have no legal protection in the human world, and we kind of do what we want and trample over their rights. But it doesn’t mean we necessarily kill all of them. We just largely ignore them.

If they’re in our way, you know, we might take them out. And there have been whole classes of species that have gone extinct through human activity, of course; but there are still many that we live alongside, some successful species as well. Could we have that sort of relationship with an AI? Why isn’t that reasonably high probability in your models?

Eliezer: So first of all, all these things are just metaphors. AI is not going to be exactly like humans to animals.

Leaving that aside for a second, the reason why this metaphor breaks down is that although the humans are smarter than the chickens, we’re not smarter than evolution, natural selection, cumulative optimization power over the last billion years and change. (You know, there’s evolution before that but it’s pretty slow, just, like, single-cell stuff.)

There are things that cows can do for us, that we cannot do for ourselves. In particular, make meat by eating grass. We’re smarter than the cows, but there’s a thing that designed the cows; and we’re faster than that thing, but we’ve been around for much less time. So we have not yet gotten to the point of redesigning the entire cow from scratch. And because of that, there’s a purpose to keeping the cow around alive.

And humans, furthermore, being the kind of funny little creatures that we are — some people care about cows, some people care about chickens. They’re trying to fight for the cows and chickens having a better life, given that they have to exist at all. And there’s a long complicated story behind that. It’s not simple, the way that humans ended up in that [??]. It has to do with the particular details of our evolutionary history, and unfortunately it’s not just going to pop up out of nowhere.

But I’m drifting off topic here. The basic answer to the question “where does that analogy break down?” is that I expect the superintelligences to be able to do better than natural selection, not just better than the humans.

David: So I think your answer is that the separation between us and a superintelligent AI is orders of magnitude larger than the separation between us and a cow, or even us and an ant. Which, I think a large amount of this argument resides on this superintelligence explosion — just going up an exponential curve of intelligence very, very quickly, which is like the premise of superintelligence.

And Eliezer, I want to try and get an understanding of… A part of this argument about “AIs are going to come kill us” is buried in the Moloch problem. And Bankless listeners are pretty familiar with the concept of Moloch — the idea of coordination failure. The idea that the more that we coordinate and stay in agreement with each other, we actually create a larger incentive to defect.

And the way that this is manifesting here, is that even if we do have a bunch of humans, which understand the AI alignment problem, and we all agree to only safely innovate in AI, to whatever degree that means, we still create the incentive for someone to fork off and develop AI faster, outside of what would be considered safe.

And so I’m wondering if you could, if it does exist, give us the sort of lay of the land, of all of these commercial entities? And what, if at all, they’re doing to have, I don’t know, an AI alignment team?

So like, for example, OpenAI. Does OpenAI have, like, an alignment department? With all the AI innovation going on, what does the commercial side of the AI alignment problem look like? Like, are people trying to think about these things? And to what degree are they being responsible?

Eliezer: It looks like OpenAI having a bunch of people who it pays to do AI ethics stuff, but I don’t think they’re plugged very directly into Bing. And, you know, they’ve got that department because back when they were founded, some of their funders were like, “Well, but ethics?” and OpenAI was like “Sure, we can buy some ethics. We’ll take this group of people, and we’ll put them over here and we’ll call them an alignment research department”.

And, you know, the key idea behind ChatGPT is RLHF, which was invented by Paul Christiano. Paul Christiano had much more detailed ideas, and somebody might have reinvented this one, but anyway. I don’t think that went through OpenAI, but I could be mistaken. Maybe somebody will be like “Well, actually, Paul Christiano was working at OpenAI at the time”, I haven’t checked the history in very much detail.

A whole lot of the people who were most concerned with this “ethics” left OpenAI, and founded Anthropic. And I’m still not sure that Anthropic has sufficient leadership focus in that direction.

You know, like, put yourself in the shoes of a corporation! You can spend some little fraction of your income on putting together a department of people who will write safety papers. But then the actual behavior that we’ve seen, is they storm ahead, and they use one or two of the ideas that came out from anywhere in the whole [alignment] field. And they get as far as that gets them. And if that doesn’t get them far enough, they just keep storming ahead at maximum pace, because, you know, Microsoft doesn’t want to lose to Google, and Google doesn’t want to lose to Microsoft.

David: So it sounds like your attitude on the efforts of AI alignment in commercial entities is, like, they’re not even doing 1% of what they need to be doing.

Eliezer: I mean, they could spend [10?] times as much money and that would not get them to 10% of what they need to be doing.

It’s not just a problem of “oh, they could spend the resources, but they don’t want to”. It’s a question of “how do we even spend the resources to get the info that they need”.

But that said, not knowing how to do that, not really understanding that they need to do that, they are just charging ahead anyways.

Ryan: Eliezer, is OpenAI the most advanced AI project that you’re aware of?

Eliezer: Um, no, but I’m not going to go name the competitor, because then people will be like, “Oh, I should go work for them”, you know? I’d rather they didn’t.

Ryan: So it’s like, OpenAI is this organization that was kind of — you were talking about it at the end of the episode, and for crypto people who aren’t aware of some of the players in the field — were they spawned from that 2015 conference that you mentioned? It’s kind of a completely open-source AI project?

Eliezer: That was the original suicidal vision, yes. But…

Ryan: And now they’re bent on commercializing the technology, is that right?

Eliezer: That’s an improvement, but not enough of one, because they’re still generating lots of noise and hype and directing more resources into the field, and storming ahead with the safety that they have instead of the safety that they need, and setting bad examples. And getting Google riled up and calling back in Larry Page and Sergey Brin to head up Google’s AI projects and so on. So, you know, it could be worse! It would be worse if they were open sourcing all the technology. But what they’re doing is still pretty bad.

Ryan: What should they be doing, in your eyes? Like, what would be responsible use of this technology?

I almost get the feeling that, you know, your take would be “stop working on it altogether”? And, of course, you know, to an organization like OpenAI that’s going to be heresy, even if maybe that’s the right decision for humanity. But what should they be doing?

Eliezer: I mean, if you literally just made me dictator of OpenAI, I would change the name to “ClosedAI”. Because right now, they’re making it look like being “closed” is hypocrisy. They’re, like, being “closed” while keeping the name “OpenAI”, and that itself makes it look like closure is like not this thing that you do cooperatively so that humanity will not die, but instead this sleazy profit-making thing that you do while keeping the name “OpenAI”.

So that’s very bad; change the name to “ClosedAI”, that’s step one.

Next. I don’t know if they can break the deal with Microsoft. But, you know, cut that off. None of this. No more hype. No more excitement. No more getting famous and, you know, getting your status off of like, “Look at how much closer we came to destroying the world! You know, we’re not there yet. But, you know, we’re at the forefront of destroying the world!” You know, stop grubbing for the Silicon Valley bragging cred of visibly being the leader.

Take it all closed. If you got to make money, make money selling to businesses in a way that doesn’t generate a lot of hype and doesn’t visibly push the field. And then try to figure out systems that are more alignable and not just more powerful. And at the end of that, they would fail, because, you know, it’s not easy to do that. And the world would be destroyed. But they would have died with more dignity. Instead of being like, “Yeah, yeah, let’s like push humanity off the cliff ourselves for the ego boost!”, they would have done what they could, and then failed.

David: Eliezer, do you think anyone who’s building AI — Elon Musk, Sam Altman at OpenAI – do you think progressing AI is fundamentally bad?

Eliezer: I mean, there are narrow forms of progress, especially if you didn’t open-source them, that would be good. Like, you can imagine a thing that, like, pushes capabilities a bit, but is much more alignable.

There are people working in the field who I would say are, like, sort of unabashedly good. Like, Chris Olah is taking a microscope to these giant inscrutable matrices and trying to figure out what goes on inside there. Publishing that might possibly even push capabilities a little bit, because if people know what’s going on inside there, they can make better ones. But the question of like, whether to closed-source that is, like, much more fraught than the question of whether to closed-source the stuff that’s just pure capabilities.

But that said, the people who are just like, “Yeah, yeah, let’s do more stuff! And let’s tell the world how we did it, so they can do it too!” That’s just, like, unabashedly bad.

David: So it sounds like you do see paths forward in which we can develop AI in responsible ways. But it’s really this open-source, open-sharing-of-information to allow anyone and everyone to innovate on AI, that’s really the path towards doom. And so we actually need to keep this knowledge private. Like, normally knowledge…

Eliezer: No, no, no, no. Open-sourcing all this stuff is, like, a less dignified path straight off the edge. I’m not saying that all we need to do is keep everything closed and in the right hands and it will be fine. That will also kill you.

But that said, if you have stuff and you do not know how to make it not kill everyone, then broadcasting it to the world is even less dignified than being like, “Okay, maybe we should keep working on this until we can figure out how to make it not kill everyone.”

And then the other people will, like, go storm ahead on their end and kill everyone. But, you know, you won’t have personally slaughtered Earth. And that is more dignified.

Ryan: Eliezer, I know I was kind of shaken after our episode, not having heard the full AI alignment story, at least listened to it for a while.

And I think that in combination with the sincerity through which you talk about these subjects, and also me sort of seeing these things on the horizon, this episode was kind of shaking for me and caused a lot of thought.

But I’m noticing there is a cohort of people who are dismissing this take and your take specifically in this episode as Doomerism. This idea that every generation thinks it’s, you know, the end of the world and the last generation.

What’s your take on this critique that, “Hey, you know, it’s been other things before. There was a time where it was nuclear weapons, and we would all end in a mushroom cloud. And there are other times where we thought a pandemic was going to kill everyone. And this is just the latest Doomerist AI death cult.”

I’m sure you’ve heard that before. How do you respond?

Eliezer: That if you literally know nothing about nuclear weapons or artificial intelligence, except that somebody has claimed of both of them that they’ll destroy the world, then sure, you can’t tell the difference. As far as you can tell, nuclear weapons were claimed to destroy the world, and then they didn’t destroy the world, and then somebody claimed that about AI.

So, you know, Laplace’s rule of induction: at most a 1/3 probability that AI will destroy the world, if nuclear weapons and AI are the only case.
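
[Note on the arithmetic: the rule invoked here is more commonly called Laplace’s rule of succession. Under its usual assumptions (a uniform prior over the unknown probability, and independent trials), seeing s occurrences in n trials gives an estimate of (s + 1) / (n + 2) for the next trial. Treating nuclear weapons as the one prior doomsday claim, which did not come true, gives s = 0 and n = 1, hence 1/3. The short Python sketch below only illustrates that calculation; the function name `rule_of_succession` is made up for this note and does not come from the episode.]

```python
from fractions import Fraction


def rule_of_succession(occurrences: int, trials: int) -> Fraction:
    """Laplace's rule of succession: (s + 1) / (n + 2).

    Assumes a uniform prior over the unknown probability and
    independent trials; both are modelling assumptions, not facts
    about nuclear weapons or AI.
    """
    if trials < 0 or not 0 <= occurrences <= trials:
        raise ValueError("need 0 <= occurrences <= trials")
    return Fraction(occurrences + 1, trials + 2)


# One earlier "this will destroy the world" claim (nuclear weapons),
# zero worlds destroyed so far.
estimate = rule_of_succession(occurrences=0, trials=1)
print(estimate)  # 1/3
```

Adding more failed doomsday claims to the count shrinks the estimate (for example, ten failed claims give 1/12), which is exactly the move the next paragraph pokes fun at.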

You can bring in so many more cases than that. Why, people should have known in the first place that nuclear weapons wouldn’t destroy the world! Because their next door neighbor once said that the sky was falling, and that didn’t happen; and if their next door neighbor was [??], how could the people saying that nuclear weapons would destroy the world be right?

And basically, as long as people are trying to run off of models of human psychology, to derive empirical information about the world, they’re stuck. They’re in a trap they can never get out of. They’re going to always be trying to psychoanalyze the people talking about nuclear weapons or whatever. And the only way you can actually get better information is by understanding how nuclear weapons work, understanding what the international equilibrium with nuclear weapons looks like. And the international equilibrium, by the way, is that nobody profits from setting off small numbers of nuclear weapons, especially given that they know that large numbers of nukes would follow. And, you know, that’s why they haven’t been used yet. There was nobody who made a buck by starting a nuclear war. The nuclear war was clear, the nuclear war was legible. People knew what would happen if they fired off all the nukes.

The analogy I sometimes try to use with artificial intelligence is, “Well, suppose that instead you could make nuclear weapons out of a billion pounds of laundry detergent. And they spit out gold until you make one that’s too large, whereupon it ignites the atmosphere and kills everyone. And you can’t calculate exactly how large is too large. And the international situation is that the private research labs spitting out gold don’t want to hear about igniting the atmosphere.” And that’s the technical difference. You need to be able to tell whether or not that is true as a scientific claim about how reality, the universe, the environment, artificial intelligence, actually works. What actually happens when the giant inscrutable matrices go past a certain point of capability? It’s a falsifiable hypothesis.

You know, if it fails to be falsified, then everyone is dead, but that doesn’t actually change the basic dynamic here, which is, you can’t figure out how the world works by psychoanalyzing the people talking about it.

David: One line of questioning that has come up inside of the Bankless Nation Discord is the idea that we need to train AI with data, lots of data. And where are we getting that data? Well, humans are producing that data. And when humans produce that data, by nature of the fact that it was produced by humans, that data has our human values embedded in it somehow, some way, just by the aggregate nature of all the data in the world, which was created by humans that have certain values. And then AI is trained on that data that has all the human values embedded in it. And so there’s actually no way to create an AI that isn’t trained on data that is created by humans, and that data has human values in it.

Is there anything to this line of reasoning about a potential glimmer of hope here?

Eliezer: There’s a distant glimmer of hope, which is that an AI that is trained on tons of human data in this way probably understands some things about humans. And because of that, there’s a branch of research hope within alignment, which is something like, “Well, this AI, to be able to predict humans, needs to be able to predict the thought processes that humans are using to make their decisions. So can we thereby point to human values inside of the knowledge that the AI has?”

And this is, like, very nontrivial, because the simplest theory that you use to predict what humans decide next, does not have what you might term “valid morality under reflection” as a clearly labeled primitive chunk inside it that is directly controlling the humans, and which you need to understand on a scientific level to understand the humans.

The humans are full of hopes and fears and thoughts and desires. And somewhere in all of that is what we call “morality”, but it’s not a clear, distinct chunk, where an alien scientist examining humans and trying to figure out just purely on an empirical level “how do these humans work?” would need to point to one particular chunk of the human brain and say, like, “Ahh, that circuit there, the morality circuit!”

So it’s not easy to point to inside the AI’s understanding. There is not currently any obvious way to actually promote that chunk of the AI’s understanding to then be in control of the AI’s planning process. As it must be complicatedly pointed to, because it’s not just a simple empirical chunk for explaining the world.

And basically, I don’t think that is actually going to be the route you should try to go down. You should try to go down something much simpler than that. The problem is not that we are going to fail to convey some complicated subtlety of human value. The problem is that we do not know how to align an AI on a task like “put two identical strawberries on a plate” without destroying the world.

(Where by “put two identical strawberries on the plate”, the concept is that’s invoking enough power that it’s not safe AI that can build two strawberries identical down to the cellular level. Like, that’s a powerful AI. Aligning it isn’t simple. If it’s powerful enough to do that, it’s also powerful enough to destroy the world, etc.)

David: There’s like a number of other lines of logic I could try to go down, but I think I would start to feel like I’m in the bargaining phase of death. Where it’s like “Well, what about this? What about that?”

But maybe to summate all of the arguments, is to say something along the lines of like, “Eliezer, how much room do you give for the long tail of black swan events? But these black swan events are actually us finding a solution for this thing.” So, like, a reverse black swan event where we actually don’t know how we solve this AI alignment problem. But really, it’s just a bet on human ingenuity. And AI hasn’t taken over the world yet. But there’s space between now and then, and human ingenuity will be able to fill that gap, especially when the time comes?

Like, how much room do you leave for the long tail of just, like, “Oh, we’ll discover a solution that we can’t really see today”?

Eliezer: I mean, on the one hand, that hope is all that’s left, and all that I’m pursuing. And on the other hand, in the process of actually pursuing that hope I do feel like I’ve gotten some feedback indicating that this hope is not necessarily very large.

You know, when you’ve got stage four cancer, is there still hope that your body will just rally and suddenly fight off the cancer? Yes, but it’s not what usually happens. And I’ve seen people come in and try to direct their ingenuity at the alignment problem and most of them all invent the same small handful of bad solutions. And it’s harder than usual to direct human ingenuity at this.

A lot of them are just, like — you know, with capabilities ideas, you run out and try them and they mostly don’t work. And some of them do work and you publish the paper, and you get your science [??], and you get your ego boost, and maybe you get a job offer someplace.

And with the alignment stuff you can try to run through the analogous process, but the stuff we need to align is mostly not here yet. You can try to invent the smaller large language models that are public, you can go to work at a place that has access to larger large language models, you can try to do these very crude, very early experiments, and getting the large language models to at least not threaten your users with death —

— which isn’t the same problem at all. It just kind of looks related.

But you’re at least trying to get AI systems that do what you want them to do, and not do other stuff; and that is, at the very core, a similar problem.

But the AI systems are not very powerful, they’re not running into all sorts of problems that you can predict will crop up later. And people just, kind of — like, mostly people short out. They do pretend work on the problem. They’re desperate to help, they got a grant, they now need to show the people who made the grant that they’ve made progress. They, you know, paper mill stuff.

So the human ingenuity is not functioning well right now. You cannot be like, “Ah yes, this present field full of human ingenuity, which is working great, and coming up with lots of great ideas, and building up its strength, will continue at this pace and make it to the finish line in time!”

The capability stuff is storming on ahead. The human ingenuity that’s being directed at that is much larger, but also it’s got a much easier task in front of it.

The question is not “Can human ingenuity ever do this at all?” It’s “Can human ingenuity finish doing this before OpenAI blows up the world?”

Ryan: Well, Eliezer, if we can’t trust in human ingenuity, is there any possibility that we can trust in AI ingenuity? And here’s what I mean by this, and perhaps you’ll throw a dart in this as being hopelessly naive.

But is there the possibility we could ask a reasonably intelligent, maybe almost superintelligent AI, how we might fix the AI alignment problem? And for it to give us an answer? Or is that really not how superintelligent AIs work?

Eliezer: I mean, if you literally build a superintelligence and for some reason it was motivated to answer you, then sure, it could answer you.

Like, if Omega comes along from a distant supercluster and offers to pay the local superintelligence lots and lots of money (or, like, mass or whatever) to give you a correct answer, then sure, it knows the correct answer; it can give you the correct answers.

If it wants to do that, you must have already solved the alignment problem. This reduces the problem of solving alignment to the problem of solving alignment. No progress has been made here.

And, like, working on alignment is actually one of the most difficult things you could possibly try to align.

Like, if I had the health and was trying to die with more dignity by building a system and aligning it as best I could figure out how to align it, I would be targeting something on the order of “build two strawberries and put them on a plate”. But instead of building two identical strawberries and putting them on a plate, you — don’t actually do this, this is not the best thing you should do —

— but if for example you could safely align “turning all the GPUs into Rubik’s cubes”, then that would prevent the world from being destroyed two weeks later by your next follow-up competitor.

And that’s much easier to align an AI on than trying to get the AI to solve alignment for you. You could be trying to build something that would just think about nanotech, just think about the science problems, the physics problems, the chemistry problems, the synthesis pathways.

(The open-air operation to find all the GPUs and turn them into Rubik’s cubes would be harder to align, and that’s why you shouldn’t actually try to do that.)

My point here is: whereas [with] alignment, you’ve got to think about AI technology and computers and humans and intelligent adversaries, and distant superintelligences who might be trying to exploit your AI’s imagination of those distant superintelligences, and ridiculous weird problems that would take so long to explain.

And it just covers this enormous amount of territory, where you’ve got to understand how humans work, you’ve got to understand how adversarial humans might try to exploit and break an AI system — because if you’re trying to build an aligned AI that’s going to run out and operate in the real world, it would have to be resilient to those things.

And they’re just hoping that the AI is going to do their homework for them! But it’s a chicken and egg scenario. And if you could actually get an AI to help you with something, you would not try to get it to help you with something as weird and not-really-all-that-effable as alignment. You would try to get it to help with something much simpler that could prevent the next AGI down the line from destroying the world.

Like nanotechnology. There’s a whole bunch of advanced analysis that’s been done of it, and the kind of thinking that you have to do about it is so much more straightforward and so much less fraught than trying to, you know… And how do you even tell if it’s lying about alignment?

It’s hard to tell whether I’m telling you the truth about all this alignment stuff, right? Whereas if I talk about the tensile strength of sapphire, this is easier to check through the lens of logic.

David: Eliezer, I think one of the reasons why perhaps this episode impacted Ryan – this was an analysis from a Bankless Nation community member — that this episode impacted Ryan a little bit more than it impacted me is because Ryan’s got kids, and I don’t. And so I’m curious, like, what do you think — like, looking 10, 20, 30 years in the future, where you see this future as inevitable, do you think it’s futile to project out a future for the human race beyond, like, 30 years or so?

Eliezer: Timelines are very hard to project. 30 years does strike me as unlikely at this point. But, you know, timing is famously much harder to forecast than saying that things can be done at all. You know, you got your people saying it will be 50 years out two years before it happens, and you got your people saying it’ll be two years out 50 years before it happens. And, yeah, it’s… Even if I knew exactly how the technology would be built, and exactly who was going to build it, I still wouldn’t be able to tell you how long the project would take because of project management chaos.

Now, since I don’t know exactly the technology used, and I don’t know exactly who’s going to build it, and the project may not even have started yet, how can I possibly figure out how long it’s going to take?

Ryan: Eliezer, you’ve been quite generous with your time to the crypto community, and we just want to thank you. I think you’ve really opened a lot of eyes. This isn’t going to be our last AI podcast at Bankless, certainly. I think the crypto community is going to dive down the rabbit hole after this episode. So thank you for giving us the 400-level introduction into it.

As I said to David, I feel like we waded straight into the deep end of the pool here. But that’s probably the best way to address the subject matter. I’m wondering as we kind of close this out, if you could leave us — it is part of the human spirit to keep and to maintain slivers of hope here or there. Or as maybe someone you work with put it – to fight the fight, even if the hope is gone.

100 years in the future, if humanity is still alive and functioning, if a superintelligent AI has not taken over, but we live in coexistence with something of that caliber — imagine if that’s the case, 100 years from now. How did it happen?

Is there some possibility, some sort of narrow pathway by which we can navigate this? And if this were 100 years from now the case, how could you imagine it would have happened?

Eliezer: For one thing, I predict that if there’s a glorious transhumanist future (as it is sometimes conventionally known) at the end of this, I don’t predict it was there by getting like, “coexistence” with superintelligence. That’s, like, some kind of weird, inappropriate analogy based off of humans and cows or something.

I predict alignment was solved. I predict that if the humans are alive at all, that the superintelligences are being quite nice to them.

I have basic moral questions about whether it’s ethical for humans to have human children, if having transhuman children is an option instead. Like, these humans running around? Are they, like, the current humans who wanted eternal youth but, like, not the brain upgrades? Because I do see the case for letting an existing person choose “No, I just want eternal youth and no brain upgrades, thank you.” But then if you’re deliberately having the equivalent of a very crippled child when you could just as easily have a not crippled child.

Like, should humans in their present form be around together? Are we, like, kind of too sad in some ways? I have friends, to be clear, who disagree with me so much about this point. (laughs) But yeah, I’d say that the happy future looks like beings of light having lots of fun in a nicely connected computing fabric powered by the Sun, if we haven’t taken the Sun apart yet. Maybe there’s enough real sentiment in people that you just, like, clear all the humans off the Earth and leave the entire place as a park. And even, like, maintain the Sun, so that the Earth is still a park even after the Sun would have ordinarily swollen up or dimmed down.

Yeah, like… That was always the thing to be fought for. That was always the point, from the perspective of everyone who’s been in this for a long time. Maybe not literally everyone, but like, the whole old crew.

Ryan: That is a good way to end it: with some hope. Eliezer, thanks for joining the crypto community on this collectibles call and for this follow-up Q&A. We really appreciate it.

michaelwong.eth: Yes, thank you, Eliezer.

Eliezer: Thanks for having me.
