December 2021 Newsletter


Ngo’s view on alignment difficulty


 

This post features a write-up by Richard Ngo on his views, with inline comments.

 


 

13. Follow-ups to the Ngo/Yudkowsky conversation

 

13.1. Alignment difficulty debate: Richard Ngo’s case

 

 

[Ngo][9:31]  (Sep. 25)

As promised, here’s a write-up of some thoughts from my end. In particular, since I’ve spent a lot of the debate poking Eliezer about his views, I’ve tried here to put forward more positive beliefs of my own in this doc (along with some more specific claims): [GDocs link]

[Soares: ✨] 

[Ngo]  (Sep. 25 Google Doc)

We take as a starting observation that a number of “grand challenges” in AI have been solved by AIs that are very far from the level of generality which people expected would be needed. Chess, once considered to be the pinnacle of human reasoning, was solved by an algorithm that’s essentially useless for real-world tasks. Go required more flexible learning algorithms, but policies which beat human performance are still nowhere near generalising to anything else; the same for StarCraft, DOTA, and the protein folding problem. Now it seems very plausible that AIs will even be able to pass (many versions of) the Turing Test while still being a long way from AGI.

[Yudkowsky][11:26]  (Sep. 25 comment)

Now it seems very plausible that AIs will even be able to pass (many versions of) the Turing Test while still being a long way from AGI.

I remark:  Restricted versions of the Turing Test.  Unrestricted passing of the Turing Test happens after the world ends.  Consider how smart you’d have to be to pose as an AGI to an AGI; you’d need all the cognitive powers of an AGI as well as all of your human powers.

[Ngo][11:24]  (Sep. 29 comment)

Perhaps we can quantify the Turing test by asking something like:

  • What percentile of competence is the judge?
  • What percentile of competence are the humans who the AI is meant to pass as?
  • How much effort does the judge put in (measured in, say, hours of strategic preparation)?

Does this framing seem reasonable to you? And if so, what are the highest numbers for each of these metrics that correspond to a Turing test which an AI could plausibly pass before the world ends?

[Ngo]  (Sep. 25 Google Doc)

I expect this trend to continue until after we have AIs which are superhuman at mathematical theorem-proving, programming, many other white-collar jobs, and many types of scientific research. It seems like Eliezer doesn’t. I’ll highlight two specific disagreements which seem to play into this.

[Yudkowsky][11:28]  (Sep. 25 comment)

doesn’t

Eh?  I’m pretty fine with something proving the Riemann Hypothesis before the world ends.  It came up during my recent debate with Paul, in fact.

Not so fine with something designing nanomachinery that can be built by factories built by proteins.  They’re legitimately different orders of problem, and it’s no coincidence that the second one has a path to pivotal impact, and the first does not.

[Ngo]  (Sep. 25 Google Doc)

A first disagreement is related to Eliezer’s characterisation of GPT-3 as a shallow pattern-memoriser. I think there’s a continuous spectrum between pattern-memorisation and general intelligence. In order to memorise more and more patterns, you need to start understanding them at a high level of abstraction, draw inferences about parts of the patterns based on other parts, and so on. When those patterns are drawn from the real world, then this process leads to the gradual development of a world-model.

This position seems more consistent with the success of deep learning so far than Eliezer’s position (although my advocacy of it loses points for being post-hoc; I was closer to Eliezer’s position before the GPTs). It also predicts that deep learning will lead to agents which can reason about the world in increasingly impressive ways (although I don’t have a strong position on the extent to which new architectures and algorithms will be required for that). I think that the spectrum from less to more intelligent animals (excluding humans) is a good example of what it looks like to gradually move from pattern-memorisation to increasingly sophisticated world-models and abstraction capabilities.

[Yudkowsky][11:30]  (Sep. 25 comment)

In order to memorise more and more patterns, you need to start understanding them at a high level of abstraction, draw inferences about parts of the patterns based on other parts, and so on.

Correct.  You can believe this and not believe that exactly GPT-like architectures can keep going deeper until their overlap of a greater number of patterns achieves the same level of depth and generalization as human depth and generalization from fewer patterns, just like pre-transformer architectures ran into trouble in memorizing deeper patterns than the shallower ones those earlier systems could memorize.

[Ngo]  (Sep. 25 Google Doc)

I expect that Eliezer won’t claim that pattern-memorisation is unrelated to general intelligence, but will claim that a pattern-memoriser needs to undergo a sharp transition in its cognitive algorithms before it can reason reliably about novel domains (like open scientific problems) – with his main argument for that being the example of the sharp transition undergone by humans.

However, it seems unlikely to me that humans underwent a major transition in our underlying cognitive algorithms since diverging from chimpanzees, because our brains are so similar to those of chimps, and because our evolution from chimps didn’t take very long. This evidence suggests that we should favour explanations for our success which don’t need to appeal to big algorithmic changes, if we have any such explanations; and I think we do. More specifically, I’d characterise the three key differences between humans and chimps as:

  1. Humans have bigger brains.
  2. Humans have a range of small adaptations primarily related to motivation and attention, such as infant focus on language and mimicry, that make us much better at cultural learning.
  3. Humans grow up in a rich cultural environment.

[Ngo][9:13]  (Sep. 23 comment on earlier draft)

bigger brains

I recall a 3-4x difference; but this paper says 5-6x for frontal cortex: https://www.nature.com/articles/nn814

[Tallinn][3:24]  (Sep. 26 comment)

language and mimicry

“apes are unable to ape sounds” claims david deutsch in “the beginning of infinity”

[Barnes][8:09]  (Sep. 23 comment on earlier draft)

[Humans grow up in a rich cultural environment.]

much richer cultural environment including deliberate teaching

[Ngo]  (Sep. 25 Google Doc)

I claim that the discontinuity between the capabilities of humans and chimps is mainly explained by the general intelligence of chimps not being aimed in the direction of learning the skills required for economically valuable tasks, which in turn is mainly due to chimps lacking the “range of small adaptations” mentioned above.

My argument is a more specific version of Paul’s claim that chimp evolution was not primarily selecting for doing things like technological development. In particular, it was not selecting for them because no cumulative cultural environment existed while chimps were evolving, and selection for the application of general intelligence to technological development is much stronger in a cultural environment. (I claim that the cultural environment was so limited before humans mainly because cultural accumulation is very sensitive to transmission fidelity.)

By contrast, AIs will be trained in a cultural environment (including extensive language use) from the beginning, so this won’t be a source of large gains for later systems.

[Ngo][6:01]  (Sep. 22 comment on earlier draft)

more specific version of Paul’s claim

Based on some of Paul’s recent comments, this may be what he intended all along; though I don’t recall his original writings on takeoff speeds making this specific argument.

[Shulman][14:23]  (Sep. 25 comment)

(I claim that the cultural environment was so limited before humans mainly because cultural accumulation is very sensitive to transmission fidelity.)

There can be other areas with superlinear effects from repeated application of a skill. There’s reason to think that the most productive complex industries tend to have that character.

Making individual minds able to correctly execute long chains of reasoning by reducing per-step error rate could plausibly have very superlinear effects in programming, engineering, management, strategy, persuasion, etc. And you could have new forms of ‘super-culture’ that don’t work with humans.

https://ideas.repec.org/a/eee/jeborg/v85y2013icp1-10.html
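Shulman’s point about per-step error rates can be made concrete with a small calculation. This is an illustrative sketch added here, not something from the conversation, and it assumes (a simplification) that each reasoning step fails independently with the same probability. The helper names `chain_success` and `max_reliable_length` are hypothetical.

```python
# Minimal sketch: how a per-step error rate compounds over a chain of steps.
# Assumption (not from the conversation): steps fail independently, each with
# the same probability `per_step_error`.


def chain_success(per_step_error: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps is correct."""
    return (1.0 - per_step_error) ** steps


def max_reliable_length(per_step_error: float, threshold: float = 0.5) -> int:
    """Longest chain whose success probability is still >= `threshold`."""
    if not 0.0 < per_step_error < 1.0:
        raise ValueError("per_step_error must be strictly between 0 and 1")
    steps = 0
    while chain_success(per_step_error, steps + 1) >= threshold:
        steps += 1
    return steps


if __name__ == "__main__":
    for eps in (0.01, 0.001):
        print(
            f"error/step={eps}: "
            f"P(100 steps ok)={chain_success(eps, 100):.3f}, "
            f"P(500 steps ok)={chain_success(eps, 500):.4f}, "
            f"longest chain with P>=0.5: {max_reliable_length(eps)}"
        )
```

Under these assumptions, cutting the per-step error from 1% to 0.1% raises the chance of getting a 500-step chain entirely right from under 1% to roughly 60%, a far larger than tenfold gain for a tenfold reduction in error.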

[Ngo]  (Sep. 25 Google Doc)

If true, this argument would weigh against Eliezer’s claims about agents which possess a core of general intelligence being able to easily apply that intelligence to a wide range of tasks. And I don’t think that Eliezer has a compelling alternative explanation of the key cognitive differences between chimps and humans (the closest I’ve seen in his writings is the brainstorming at the end of this post).

If this is the case, I notice an analogy between Eliezer’s argument against Kurzweil, and my argument against Eliezer. Eliezer attempted to put microfoundations underneath the trend line of Moore’s law, which led to a different prediction than Kurzweil’s straightforward extrapolation. Similarly, my proposed microfoundational explanation of the chimp-human gap gives rise to a different prediction than Eliezer’s more straightforward, non-microfoundational extrapolation.

[Yudkowsky][11:39]  (Sep. 25 comment)

Similarly, my proposed microfoundational explanation of the chimp-human gap gives rise to a different prediction than Eliezer’s more straightforward, non-microfoundational extrapolation.

Eliezer does not use “non-microfoundational extrapolations” for very much of anything, but there are obvious reasons why the greater Earth does not benefit from me winning debates through convincingly and correctly listing all the particular capabilities you need to add over and above what GPToid architectures can achieve, in order to achieve AGI.  Nobody else with a good model of larger reality will publicly describe such things in a way they believe is correct.  I prefer not to argue convincingly but wrongly.  But, no, it is not Eliezer’s way to sound confident about anything unless he thinks he has a more detailed picture of the microfoundations than the one you are currently using yourself.

[Ngo][11:40]  (Sep. 29 comment)

Good to know; apologies for the incorrect inference.

Given that this seems like a big sticking point in the debate overall, do you have any ideas about how to move forward while avoiding infohazards?

[Ngo]  (Sep. 25 Google Doc)

My position makes some predictions about hypothetical cases:

  1. If chimpanzees had the same motivational and attention-guiding adaptations towards cultural learning and cooperation that humans do, and were raised in equally culturally-rich environments, then they could become economically productive workers in a range of jobs (primarily as manual laborers, but plausibly also for operating machinery, etc.).
    1. Results from chimps raised in human families, like Washoe, seem moderately impressive, although still very uncertain. There’s probably a lot of bias towards positive findings – but on the other hand, it’s only been done a handful of times, and I expect that more practice at it would lead to much better results.
    2. Comparisons between humans and chimps which aren’t raised in similar ways to humans are massively biased towards humans. For the purposes of evaluating general intelligence, comparisons between chimpanzees and feral children seem fairer (although it’s very hard to know how much the latter were affected by non-linguistic childhoods as opposed to abuse or pre-existing disabilities).
  2. Consider a hypothetical species which has the same level of “general intelligence” that chimpanzees currently have, but is as well-adapted to the domains of abstract reasoning and technological development as chimpanzee behaviour is to the domain of physical survival (e.g. because they evolved in an artificial environment where their fitness was primarily determined by their intellectual contributions). I claim that this species would have superhuman scientific research capabilities, and would be able to make progress in novel areas of science (analogously to how chimpanzees can currently learn to navigate novel physical landscapes).
    1. Insofar as Eliezer doubts this, but does believe that this species could outperform a society of village idiots at scientific research, then he needs to explain why the village-idiot-to-Einstein gap is so significant in this context but not in others.
    2. However, this is a pretty weird thought experiment, and maybe doesn’t add much to our existing intuitions about AIs. My main intention here is to point at how animal behaviour is really really well-adapted to physical environments, in a way which makes people wonder what it would be like to be really really well-adapted to intellectual environments.
  3. Consider an AI which has been trained only to answer questions, and is now human-level at doing so. I claim that the difficulty of this AI matching humans at a range of real-world tasks (without being specifically trained to do so) would be much closer to the difficulty of teaching chimps to do science than to the difficulty of teaching adult humans to do abstract reasoning about a new domain.
    1. The analogy here is: chimps have reasonably general intelligence, but it’s hard for them to apply it to science because they weren’t trained to apply intelligence to that. Likewise, human-level oracle AGIs have general intelligence, but it’ll be hard for them to apply it to influencing the world because they weren’t trained to apply intelligence to that.

[Barnes][8:21]  (Sep. 23 comment on earlier draft)

village-idiot-to-Einstein gap

I wonder to what extent you can model within-species intelligence differences partly just as something like hyperparameter search – if you have a billion humans with random variation in their neural/cognitive traits, the top human will be a lot better than average. Then you could say something like:

  • humans are the dumbest species you could have where the distribution of intelligence in each generation is sufficient for cultural accumulation
  • that by itself might not imply a big gap from chimps
  • but human society has much larger population, so the smartest individuals are much smarter
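Barnes’s population-size point can be sketched numerically. This illustration is added here and is not from the conversation. It assumes the trait is normally distributed within the population, measured in standard deviations from the mean, and approximates the typical best of N individuals by the 1 − 1/N quantile. The helper name `typical_top_z` is hypothetical.

```python
from statistics import NormalDist

# Minimal sketch of "larger population -> more extreme top individual".
# Assumption (not from the conversation): the trait is normally distributed,
# in units of standard deviations (SD) from the population mean. The typical
# best of N draws is approximated by the 1 - 1/N quantile.


def typical_top_z(population: int) -> float:
    """Approximate z-score of the best individual among `population` draws."""
    if population < 2:
        raise ValueError("population must be at least 2")
    return NormalDist().inv_cdf(1.0 - 1.0 / population)


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(f"N={n:>13,}: top individual ~ {typical_top_z(n):.2f} SD above mean")
```

Under this model the top of the distribution moves out only slowly (roughly 3.1 SD at a thousand individuals versus about 6 SD at a billion), so how large the resulting gap is depends mostly on how wide the within-species distribution is in absolute terms.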

[Ngo][9:05]  (Sep. 23 comment on earlier draft)

I think Eliezer’s response (which I’d agree with) would be that the cognitive difference between the best humans and normal humans is strongly constrained by the fact that we’re all one species who can interbreed with each other. And so our cognitive variation can’t be very big compared with inter-species variation (at the top end at least; although it could at the bottom end via things breaking).

[Barnes][9:35]  (Sep. 23 comment on earlier draft)

I think that’s not obviously true – it’s definitely possible that there’s a lot of random variation due to developmental variation etc. If that’s the case then population size could create large within-species differences

[Yudkowsky][11:46]  (Sep. 25 comment)

oracle AGIs

Remind me of what this is?  Surely you don’t just mean the AI that produces plans it doesn’t implement itself, because that AI becomes an agent by adding an external switch that routes its outputs to a motor; it can hardly be much cognitively different from an agent.  Then what do you mean, “oracle AGI”?

(People tend to produce shallow specs of what they mean by “oracle” that make no sense in my microfoundations, a la “Just drive red cars but not blue cars!”, leading to my frequent reply, “Sorry, still AGI-complete in terms of the machinery you have to build to do that.”)

[Ngo][11:44]  (Sep. 29 comment)

Edited to clarify what I meant in this context (and remove the word “oracle” altogether).

[Yudkowsky][12:01]  (Sep. 29 comment)

My reply holds just as much to “AIs that answer questions”; what restricted question set do you imagine suffices to save the world without dangerously generalizing internal engines?

[Barnes][8:15]  (Sep. 23 comment on earlier draft)

The analogy here is: chimps have reasonably general intelligence, but it’s hard for them to apply it to science because they weren’t trained to apply intelligence to that. Likewise, human-level oracle AGIs have general intelligence, but it’ll be hard for them to apply it to influencing the world because they weren’t trained to apply intelligence to that.

this is not intuitive to me; it seems pretty plausible that the subtasks of predicting the world and of influencing the world are much more similar than the subtasks of surviving in a chimp society are to the subtasks of doing science

[Ngo][8:59]  (Sep. 23 comment on earlier draft)

I think Eliezer’s position is that all of these tasks are fairly similar if you have general intelligence. E.g. he argued that the difference between very good theorem-proving and influencing the world is significantly smaller than people expect. So even if you’re right, I think his position is too strong for your claim to help him. (I expect him to say that I’m significantly overestimating the extent to which chimps are running general cognitive algorithms).

[Barnes][9:33]  (Sep. 23 comment on earlier draft)

I wasn’t trying to defend his position, just disagreeing with you 😛

[Ngo]  (Sep. 25 Google Doc)

More specific details

Here are three training regimes which I expect to contribute to AGI:

  • Self-supervised training – e.g. on internet text, code, books, videos, etc.
  • Task-based RL – agents are rewarded (likely via human feedback, and some version of iterated amplification) for doing well on bounded tasks.
  • Open-ended RL – agents are rewarded for achieving long-term goals in rich environments.

[Yudkowsky][11:56]  (Sep. 25 comment)

bounded tasks

There’s an interpretation of this I’d agree with, but all of the work is being carried by the boundedness of the tasks, little or none via the “human feedback” part which I shrug at, and none by the “iterated amplification” part since I consider that tech unlikely to exist before the world ends.

[Ngo]  (Sep. 25 Google Doc)

Most of my probability of catastrophe comes from AGIs trained primarily via open-ended RL. Although IA makes these scenarios less likely by making task-based RL more powerful, it doesn’t seem to me that IA tackles the hardest case (of aligning agents trained via open-ended RL) head-on. But disaster from open-ended RL also seems a long way away – mainly because getting long-term real-world feedback is very slow, and I expect it to be hard to create sufficiently rich artificial environments. By that point I do expect the strategic landscape to be significantly different, because of the impact of task-based RL.

[Yudkowsky][11:57]  (Sep. 25 comment)

a long way away

Oh, definitely, at the present rates of progress we’ve got years, plural.

The history of futurism says that even saying that tends to be unreliable in the general case (people keep saying it right up until the Big Thing actually happens) and also that it’s rather a difficult form of knowledge to obtain more than a few years out.

[Yudkowsky][12:01]  (Sep. 25 comment)

hard to create sufficiently rich artificial environments

Disagree; I don’t think that making environments more difficult in a way that challenges the AI inside will prove to be a significant AI development bottleneck.  Making simulations easy enough for current AIs to do interesting things in them, but hard enough that the things they do are not completely trivial, takes some work relevant to current levels of AI intelligence.  I think that making those environments more tractably challenging for smarter AIs is not likely to be nearly a bottleneck in progress, compared to making the AIs smarter and able to solve the environment.  It’s a one-way-hash, P-vs-NP style thing – not literally, just that general relationship between it taking a lower amount of effort to pose a problem such that solving it requires a higher amount of effort.
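The “easy to pose, hard to solve” asymmetry can be illustrated with a toy problem. This sketch is added here as an analogy only (the comment itself says the relationship is “not literally” P vs NP). It uses subset-sum as an assumed stand-in: posing an instance and checking a proposed answer take time linear in n, while the naive brute-force solver may examine up to 2^n subsets. Cleverer subset-sum algorithms exist, but no polynomial-time algorithm is known for the general case. All function names are hypothetical.

```python
from __future__ import annotations

import itertools
import random

# Toy illustration of "posing is cheap, solving is expensive" using subset-sum.
# Assumption (not from the conversation): subset-sum stands in for a hard
# environment; brute force is the naive solver, not the best known algorithm.


def pose_instance(n: int, seed: int = 0) -> tuple[list[int], int]:
    """Cheap to pose: O(n) work to pick numbers and sum a hidden random subset."""
    rng = random.Random(seed)
    items = [rng.randint(1, 10**6) for _ in range(n)]
    target = sum(x for x in items if rng.random() < 0.5)
    return items, target


def verify(items: list[int], target: int, choice: tuple[int, ...]) -> bool:
    """Also cheap: checking a proposed set of indices is O(n)."""
    return sum(items[i] for i in choice) == target


def solve_brute_force(items: list[int], target: int) -> tuple[tuple[int, ...] | None, int]:
    """Expensive: may examine up to 2**n index subsets before succeeding."""
    checked = 0
    for size in range(len(items) + 1):
        for choice in itertools.combinations(range(len(items)), size):
            checked += 1
            if verify(items, target, choice):
                return choice, checked
    return None, checked


if __name__ == "__main__":
    for n in (8, 12, 16):
        items, target = pose_instance(n, seed=n)
        choice, checked = solve_brute_force(items, target)
        print(f"n={n}: posed with {n} draws; solver checked {checked} of {2**n} subsets")
```

The hidden subset guarantees that every posed instance has a solution, so the solver always terminates with an answer; only the amount of search it needs grows with n.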

[Ngo]  (Sep. 25 Google Doc)

Perhaps the best way to pin down disagreements in our expectations about the effects of the strategic landscape is to identify some measures that could help to reduce AGI risk, and ask how seriously key decision-makers would need to take AGI risk for each measure to be plausible, and how powerful and competent they would need to be for that measure to make a significant difference. Actually, let’s lump these metrics together into a measure of “amount of competent power applied”. Some benchmarks, roughly in order (and focusing on the effort applied by the US):

  • Banning chemical/biological weapons
  • COVID
    • Key points: mRNA vaccines, lockdowns, mask mandates
  • Nuclear non-proliferation
  • The International Space Station
    • Cost to US: ~$75 billion
  • Climate change
    • US expenditure: >$154 billion (but not very effectively)
  • Project Apollo
    • Wikipedia says that Project Apollo “was the largest commitment of resources ($156 billion in 2019 US dollars) ever made by any nation in peacetime. At its peak, the Apollo program employed 400,000 people and required the support of over 20,000 industrial firms and universities.”
  • WW1
  • WW2

[Yudkowsky][12:02]  (Sep. 25 comment)

WW2

This level of effort starts to buy significant amounts of time.  This level will not be reached, nor approached, before the world ends.

[Ngo]  (Sep. 25 Google Doc)

Here are some wild speculations (I just came up with this framework, and haven’t thought about these claims very much):

  1. The US and China preventing any other country from becoming a leader in AI requires about as much competent power as banning chemical/biological weapons.
  2. The US and China enforcing a ban on AIs above a certain level of autonomy requires about as much competent power as the fight against climate change.
    1. In this scenario, all the standard forces which make other types of technological development illegal have pushed towards making autonomous AGI illegal too.
  3. Launching a good-faith joint US-China AGI project requires about as much competent power as launching Project Apollo.
    1. According to this article, Kennedy (and later Johnson) made several offers (some of which were public) of a joint US-USSR Moon mission, which Khrushchev reportedly came close to accepting. Of course this is a long way from actually doing a joint project (and it’s not clear how reliable the source is), but it still surprised me a lot, given that I viewed the “space race” as basically a zero-sum prestige project. If your model predicted this, I’d be interested to hear why.

[Yudkowsky][12:07]  (Sep. 25 comment)

The US and China preventing any other country from becoming a leader in AI requires about as much competent power as banning chemical/biological weapons.

I believe this is wholly false.  On my model it requires closer to WW1 levels of effort.  I don’t think you’re going to get it without credible threats of military action leveled at previously allied countries.

AI is easier and more profitable to build than chemical / biological weapons, and correspondingly harder to ban.  Existing GPU factories need to be shut down and existing GPU clusters need to be banned and no duplicate of them can be allowed to arise, across many profiting countries that were previously military allies of the United States, which – barring some vast shift in world popular and elite opinion against AI, which is also not going to happen – those countries would be extremely disinclined to sign, especially if the treaty terms permitted the USA and China to forge ahead.

The reason why chem weapons bans were much easier was that people did not like chem weapons.  They were awful.  There was a perceived common public interest in nobody having chem weapons.  It was understood popularly and by elites to be a Prisoner’s Dilemma situation requiring enforcement to get to the Pareto optimum.  Nobody was profiting tons off the infrastructure that private parties could use to make chem weapons.

An AI ban is about as easy as banning advanced metal-forging techniques in current use so nobody can get ahead of the USA and China in making airplanes.  That would be HARD and likewise require credible threats of military action against former allies.

“AI ban is as easy as a chem weapons ban” seems to me like politically crazy talk.  I’d expect a more politically habited person to confirm this.

[Shulman][14:32]  (Sep. 25 comment)

AI ban much, much harder than chemical weapons ban. Indeed chemical weapons were low military utility, that was central to the deal, and they have still been used subsequently.

An AI ban is about as easy as banning advanced metal-forging techniques in current use so nobody can get ahead of the USA and China in making airplanes. That would be HARD and likewise require credible threats of military action against former allies.

If large amounts of compute relative to today are needed (and presumably Eliezer rejects this), the fact that there is only a single global leading node chip supply chain makes it vastly easier than metal forging, which exists throughout the world and is vastly cheaper.

Sharing with allies (and at least embedding allies to monitor US compliance) also reduces the conflict side.

OTOH, if compute requirements were super low then it gets a lot worse.

And the biological weapons ban failed completely: the Soviets built an enormous bioweapons program, the largest ever, after agreeing to the ban, and the US couldn’t even tell for sure they were doing so.

[Yudkowsky][18:15]  (Oct. 4 comment)

I’ve updated somewhat off of Carl Shulman’s argument that there’s only one chip supply chain which goes through eg a single manufacturer of lithography machines (ASML), which could maybe make a lock on AI chips possible with only WW1 levels of cooperation instead of WW2.

That said, I worry that, barring WW2 levels, this might not last very long if other countries started duplicating the supply chain, even if they had to go back one or two process nodes on the chips?  There’s a difference between the proposition “ASML has a lock on the lithography market right now” and “if aliens landed and seized ASML, Earth would forever after be unable to build another lithography plant”.  I mean, maybe that’s just true because we lost technology and can’t rebuild old bridges either, but it’s at least less obvious.

Launching Tomahawk cruise missiles at any attempt anywhere to build a new ASML, is getting back into “military threats against former military allies” territory and hence what I termed WW2 levels of cooperation.

[Shulman][18:30]  (Oct. 4 comment)

China has been trying for some time to build its own and has failed with tens of billions of dollars (but has captured some lagging node share), but would be substantially more likely to succeed with a trillion dollar investment. That said, it is hard to throw money at these things and the tons of tacit knowledge/culture/supply chain networks are tough to replicate. Also many ripoffs of the semiconductor subsidies have occurred. Getting more NASA/Boeing and less SpaceX is a plausible outcome even with huge investment.

They are trying to hire people away from the existing supply chain to take its expertise and building domestic skills with the lagging nodes.

[Yudkowsky][19:14]  (Oct. 4 comment)

Does that same theory predict that if aliens land and grab some but not all of the current ASML personnel, Earth is thereby successfully taken hostage for years, because Earth has trouble rebuilding ASML, which had the irreproducible lineage of masters and apprentices dating back to the era of Lost Civilization?  Or would Earth be much better at this than China, on your model?

[Shulman][19:31]  (Oct. 4 comment)

I’ll read that as including the many suppliers of ASML (one EUV machine has over 100,000 parts, many incredibly fancy or unique). It’s just a matter of how many years it takes. I think Earth fails to rebuild that capacity in 2 years but succeeds in 10.

“A study this spring by Boston Consulting Group and the Semiconductor Industry Association estimated that creating a self-sufficient chip supply chain would take at least $1 trillion and sharply increase prices for chips and products made with them…The situation underscores the crucial role played by ASML, a once obscure company whose market value now exceeds $285 billion. It is “the most important company you never heard of,” said C.J. Muse, an analyst at Evercore ISI.”

https://www.nytimes.com/2021/07/04/technology/tech-cold-war-chips.html

[Yudkowsky][19:59]  (Oct. 4 comment)

No in 2 years, yes in 10 years sounds reasonable to me for this hypothetical scenario, as far as I know in my limited knowledge.

[Yudkowsky][12:10]  (Sep. 25 comment)

Launching a good-faith joint US-China AGI project requires about as much competent power as launching Project Apollo.

It’s really weird, relative to my own model, that you put the item that the US and China can bilaterally decide to do all by themselves, without threats of military action against their former allies, as more difficult than the items that require conditions imposed on other developed countries that don’t want them.

Political coordination is hard.  No, seriously, it’s hard.  It comes with a difficulty penalty that scales with the number of countries, how complete the buy-in has to be, and how much their elites and population don’t want to do what you want them to do relative to how much elites and population agree that it needs doing (where this very rapidly goes to “impossible” or “WW1/WW2” as they don’t particularly want to do your thing).

[Ngo]  (Sep. 25 Google Doc)

So far I haven’t talked about how much competent power I actually expect people to apply to AI governance. I don’t think it’s useful for Eliezer and me to debate this directly, since it’s largely downstream from most of the other disagreements we’ve had. In particular, I model him as believing that governments and wider society will apply very little competent power to preventing AI risk, partly because he expects a faster takeoff than I do, and partly because he has a lower opinion of governmental competence than I do. But for the record, it seems likely to me that there’ll be as much competent effort put into reducing AI risk by governments and wider society as there has been into fighting COVID; and plausibly (but not likely) as much as fighting climate change.

One key factor is my expectation that arguments about the importance of alignment will become much stronger as we discover more compelling examples of misalignment. I don’t currently have strong opinions about how compelling the worst examples of misalignment before catastrophe are likely to be; but identifying and publicising them seems like a particularly effective form of advocacy, and one which we should prepare for in advance.

The predictable accumulation of easily-accessible evidence that AI risk is important is one example of a more general principle: that it’s much easier to understand, publicise, and solve problems as those problems get closer and more concrete. This seems like a strong effect to me, and a key reason why so many predictions of doom throughout history have failed to come true, even when they seemed compelling at the time they were made.

Upon reflection, however, I think that even taking this effect into account, the levels of competent power required for the interventions mentioned above are too high to justify the level of optimism about AI governance that I started our debate with. On the other hand, I found Eliezer’s arguments about consequentialism less convincing than I expected. Overall I’ve updated that AI risk is higher than I previously believed; though I expect my views to be quite unsettled while I think more, and talk to more people, about specific governance interventions and scenarios.

 

Conversation on technology forecasting and gradualism

 |   |  Analysis, Conversations

 

This post is a transcript of a multi-day discussion between Paul Christiano, Richard Ngo, Eliezer Yudkowsky, Rob Bensinger, Holden Karnofsky, Rohin Shah, Carl Shulman, Nate Soares, and Jaan Tallinn, following up on the Yudkowsky/Christiano debate in 1, 2, 3, and 4.

 

Color key:

 Chat by Paul, Richard, and Eliezer   Other chat 

 

12. Follow-ups to the Christiano/Yudkowsky conversation

 

12.1. Bensinger and Shah on prototypes and technological forecasting

 

[Bensinger][16:22] 

Quoth Paul:

seems like you have to make the wright flyer much better before it’s important, and that it becomes more like an industry as that happens, and that this is intimately related to why so few people were working on it

Is this basically saying ‘the Wright brothers didn’t personally capture much value by inventing heavier-than-air flying machines, and this was foreseeable, which is why there wasn’t a huge industry effort already underway to try to build such machines as fast as possible.’ ?

My maybe-wrong model of Eliezer says here ‘the Wright brothers knew a (Thielian) secret’, while my maybe-wrong model of Paul instead says:

  • They didn’t know a secret — it was obvious to tons of people that you could do something sorta like what the Wright brothers did and thereby invent airplanes; the Wright brothers just had unusual non-monetary goals that made them passionate to do a thing most people didn’t care about.
  • Or maybe it’s better to say: they knew some specific secrets about physics/engineering, but only because other people correctly saw ‘there are secrets to be found here, but they’re stamp-collecting secrets of little economic value to me, so I won’t bother to learn the secrets’. ~Everyone knows where the treasure is located, and ~everyone knows the treasure won’t make you rich.

[Yudkowsky][17:24]

My model of Paul says there could be a secret, but only because the industry was tiny and the invention was nearly worthless directly.

[Cotra: ➕]
[Christiano][17:53]

I mean, I think they knew a bit of stuff, but it generally takes a lot of stuff to make something valuable, and the more people have been looking around in an area the more confident you can be that it’s going to take a lot of stuff to do much better, and it starts to look like an extremely strong regularity for big industries like ML or semiconductors

it’s pretty rare to find small ideas that don’t take a bunch of work to have big impacts

I don’t know exactly what a thielian secret is (haven’t read the reference and just have a vibe)

straightening it out a bit, I have 2 beliefs that combine disjunctively: (i) generally it takes a lot of work to do stuff, as a strong empirical fact about technology, (ii) generally if the returns are bigger there are more people working on it, as a slightly-less-strong fact about sociology

[Bensinger][18:09]

secrets = important undiscovered information (or information that’s been discovered but isn’t widely known), that you can use to get an edge in something. https://www.lesswrong.com/posts/ReB7yoF22GuerNfhH/thiel-on-secrets-and-indefiniteness

There seems to be a Paul/Eliezer disagreement about how common these are in general. And maybe a disagreement about how much more efficiently humanity discovers and propagates secrets as you scale up the secret’s value?

[Yudkowsky][18:35]

Many times it has taken much work to do stuff; there’s further key assertions here about “It takes $100 billion” and “Multiple parties will invest $10B first” and “$10B gets you a lot of benefit first because scaling is smooth and without really large thresholds”.

Eliezer is like “ah, yes, sometimes it takes 20 or even 200 people to do stuff, but core researchers often don’t scale well past 50, and there aren’t always predecessors that could do a bunch of the same stuff” even though Eliezer agrees with “it often takes a lot of work to do stuff”. More premises are needed for the conclusion, that one alone does not distinguish Eliezer and Paul by enough.

[Bensinger][20:03]

My guess is that everyone agrees with claims 1, 2, and 3 here (please let me know if I’m wrong!):

1. The history of humanity looks less like Long Series of Cheat Codes World, and more like Well-Designed Game World.

In Long Series of Cheat Codes World, human history looks like this, over and over: Some guy found a cheat code that totally outclasses everyone else and makes him God or Emperor, until everyone else starts using the cheat code too (if the Emperor allows it). After which things are maybe normal for another 50 years, until a new Cheat Code arises that makes its first adopters invincible gods relative to the previous tech generation, and then the cycle repeats.

In Well-Designed Game World, you can sometimes eke out a small advantage, and the balance isn’t perfect, but it’s pretty good and the leveling-up tends to be gradual. A level 100 character totally outclasses a level 1 character, and some level transitions are a bigger deal than others, but there’s no level that makes you a god relative to the people one level below you.

2. General intelligence took over the world once. Someone who updated on that fact but otherwise hasn’t thought much about the topic should not consider it ‘bonkers’ that machine general intelligence could take over the world too, even though they should still consider it ‘bonkers’ that eg a coffee startup could take over the world.

(Because beverages have never taken over the world before, whereas general intelligence has; and because our inside-view models of coffee and of general intelligence make it a lot harder to imagine plausible mechanisms by which coffee could make someone emperor, kill all humans, etc., compared to general intelligence.)

(In the game analogy, the situation is a bit like ‘I’ve never found a crazy cheat code or exploit in this game, but I haven’t ruled out that there is one, and I heard of a character once who did a lot of crazy stuff that’s at least suggestive that she might have had a cheat code.’)

3. AGI is arising in a world where agents with science and civilization already exist, whereas humans didn’t arise in such a world. This is one reason to think AGI might not take over the world, but it’s not a strong enough consideration on its own to make the scenario ‘bonkers’ (because AGIs are likely to differ from humans in many respects, and it wouldn’t obviously be bonkers if the first AGIs turned out to be qualitatively way smarter, cheaper to run, etc.).

If folks agree with the above, then I’m confused about how one updates from the above epistemic state to ‘bonkers’.

It was to a large extent physics facts that determined how easy it was to understand the feasibility of nukes without (say) decades of very niche specialized study. Likewise, it was physics facts that determined you need rare materials, many scientists, and a large engineering+infrastructure project to build a nuke. In a world where the physics of nukes resulted in it being some PhD’s quiet ‘nobody thinks this will work’ project like Andrew Wiles secretly working on a proof of Fermat’s Last Theorem for seven years, that would have happened.

If an alien came to me in 1800 and told me that totally new physics would let future humans build city-destroying superbombs, then I don’t see why I should have considered it bonkers that it might be lone mad scientists rather than nations who built the first superbomb. The ‘lone mad scientist’ scenario sounds more conjunctive to me (assumes the mad scientist knows something that isn’t widely known, AND has the ability to act on that knowledge without tons of resources), so I guess it should have gotten less probability, but maybe not dramatically less?

‘Mad scientist builds city-destroying weapon in basement’ sounds wild to me, but I feel like almost all of the actual unlikeliness comes from the ‘city-destroying weapons exist at all’ part, and then the other parts only moderately lower the probability.
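Bensinger's point about conjunctive scenarios can be made explicit with the chain rule. This is an editor's illustration that uses purely hypothetical numbers; nobody in the conversation gave these figures. Let W be "city-destroying weapons are physically possible", S be "the decisive insight is not widely known", and R be "acting on it does not require nation-scale resources". If P(W) is small and each extra condition is roughly a coin flip given the previous ones, the extra conjuncts cost a factor of 4. Almost all of the improbability comes from W:

```latex
P(\text{lone scientist builds first}) = P(W)\,P(S \mid W)\,P(R \mid W, S)
  = 0.01 \times 0.5 \times 0.5 = 0.0025 = \tfrac{1}{4}\,P(W)
```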

Likewise, I feel like the prima-facie craziness of basement AGI mostly comes from ‘general intelligence is a crazy thing, it’s wild that anything could be that high-impact’, and a much smaller amount comes from ‘it’s wild that something important could happen in some person’s basement’.

It does structurally make sense to me that Paul might know things I don’t about GPT-3 and/or humans that make it obvious to him that we roughly know the roadmap to AGI and it’s this.

If the entire ‘it’s bonkers that some niche part of ML could crack open AGI in 2026 and reveal that GPT-3 (and the mainstream-in-2026 stuff) was on a very different part of the tech tree’ view is coming from a detailed inside-view model of intelligence like this, then that immediately ends my confusion about the argument structure.

I don’t understand why you think you have the roadmap, and given a high-confidence roadmap I’m guessing I’d still put more probability than you on someone finding a very different, shorter path that works too. But the argument structure “roadmap therefore bonkers” makes sense to me.

If there are meant to be other arguments against ‘high-impact AGI via niche ideas/techniques’ that are strong enough to make it bonkers, then I remain confused about the argument structure and how it can carry that much weight.

I can imagine an inside-view model of human cognition, GPT-3 cognition, etc. that tells you ‘AGI coming from nowhere in 3 years is bonkers’; I can’t imagine an ML-is-a-reasonably-efficient-market argument that does the same, because even a perfectly efficient market isn’t omniscient and can still be surprised by undiscovered physics facts that tell you ‘nukes are relatively easy to build’ and ‘the fastest path to nukes is relatively hard to figure out’.

(Caveat: I’m using the ‘basement nukes’ and ‘Fermat’s last theorem’ analogy because it helps clarify the principles involved, not because I think AGI will be that extreme on the spectrum.)

[Yudkowsky: +1]

Oh, I also wouldn’t be confused by a view like “I think it’s 25% likely we’ll see a more Eliezer-ish world. But it sounds like Eliezer is, like, 90% confident that will happen, and that level of confidence (and/or the weak reasoning he’s provided for that confidence) seems bonkers to me.”

The thing I’d be confused by is e.g. “ML is efficient-ish, therefore the out-of-the-blue-AGI scenario itself is bonkers and gets, like, 5% probability.”


More Christiano, Cotra, and Yudkowsky on AI progress

 |   |  Analysis, Conversations

 

This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky (with some comments from Rob Bensinger, Richard Ngo, and Carl Shulman), continuing from 1, 2, and 3.

 

Color key:

 Chat by Paul and Eliezer   Other chat 

 

10.2. Prototypes, historical perspectives, and betting

 

[Bensinger][4:25]

I feel confused about the role “innovations are almost always low-impact” plays in slow-takeoff-ish views.

Suppose I think that there’s some reachable algorithm that’s different from current approaches, and can do par-human scientific reasoning without requiring tons of compute.

The existence or nonexistence of such an algorithm is just a fact about the physical world. If I imagine one universe where such an algorithm exists, and another where it doesn’t, I don’t see why I should expect that one of those worlds has more discontinuous change in GWP, ship sizes, bridge lengths, explosive yields, etc. (outside of any discontinuities caused by the advent of humans and the advent of AGI)? What do these CS facts have to do with the other facts?

But AI Impacts seems to think there’s an important connection, and a large number of facts of the form ‘steamships aren’t like nukes’ seem to undergird a lot of Paul’s confidence that the scenario I described —

(“there’s some reachable algorithm that’s different from current approaches, and can do par-human scientific reasoning without requiring tons of compute.”)

— is crazy talk. (Unless I’m misunderstanding. As seems actually pretty likely to me!)

(E.g., Paul says “To me your model just seems crazy, and you are saying it predicts crazy stuff at the end but no crazy stuff beforehand”, and one of the threads of the timelines conversation has been Paul asking stuff like “do you want to give any example other than nuclear weapons of technologies with the kind of discontinuous impact you are describing?”.)

Possibilities that came to mind for me:

1. The argument is ‘reality keeps surprising us with how continuous everything else is, so we seem to have a cognitive bias favoring discontinuity, so we should have a skeptical prior about our ability to think our way to ‘X is discontinuous’ since our brains are apparently too broken to do that well’?

(But to get from 1 to ‘discontinuity models are batshit’ we surely need something more probability-mass-concentrating than just a bias argument?)

2. The commonality between steamship sizes, bridge sizes, etc. and AGI is something like ‘how tractable is the world?’. A highly tractable world, one whose principles are easy to understand and leverage, will tend to have more world-shatteringly huge historical breakthroughs in various problems, and will tend to see a larger impact from the advent of humans and the advent of AGI.

Our world looks much less tractable, so even if there’s a secret sauce to building AGI, we should expect the resultant AGI to be a lot less impactful.

[Ngo][5:06]

I endorse #2 (although I think more weakly than Paul does) and would also add #3: another commonality is something like “how competitive is innovation?”

[Shulman][8:22]

@RobBensinger It’s showing us a fact about the vast space of ideas and technologies we’ve already explored that they are not so concentrated and lumpy that the law of large numbers doesn’t work well as a first approximation in a world with thousands or millions of people contributing. And that specifically includes past computer science innovation.

So the ‘we find a secret sauce algorithm that causes a massive unprecedented performance jump, without crappier predecessors’ is a ‘separate, additional miracle’ at exactly the same time as the intelligence explosion is getting going. You can get hyperbolic acceleration from increasing feedbacks from AI to AI hardware and software, including crazy scale-up at the end, as part of a default model. But adding on to it that AGI is hit via an extremely large performance jump of a type that is very rare takes a big probability penalty.

And the history of human brains doesn’t seem to provide strong evidence of a fundamental software innovation, vs hardware innovation and gradual increases in selection applied to cognition/communication/culture.

The fact that, e.g. AIs are mastering so much math and language while still wielding vastly infrahuman brain-equivalents, and crossing human competence in many domains (where there was ongoing effort) over decades is significant evidence for something smoother than the development of modern humans and their culture.

That leaves me not expecting a simultaneous unusual massive human concentrated algorithmic leap with AGI, although I expect wildly accelerating progress from increasing feedbacks at that time. Crossing a given milestone is disproportionately likely to happen in the face of an unusually friendly part/jump of a tech tree (like AlexNet/the neural networks->GPU transition) but still mostly not, and likely not from an unprecedented in computer science algorithmic change.

https://aiimpacts.org/?s=cross+

[Cotra: 👍]

[Yudkowsky][11:26][11:37]

The existence or nonexistence of such an algorithm is just a fact about the physical world. If I imagine one universe where such an algorithm exists, and another where it doesn’t, I don’t see why I should expect that one of those worlds has more discontinuous change in GWP, ship sizes, bridge lengths, explosive yields, etc. (outside of any discontinuities caused by the advent of humans and the advent of AGI)? What do these CS facts have to do with the other facts?

I want to flag strong agreement with this. I am not talking about change in ship sizes because that is relevant in any visible way on my model; I’m talking about it in hopes that I can somehow unravel Carl and Paul’s model, which talks a whole lot about this being Relevant even though that continues to not seem correlated to me across possible worlds.

I think a lot in terms of “does this style of thinking seem to have any ability to bind to reality”? A lot of styles of thinking in futurism just don’t.

I imagine Carl and Paul as standing near the dawn of hominids asking, “Okay, let’s try to measure how often previous adaptations resulted in simultaneous fitness improvements across a wide range of environmental challenges” or “what’s the previous record on an organism becoming more able to survive in a different temperature range over a 100-year period” or “can we look at the variance between species in how high they fly and calculate how surprising it would be for a species to make it out of the atmosphere”

And all of reality is standing somewhere else, going on ahead to do its own thing.

Now maybe this is not the Carl and Paul viewpoint but if so I don’t understand how not. It’s not that viewpoint plus a much narrower view of relevance, because AI Impacts got sent out to measure bridge sizes.

I go ahead and talk about these subjects, in part because maybe I can figure out some way to unravel the viewpoint on its own terms, in part because maybe Carl and Paul can show that they have a style of thinking that works in its own right and that I don’t understand, and in part because people like Paul’s nonconcrete cheerful writing better and prefer to live there mentally and I have to engage on their terms because they sure won’t engage on mine.

But I do not actually think that bridge lengths or atomic weapons have anything to do with this.

Carl and Paul may be doing something sophisticated but wordless, where they fit a sophisticated but wordless universal model of technological permittivity to bridge lengths, then have a wordless model of cognitive scaling in the back of their minds, then get a different prediction of Final Days behavior, then come back to me and say, “Well, if you’ve got such a different prediction of Final Days behavior, can you show me some really large bridges?”

But this is not spelled out in the writing – which, I do emphasize, is a social observation that would be predicted regardless, because other people have not invested a ton of character points in the ability to spell things out, and a supersupermajority would just plain lack the writing talent for it.

And what other EAs reading it are thinking, I expect, is plain old Robin-Hanson-style reference class tennis of “Why would you expect intelligence to scale differently from bridges, where are all the big bridges?”

[Cotra][11:36][11:40]

(Just want to interject that Carl has higher P(doom) than Paul and has also critiqued Paul for not being more concrete, and I doubt that this is the source of the common disagreements that Paul/Carl both have with Eliezer)

From my perspective the thing the AI impacts investigation is asking is something like “When people are putting lots of resources into improving some technology, how often is it the case that someone can find a cool innovation that improves things a lot relative to the baseline?” I think that your response to that is something like “Sure, if the broad AI market were efficient and everyone were investigating the right lines of research, then AI progress might be smooth, but AGI would have also been developed way sooner. We can’t safely assume that AGI is like an industry where lots of people are pushing toward the same thing”

But it’s not assuming a great structural similarity between bridges and AI, except that they’re both things that humans are trying hard to find ways to improve

[Yudkowsky][11:42]

I can imagine writing responses like that, if I was engaging on somebody else’s terms. As with Eliezer-2012’s engagement with Pat Modesto against the careful proof that HPMOR cannot possibly become one of the measurably most popular fanfictions, I would never think anything like that inside my own brain.

Maybe I just need to do a thing that I have not done before, and set my little $6000 Roth IRA to track a bunch of investments that Carl and/or Paul tell me to make, so that my brain will actually track the results, and I will actually get a chance to see this weird style of reasoning produce amazing results.

[Bensinger][11:44]

Sure, if the broad AI market were efficient and everyone were investigating the right lines of research, then AI progress might be smooth

Presumably also “‘AI progress’ subsumes many different kinds of cognition, we don’t currently have baby AGIs, and when we do figure out how to build AGI the very beginning of the curve (the Wright flyer moment, or something very shortly after) will correspond to a huge capability increase.”

[Yudkowsky][11:46]

I think there’s some much larger scale in which it’s worth mentioning that on my own terms of engagement I do not naturally think like this. I don’t feel like you could get Great Insight by figuring out what the predecessor technologies must have been of the Wright Flyer, finding industries that were making use of them, and then saying Behold the Heralds of the Wright Flyer. It’s not a style of thought binding upon reality.

They built the Wright Flyer. It flew. Previous stuff didn’t fly. It happens. Even if you yell a lot at reality and try to force it into an order, that’s still what your actual experience of the surprising Future will be like, you’ll just be more surprised by it.

Like you can super want Technologies to be Heralded by Predecessors which were Also Profitable but on my native viewpoint this is, like, somebody with a historical axe to grind, going back and trying to make all the history books read like this, when I have no experience of people who were alive at the time making gloriously correct futuristic predictions using this kind of thinking.

[Cotra][11:53]

I think Paul’s view would say:

  • Things certainly happen for the first time
  • When they do, they happen at small scale in shitty prototypes, like the Wright Flyer or GPT-1 or AlphaGo or the Atari bots or whatever
  • When they’re making a big impact on the world, it’s after a lot of investment and research, like commercial aircrafts in the decades after Kitty Hawk or like the investments people are in the middle of making now with AI that can assist with coding

Paul’s view says that the Kitty Hawk moment already happened for the kind of AI that will be super transformative and could kill us all, and like the historical Kitty Hawk moment, it was not immediately a huge deal

[Yudkowsky][11:56]

There is, I think, a really basic difference of thinking here, which is that on my view, AGI erupting is just a Thing That Happens and not part of a Historical Worldview or a Great Trend.

Human intelligence wasn’t part of a grand story reflected in all parts of the ecology, it just happened in a particular species.

Now afterwards, of course, you can go back and draw all kinds of Grand Trends into which this Thing Happening was perfectly and beautifully fitted, and yet, it does not seem to me that people have a very good track record of thereby predicting in advance what surprising news story they will see next – with some rare, narrow-superforecasting-technique exceptions, like the Things chart on a steady graph and we know solidly what a threshold on that graph corresponds to and that threshold is not too far away compared to the previous length of the chart.

One day the Wright Flyer flew. Anybody in the future with benefit of hindsight, who wanted to, could fit that into a grand story about flying, industry, travel, technology, whatever; if they’d been on the ground at the time, they would not have thereby had much luck predicting the Wright Flyer. It can be fit into a grand story but on the ground it’s just a thing that happened. It had some prior causes but it was not thereby constrained to fit into a storyline in which it was the plot climax of those prior causes.

My worldview sure does permit there to be predecessor technologies and for them to have some kind of impact and for some company to make a profit, but it is not nearly as interested in that stuff, on a very basic level, because it does not think that the AGI Thing Happening is the plot climax of a story about the Previous Stuff Happening.

[Cotra][12:01]

The fact that you express this kind of view about AGI erupting one day is why I thought your thing in IEM was saying there was a major algorithmic innovation from chimps to humans, that humans were qualitatively and not just quantitatively better than chimps and this was not because of their larger brain size primarily. But I’m confused because up thread in the discussion of evolution you were emphasizing much more that there was an innovation between dinosaurs and primates, not that there was an innovation between chimps and humans, and you seemed more open to the chimp/human diff being quantitative and brain-size driven than I had thought you’d be. But being open to the chimp-human diff being quantitative/brain-size-driven suggests to me that you should be more open than you are to AGI being developed by slow grinding on the same shit, instead of erupting without much precedent?

[Yudkowsky][12:01]

I think you’re confusing a meta-level viewpoint with an object-level viewpoint.

The Wright Flyer does not need to be made out of completely different materials from all previous travel devices, in order for the Wright Flyer to be a Thing That Happened One Day which wasn’t the plot climax of a grand story about Travel and which people at the time could not have gotten very far in advance-predicting by reasoning about which materials were being used in which conveyances and whether those conveyances looked like they’d be about to start flying.

It is the very viewpoint to which I am objecting, which keeps on asking me, metaphorically speaking, to explain how the Wright Flyer could have been made of completely different materials in order for it to be allowed to be so discontinuous with the rest of the Travel story of which it is part.

On my viewpoint they’re just different stories so the Wright Flyer is allowed to be its own thing even though it is not made out of an unprecedented new kind of steel that floats.

[Cotra][12:06]

The claim I’m making is that Paul’s view predicts a lag and a lot of investment between the first flight and aircraft making a big impact on the travel industry, and predicts that the first flight wouldn’t have immediately made a big impact on the travel industry. In other words Kitty Hawk isn’t a discontinuity in the Paul view because the metrics he’d expect to be continuous are the ones that large numbers of people are trying hard to optimize, like cost per mile traveled or whatnot, not metrics that almost nobody is trying to optimize, like “height flown.”

In other words, it sounds like you’re saying:

  • Kitty Hawk is analogous to AGI erupting
  • Previous history of travel is analogous to pre-AGI history of AI

While Paul is saying:

  • Kitty Hawk is analogous to e.g. AlexNet
  • Later history of aircraft is analogous to the post-AlexNet story of AI which we’re in the middle of living, and will continue on to make huge Singularity-causing impacts on the world

[Yudkowsky][12:09]

Well, unfortunately, Paul and I both seem to believe that our models follow from observing the present-day world, rather than being incompatible with it, and so when we demand of each other that we produce some surprising bold prediction about the present-day world, we both tend to end up disappointed.

I would like, of course, for Paul’s surprisingly narrow vision of a world governed by tightly bound stories and predictable trends, to produce some concrete bold prediction of the next few years which no ordinary superforecaster would produce, but Paul is not under the impression that his own worldview is similarly strange and narrow, and so has some difficulty in answering this request.

[Cotra][12:09]

But Paul offered to bet with you about literally any quantity you choose?

[Yudkowsky][12:10]

I did assume that required an actual disagreement, eg, I cannot just go look up something superforecasters are very confident about and then demand Paul to bet against it.

[Cotra][12:12]

It still sounds to me like “take a basket of N performance metrics, bet that the model size to perf trend will break upward in > K of them within e.g. 2 or 3 years” should sound good to you, I’m confused why that didn’t. If it does and it’s just about the legwork then I think we could get someone to come up with the benchmarks and stuff for you

Or maybe the same thing but >K of them will break downward, whatever

We could bet about the human perception of sense in language models, for example
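Cotra's proposed bet ("take a basket of N performance metrics, bet that the model size to perf trend will break upward in > K of them") can be sketched as a resolution procedure. This is an editor's illustration, not a protocol either party agreed to. It assumes that each metric's trend is a least-squares line of score against log10(model size), and that a "break upward" means a newer model beats that line's prediction by more than a fixed margin. The bet resolves for the "breaks" side if more than K metrics break. The names `Metric`, `fit_log_linear`, `breaks_upward`, and `resolve_bet`, and all the numbers, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Metric:
    """One benchmark in the basket (hypothetical structure)."""
    name: str
    sizes: list[float]   # model sizes used to fit the trend, e.g. parameter counts
    scores: list[float]  # benchmark score at each of those sizes
    new_size: float      # size of the newer model being judged
    new_score: float     # score that newer model actually achieved


def fit_log_linear(sizes: list[float], scores: list[float]) -> tuple[float, float]:
    """Least-squares fit of score = a + b * log10(size); returns (a, b)."""
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct model sizes to fit a trend")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def breaks_upward(metric: Metric, margin: float) -> bool:
    """True if the new model beats the extrapolated trend by more than `margin`."""
    a, b = fit_log_linear(metric.sizes, metric.scores)
    predicted = a + b * math.log10(metric.new_size)
    return metric.new_score > predicted + margin


def resolve_bet(metrics: list[Metric], k: int, margin: float) -> tuple[bool, int]:
    """The 'trend breaks' side wins if more than k metrics break upward."""
    n_breaks = sum(breaks_upward(m, margin) for m in metrics)
    return n_breaks > k, n_breaks


if __name__ == "__main__":
    # Purely illustrative numbers, not real benchmark results.
    # Each trend predicts a score of 40 at size 1e11.
    basket = [
        Metric("metric_a", [1e8, 1e9, 1e10], [10.0, 20.0, 30.0], 1e11, 50.0),
        Metric("metric_b", [1e8, 1e9, 1e10], [10.0, 20.0, 30.0], 1e11, 41.0),
        Metric("metric_c", [1e8, 1e9, 1e10], [10.0, 20.0, 30.0], 1e11, 39.0),
    ]
    print(resolve_bet(basket, k=0, margin=5.0))  # (True, 1)
```

Everything contested sits in the parameters: the functional form of the trend, the size of `margin`, the choice of K, and the composition of the basket. Any disagreement about what counts as a "break" would have to be settled there before a bet like this could resolve.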

[Yudkowsky][12:14]

I am nervous about Paul’s definition of “break” and the actual probabilities to be assigned. You see, both Paul and I think our worldview is a very normal one that matches current reality quite well, so when we are estimating parameters like these, Paul is liable to do it empirically, and I am also liable to do it empirically as my own baseline, and if I point to a trend over time in how long it takes to go from par-human to superhuman performance decreasing, Imaginary Paul says “Ah, yes, what a fine trend, I will bet that things follow this trend” and Eliezer says “No that is MY trend, you don’t get to follow it, you have to predict that par-human to superhuman time will be constant” and Paul is like “lol no I get to be a superforecaster and follow trends” and we fail to bet.

Maybe I’m wrong in having mentally played the game out ahead that far, for it is, after all, very hard to predict the Future, but that’s where I’d foresee it failing.

[Cotra][12:16]

I don’t think you need to bet about calendar times from par-human to super-human, and any meta-trend in that quantity. It sounds like Paul is saying “I’ll basically trust the model size to perf trends and predict a 10x bigger model from the same architecture family will get the perf the trends predict,” and you’re pushing back against that saying e.g. that humans won’t find GPT-4 to be subjectively more coherent than GPT-3 and that Paul is neglecting that there could be major innovations in the future that bring down the FLOP/s to get a certain perf by a lot and bend the scaling laws. So why not bet that Paul won’t be as accurate as he thinks he is by following the scaling laws?

[Bensinger][12:17]

I think Paul’s view would say:

  • Things certainly happen for the first time
  • When they do, they happen at small scale in shitty prototypes, like the Wright Flyer or GPT-1 or AlphaGo or the Atari bots or whatever
  • When they’re making a big impact on the world, it’s after a lot of investment and research, like commercial aircrafts in the decades after Kitty Hawk or like the investments people are in the middle of making now with AI that can assist with coding

Paul’s view says that the Kitty Hawk moment already happened for the kind of AI that will be super transformative and could kill us all, and like the historical Kitty Hawk moment, it was not immediately a huge deal

“When they do, they happen at small scale in shitty prototypes, like the Wright Flyer or GPT-1 or AlphaGo or the Atari bots or whatever”

How shitty the prototype is should depend (to a very large extent) on the physical properties of the tech. So I don’t find it confusing (though I currently disagree) when someone says “I looked at a bunch of GPT-3 behavior and it’s cognitively sophisticated enough that I think it’s doing basically what humans are doing, just at a smaller scale. The qualitative cognition I can see going on is just that impressive, taking into account the kinds of stuff I think human brains are doing.”

What I find confusing is, like, treating ten thousand examples of non-AI, non-cognitive-tech continuities (nukes, building heights, etc.) as though they’re anything but a tiny update about ‘will AGI be high-impact’ — compared to the size of updates like ‘look at how smart and high-impact humans were’ and perhaps ‘look at how smart-in-the-relevant-ways GPT-3 is’.

Like, impactfulness is not a simple physical property, so there’s not much reason for different kinds of tech to have similar scales of impact (or similar scales of impact n years after the first prototype). Mainly I’m not sure to what extent we disagree about this, vs. this just being me misunderstanding the role of the ‘most things aren’t high-impact’ argument.

(And yeah, a random historical technology drawn from a hat will be pretty low-impact. But that base rate also doesn’t seem to me like it has much evidential relevance anymore when I update about what specific tech we’re discussing.)

[Cotra][12:18]

The question is not “will AGI be high impact” — Paul agrees it will, and for any FOOM quantity (like crossing a chimp-to-human-sized gap in a day or whatever) he agrees that will happen eventually too.

The technologies studied in the dataset spanned a wide range in their peak impact on society, and they’re not being used to forecast the peak impact of mature AI tech

[Bensinger][12:19]

Yeah, I’m specifically confused about how we know that the AGI Wright Flyer and its first successors are low-impact, from looking at how low-impact other technologies are (if that is in fact a meaningful-sized update on your view)

Not drawing a comparison about the overall impactfulness of AI / AGI (e.g., over fifteen years)

[Yudkowsky][12:21]

[So why not bet that Paul won’t be as accurate as he thinks he is by following the scaling laws?]

I’m pessimistic about us being able to settle on the terms of a bet like that (and even more so about being able to bet against Carl on it) but in broad principle I agree. The trouble is that if a trend is benchmarkable, I believe more in the trend continuing at least on the next particular time, not least because I believe in people Goodharting benchmarks.

I expect a human sense of intelligence to be harder to fool (even taking into account that it’s being targeted to a nonzero extent) but I also expect that to be much harder to measure and bet upon than the Goodhartable metrics. And I think our actual disagreement is more visible over portfolios of benchmarks breaking upward over time, but I also expect that if you ask Paul and myself to quantify our predictions, we both go, “Oh, my theory is the one that fits ordinary reality so obviously I will go look at superforecastery trends over ordinary reality to predict this specifically” and I am like, “No, Paul, if you’d had to predict that without looking at the data, your worldview would’ve predicted trends breaking down less often” and Paul is like “But Eliezer, shouldn’t you be predicting much more upward divergence than this.”

Again, perhaps I’m being overly gloomy.

[Cotra][12:23]

I think we should try to find ML predictions where you defer to superforecasters and Paul disagrees, since he said he would bet against superforecasters in ML

[Yudkowsky][12:24]

I am also probably noticeably gloomier and less eager to bet because the whole fight is taking place on grounds that Paul thinks are important and part of a connected story that continuously describes ordinary reality, and that I think are a strange place where I can’t particularly see how Paul’s reasoning style works. So I’d want to bet against Paul’s overly narrow predictions by using ordinary superforecasting, and Paul would like to make his predictions using ordinary superforecasting.

I am, indeed, more interested in a place where Paul wants to bet against superforecasters. I am not guaranteeing up front I’ll bet with them because superforecasters did not call AlphaGo correctly and I do not think Paul has zero actual domain expertise. But Paul is allowed to pick up generic epistemic credit including from me by beating superforecasters because that credit counts toward believing a style of thought is even working literally at all; separately from the question of whether Paul’s superforecaster-defying prediction also looks like a place where I’d predict in some opposite direction.

Definitely, places where Paul disagrees with superforecasters are much more interesting places to mine for bets.

I am happy to hear about those.

[Cotra][12:27]

I think what Paul was saying last night is you find superforecasters betting on some benchmark performance, and he just figures out which side he’d take (and he expects in most/all superforecaster predictions that he would not be deferential, there’s a side he would take)

 

10.3. Predictions and betting (continued)

 

[Christiano][12:29]

not really following along with the conversation, but my desire to bet about “whatever you want” was driven in significant part by frustration with Eliezer repeatedly saying things like “people like Paul get surprised by reality” and me thinking that’s nonsense

[Yudkowsky][12:29]

So the Yudkowskian viewpoint is something like… trends in particular technologies held fixed, will often break down; trends in Goodhartable metrics, will often stay on track but come decoupled from their real meat; trends across multiple technologies, will experience occasional upward breaks when new algorithms on the level of Transformers come out. For me to bet against superforecasters I have to see superforecasters saying something different, which I do not at this time actually know to be the case. For me to bet against Paul betting against superforecasters, the different thing Paul says has to be different from my own direction of disagreement with superforecasters.

[Christiano][12:30]

I still think that if you want to say “this sort of reasoning is garbage empirically” then you ought to be willing to bet about something. If we are just saying “we agree about all of the empirics, it’s just that somehow we have different predictions about AGI” then that’s fine and symmetrical.

[Yudkowsky][12:30]

I have been trying to revise that towards a more nvc “when I try to operate this style of thought myself, it seems to do a bad job of retrofitting and I don’t understand how it says X but not Y”.

[Christiano][12:30]

even then presumably if you think it’s garbage you should be able to point to some particular future predictions where it would be garbage?

if you used it

and then I can either say “no, I don’t think that’s a valid application for reason X” or “sure, I’m happy to bet”

and it’s possible you can’t find any places where it sticks its neck out in practice (even in your version), but then I’m again just rejecting the claim that it’s empirically ruled out

[Yudkowsky][12:31]

I also think that we’d have an easier time betting if, like, neither of us could look at graphs over time, but we were at least told the values in 2010 and 2011 to anchor our estimates over one year, or something like that.

Though we’d also need to not have a bunch of existing knowledge of the domain, which is hard.

[Christiano][12:32]

I think this might be derailing some broader point, but I am provisionally mostly ignoring your point “this doesn’t work in practice” if we can’t find places where we actually foresee disagreements

(which is fine, I don’t think it’s core to your argument)

[Yudkowsky][12:33]

Paul, you’ve previously said that you’re happy to bet against ML superforecasts. That sounds promising. What are examples of those? Also I must flee to lunch and am already feeling sort of burned and harried; it’s possible I should not ignore the default doomedness of trying to field questions from multiple sources.

[Christiano][12:33]

I don’t know if superforecasters make public bets on ML topics, I was saying I’m happy to bet on ML topics and if your strategy is “look up what superforecasters say” that’s fine and doesn’t change my willingness to bet

I think this is probably not as promising as either (i) dig in on the arguments that are most in dispute (seemed to be some juicier stuff earlier though I’m just focusing on work today), or (ii) just talking generally about what we expect to see in the next 5 years so that we can at least get more of a vibe looking back

[Shulman][12:35]

You can bet on the Metaculus AI Tournament forecasts.

https://www.metaculus.com/ai-progress-tournament/

[Yudkowsky][13:13]

I worry that trying to jump straight ahead to Let’s Bet is being too ambitious too early on a cognitively difficult problem of localizing disagreements.

Our prophecies of the End Times’s modal final days seem legit different; my impulse would be to try to work that backwards, first, in an intuitive sense of “well which prophesied world would this experience feel more like living in?”, and try to dig deeper there before deciding that our disagreements have crystallized into short-term easily-observable bets.

We both, weirdly enough, feel that our current viewpoints are doing a great job of permitting the present-day world, even if, presumably, we both think the other’s worldview would’ve done worse at predicting that world in advance. This cannot be resolved in an instant by standard techniques known to me. Let’s try working back from the End Times instead.

I have already stuck out my neck a little and said that, as we start to go past $50B invested in a model, we are starting to live at least a little more in what feels like the Paulverse, not because my model prohibits this, but because, or so I think, Paul’s model more narrowly predicts it.

It does seem to me like the sort of generically weird big thing that could happen even before the End Times; there are corporations that could just decide to do that. I am hedging around this exactly because it does feel to my gut like that is a kind of headline I could read one day and have it still be years before the world ended, so I may need to be stingy with those credibility points inside of what I expect to be reality.

But if we get up to $10T to train a model, that is much more strongly Paulverse; it’s not that this falsifies the Eliezerverse considered in isolation, but it is much more narrowly characteristic of the Words of Paul coming to pass; it feels much more to my gut that, in agreeing to this, I am not giving away Bayes points inside my own mainline.

If ordinary salaries for ordinary fairly-good programmers get up to $20M/year, this is not prohibited by my AI models per se; but it sure sounds like the world becoming less ordinary than I expected it to stay, and like it is part of Paul’s Prophecy much more strongly than it is part of Eliezer’s Prophecy.

That’s two ways that I could concede a great victory to the Paulverse. They both have the disadvantage (from my perspective) that the Paulverse, though it must be drawing probability mass from somewhere in order to stake it there, is legitimately not – so far as I know – forced to claim that these things happen anytime soon. So they are ways for the Paulverse to win, but not ways for the Eliezerverse to win.

That I have said even this much, I claim, puts Paul in at least a little tiny bit of debt to me epistemic-good-behavior-wise; he should be able to describe events which would start to make him worry he was living in the Eliezerverse, even if his model did not narrowly rule them out, and even if those events had not been predicted by the Eliezerverse to occur within a narrowly prophesied date such that they would not thereby form a bet the Eliezerverse could clearly lose as well as win.

I have not had much luck in trying to guess what the real Paul will say about issues like this one. My last attempt was to say, “Well, what shouldn’t happen, besides the End Times themselves, before world GDP has doubled over a four-year period?” And Paul gave what seems to me like an overly valid reply, which, iirc and without looking it up, was along the lines of, “well, nothing that would double world GDP in a 1-year period”.

When I say this is overly valid, I mean that it follows too strongly from Paul’s premises, and he should be looking for something less strong than that on which to make a beginning discovery of disagreement – maybe something which Paul’s premises don’t strongly forbid to him, but which nonetheless looks more like the Eliezerverse or like it would be relatively more strongly predicted by Eliezer’s Prophecy.

I do not model Paul as eagerly or strongly agreeing with, say, “The Riemann Hypothesis should not be machine-proven” or “The ABC Conjecture should not be machine-proven” before world GDP has doubled. It is only on Eliezer’s view that proving the Riemann Hypothesis is about as much of a related or unrelated story to AGI, as are particular benchmarks of GDP.

On Paul’s view as I am trying to understand and operate it, this benchmark may be correlated with AGI in time in the sense that most planets wouldn’t do it during the Middle Ages before they had any computers, but it is not part of the story of AGI, it is not part of Paul’s Prophecy; because it doesn’t make a huge amount of money and increase GDP and get a huge ton of money flowing into investments in useful AI.

(From Eliezer’s perspective, you could tell a story about how a stunning machine proof of the Riemann Hypothesis got Bezos to invest $50 billion in training a successor model and that was how the world ended, and that would be a just-as-plausible model as some particular economic progress story, of how Stuff Happened Because Other Stuff Happened; it sounds like the story of OpenAI or of Deepmind’s early Atari demo, which is to say, it sounds to Eliezer like history. Whereas on Eliezer!Paul’s view, that’s much more of a weird coincidence because it involves Bezos’s unforced decision rather than the economic story of which AGI is capstone, or so it seems to me trying to operate Paul’s view.)

And yet Paul might still, I hope, be able to find something like “The Riemann Hypothesis is machine-proven”, which even though it is not very much of an interesting part of his own Prophecy because it’s not part of the economic storyline, sounds to him like the sort of thing that the Eliezerverse thinks happens as you get close to AGI, which the Eliezerverse says is allowed to start happening way before world GDP would double in 4 years; and as it happens I’d agree with that characterization of the Eliezerverse.

So Paul might say, “Well, my model doesn’t particularly forbid that the Riemann Hypothesis gets machine-proven before world GDP has doubled in 4 years or even started to discernibly break above trend by much; but that does sound more like we are living in the Eliezerverse than in the Paulverse.”

I am not demanding this particular bet because it seems to me that the Riemann Hypothesis may well prove to be unfairly targetable for current ML techniques while they are still separated from AGI by great algorithmic gaps. But if on the other hand Paul thinks that, I dunno, superhuman performance on stuff like the Riemann Hypothesis does tend to be more correlated with economically productive stuff because it’s all roughly the same kind of capability, and lol never mind this “algorithmic gap” stuff, then maybe Paul is willing to pick that example; which is all the better for me because I do suspect it might decouple from the AI of the End, and so I think I have a substantial chance of winning and being able to say “SEE!” to the assembled EAs while there’s still a year or two left on the timeline.

I’d love to have credibility points on that timeline, if Paul doesn’t feel as strong an anticipation of needing them.

[Christiano][15:43]

1/3 that RH has an automated proof before sustained 7%/year GWP growth?

I think the clearest indicator is that we have AI that ought to be able to e.g. run the fully automated factory-building factory (not automating mines or fabs, just the robotic manufacturing and construction), but it’s not being deployed or is being deployed with very mild economic impacts

another indicator is that we have AI systems that can fully replace human programmers (or other giant wins), but total investment in improving them is still small

another indicator is a DeepMind demo that actually creates a lot of value (e.g. 10x larger than DeepMind’s R&D costs? or even comparable to DeepMind’s cumulative R&D costs if you do the accounting really carefully and I definitely believe it and it wasn’t replaceable by Brain), it seems like on your model things should “break upwards” and in mine that just doesn’t happen that much

sounds like you may have >90% on automated proof of RH before a few years of 7%/year growth driven by AI? so that would give a pretty significant odds ratio either way
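As a rough illustration of the “odds ratio” being invoked here, a minimal Python sketch follows. It treats Paul’s 1/3 and the “>90%” he attributes to Eliezer as exact point values (0.9 is an assumption; Eliezer qualifies that figure later in the exchange), and computes both the ratio of the two odds and the Bayes factor that each outcome would supply. The helper name `odds` is illustrative only.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Convert a probability p into odds in favor, p : (1 - p)."""
    return p / (1 - p)


# Assumed point credences for "RH is machine-proven before sustained 7%/yr growth":
p_paul = Fraction(1, 3)      # Paul's stated figure
p_eliezer = Fraction(9, 10)  # the ">90%" Paul attributes to Eliezer, taken as exactly 0.9

# How far apart the two credences are, measured on the odds scale.
odds_ratio = odds(p_eliezer) / odds(p_paul)

# Bayes factor each observed outcome would supply when comparing the two views.
bf_if_rh_first = p_eliezer / p_paul                  # favors the Eliezer-style view
bf_if_growth_first = (1 - p_paul) / (1 - p_eliezer)  # favors the Paul-style view

print(odds_ratio, bf_if_rh_first, bf_if_growth_first)  # prints: 18 27/10 20/3
```

On these assumptions, an RH-first world would shift the odds toward Eliezer’s view by a factor of 2.7, while a growth-first world would shift them toward Paul’s by about 6.7; the asymmetry comes from how confident each side is.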

I think “stack more layers gets stuck but a clever idea makes crazy stuff happen” is generally going to be evidence for your view

That said, I’d mostly reject AlphaGo as an example, because it’s just plugging in neural networks to existing go algorithms in almost the most straightforward way and the bells and whistles don’t really matter. But if AlphaZero worked and AlphaGo didn’t, and the system accomplished something impressive/important (like proving RH, or being significantly better at self-contained programming tasks), then that would be a surprise.

And I’d reject LSTM -> transformer or MoE as an example because the quantitative effect size isn’t that big.

But if something like that made the difference between “this algorithm wasn’t scaling before, and now it’s scaling,” then I’d be surprised.

And the size of jump that surprises me is shrinking over time. So in a few years even getting the equivalent of a factor of 4 jump from some clever innovation would be very surprising to me.

[Yudkowsky][17:44]

sounds like you may have >90% on automated proof of RH before a few years of 7%/year growth driven by AI? so that would give a pretty significant odds ratio either way

I emphasize that this is mostly about no on the GDP growth before the world ending, rather than yes on the RH proof, i.e., I am not 90% on RH before the end of the world at all. Not sure I’m over 50% on it happening before the end of the world at all.

Should it be a consequence of easier earlier problems than full AGI? Yes, on my mainline model; but also on my model, it’s a particular thing and maybe the particular people and factions doing stuff don’t get around to that particular thing.

I guess if I stare hard at my brain it goes ‘ehhhh maybe 65% if timelines are relatively long and 40% if it’s like the next 5 years’, because the faster stuff happens, the less likely anyone is to get around to proving RH in particular or announcing that they’ve done so if they did.

And if the econ threshold is set as low as 7%/yr, I start to worry about that happening in longer-term scenarios, just because world GDP has never been moving at a fixed rate over a log chart. The “driven by AI” part sounds very hard to evaluate. I want, I dunno, some other superforecaster or Carl to put a 90% credible bound on ‘when world GDP growth hits 7% assuming little economically relevant progress in AI’ before I start betting at 80%, let alone 90%, on what should happen before then. I don’t have that credible bound already loaded and I’m not specialized in it.
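For scale, the economic thresholds mentioned in this exchange (world GDP doubling over four years, doubling in one year, and sustained 7%/year growth) differ substantially in strength. A minimal sketch, assuming constant compound annual growth, which real world GDP does not follow:

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_to_double_in(years: float) -> float:
    """Constant annual growth rate needed to double in the given number of years."""
    return 2 ** (1 / years) - 1


print(f"7%/yr doubles in {doubling_time_years(0.07):.1f} years")       # 10.2 years
print(f"doubling in 4 years needs {growth_to_double_in(4):.1%}/yr")    # 18.9%/yr
print(f"doubling in 1 year needs {growth_to_double_in(1):.0%}/yr")     # 100%/yr
```

So a sustained 7%/year threshold corresponds to roughly a ten-year doubling time, a considerably weaker condition than a four-year doubling.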

I’m wondering if we’re jumping ahead of ourselves by trying to make a nice formal Bayesian bet, as prestigious as that might be. I mean, your 1/3 was probably important for you to say, as it is higher than I might have hoped, and I’d ask you if you really mean for that to be an upper bound on your probability or if that’s your actual probability.

But, more than that, I’m wondering if, in the same vague language I used before, you’re okay with saying a little more weakly, “RH proven before big AI-driven growth in world GDP, sounds more Eliezerverse than Paulverse.”

It could be that this is just not actually true because you do not think that RH is coupled to econ stuff in the Paul Prophecy one way or another, and my own declarations above do not have the Eliezerverse saying it enough more strongly than that. If you don’t actually see this as a distinguishing Eliezerverse thing, if it wouldn’t actually make you say “Oh no maybe I’m in the Eliezerverse”, then such are the epistemic facts.

And the size of jump that surprises me is shrinking over time. So in a few years even getting the equivalent of a factor of 4 jump from some clever innovation would be very surprising to me.

This sounds potentially more promising to me – seems highly Eliezerverse, highly non-Paul-verse according to you, and its negation seems highly oops-maybe-I’m-in-the-Paulverse to me too. How many years is a few? How large a jump is shocking if it happens tomorrow?

 

11. September 24 conversation

 

11.1. Predictions and betting (continued 2)

 

[Christiano][13:15]

I think RH is not that surprising, it’s not at all clear to me where “do formal math” sits on the “useful stuff AI could do” spectrum, I guess naively I’d put it somewhere “in the middle” (though the analogy to board games makes it seem a bit lower, and there is a kind of obvious approach to doing this that seems to be working reasonably well so that also makes it seem lower), and 7% GDP growth is relatively close to the end (ETA: by “close to the end” I don’t mean super close to the end, just far enough along that there’s plenty of time for RH first)

I do think that performance jumps are maybe more dispositive, but I’m afraid that it’s basically going to go like this: there won’t be metrics that people are tracking that jump up, but you’ll point to new applications that people hadn’t considered before, and I’ll say “but those new applications aren’t that valuable” whereas to you they will look more analogous to a world-ending AGI coming out from the blue

like for AGZ I’ll be like “well it’s not really above the deep learning trend if you run it backwards” and you’ll be like “but no one was measuring it before! you can’t make up the trend in retrospect!” and I’ll be like “OK, but the reason no one was measuring it before was that it was worse than traditional go algorithms until like 2 years ago and the upside is not large enough that you should expect a huge development effort for a small edge”

[Yudkowsky][13:43]

“factor of 4 jump from some clever innovation” – can you say more about that part?

[Christiano][13:53]

like I’m surprised if a clever innovation does more good than spending 4x more compute
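One way to operationalize “more good than spending 4x more compute” is a compute-equivalent gain: how many times more compute the old method would need to match the new method’s result. A minimal sketch, assuming (hypothetically) that the old method follows a power-law loss curve loss(C) = a * C**(-b); the function name and the parameter values below are made up for illustration and are not taken from any real benchmark.

```python
def compute_equivalent_gain(a: float, b: float, new_loss: float, new_compute: float) -> float:
    """Return how many times more compute the baseline needs to reach new_loss.

    Hypothetical assumption: the baseline follows loss(C) = a * C ** (-b),
    with a > 0 and b > 0.
    """
    if a <= 0 or b <= 0 or new_loss <= 0 or new_compute <= 0:
        raise ValueError("all parameters must be positive")
    # Invert the baseline curve: the compute at which the baseline reaches new_loss.
    baseline_compute = (a / new_loss) ** (1 / b)
    return baseline_compute / new_compute


# Hypothetical baseline: loss(C) = 10 * C**-0.5, so it reaches loss 1.0 at C = 100.
# A new method that matches that loss with 10 units of compute:
gain = compute_equivalent_gain(a=10.0, b=0.5, new_loss=1.0, new_compute=10.0)
SURPRISE_THRESHOLD = 4.0  # the "4x more compute" bar from the conversation
print(f"gain = {gain:.1f}x, exceeds 4x bar: {gain > SURPRISE_THRESHOLD}")
```

Evaluating the new method at the old method’s best loss, as in this example, mirrors the “old SOTA with 10x less compute” framing that comes up a little later.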

[Yudkowsky][15:04]

I worry that I’m misunderstanding this assertion because, as it stands, it sounds extremely likely that I’d win. Would transformers vs. CNNs/RNNs have won this the year that the transformers paper came out?

[Christiano][15:07]

I’m saying that it gets harder over time, don’t expect wins as big as transformers

I think even transformers probably wouldn’t make this cut though?

certainly not vs CNNs

vs RNNs I think the comparison I’d be using to operationalize it is translation, as measured in the original paper

they do make this cut for translation, looks like the number is like 100 >> 4

100x for english-german, more like 10x for english-french, those are the two benchmarks they cite

but both more than 4x

I’m saying I don’t expect ongoing wins that big

I think the key ambiguity is probably going to be about what makes a measurement established/hard-to-improve

[Yudkowsky][15:21]

this sounds like a potentially important point of differentiation; I do expect more wins that big.

the main thing that I imagine might make a big difference to your worldview, but not mine, is if the first demo of the big win only works slightly better (although that might also be because they were able to afford much less compute than the big players, which I think your worldview would see as a redeeming factor for my worldview?) but a couple of years later might be 4x or 10x as effective per unit compute (albeit that other innovations would’ve been added on by then to make the first innovation work properly, which I think on your worldview is like The Point or something)

clarification: by “transformers vs CNNs” I don’t mean transformers on ImageNet, I mean transformers vs. contemporary CNNs, RNNs, or both, being used on text problems.

I’m also feeling a bit confused because e.g. Standard Naive Kurzweilian Accelerationism makes a big deal about the graphs keeping on track because technologies hop new modes as needed. what distinguishes your worldview from saying that no further innovations are needed for AGI or will give a big compute benefit along the way? is it that any single idea may only ever produce a smaller-than-4X benefit? is it permitted that a single idea plus 6 months of engineering fiddly details produce a 4X benefit?

all this aside, “don’t expect wins as big as transformers” continues to sound to me like a very promising point for differentiating Prophecies.

[Christiano][15:50]

I think the relevant feature of the innovation is that the work to find it is small relative to the work that went into the problem to date (though there may be other work on other avenues)

[Yudkowsky][15:52]

in, like, a local sense, or a global sense? if there’s 100 startups searching for ideas collectively with $10B of funding, and one of them has an idea that’s 10x more efficient per unit compute on billion-dollar problems, is that “a small amount of work” because it was only a $100M startup, or collectively an appropriate amount of work?

[Christiano][15:53]

I’m calling that an innovation because it’s a small amount of work

[Yudkowsky][15:54]

(maybe it would also be productive if you pointed to more historical events like Transformers and said ‘that shouldn’t happen again’, because I didn’t realize there was anything you thought was like that. AlphaFold 2?)

[Christiano][15:54]

like, it’s not just a claim about EMH, it’s also a claim about the nature of progress

I think AlphaFold counts and is probably if anything a bigger multiplier, it’s just uncertainty over how many people actually worked on the baselines

[Yudkowsky][15:54]

when should we see headlines like those subside?

[Christiano][15:55]

I mean, I think they are steadily subsiding

as areas grow

[Yudkowsky][15:55]

have they already begun to subside relative to 2016, on your view?

(guess that was ninjaed)

[Christiano][15:55]

I would be surprised to see a 10x today on machine translation

[Yudkowsky][15:55]

where that’s 10x the compute required to get the same result?

[Christiano][15:55]

though not so surprised that we can avoid talking about probabilities

yeah

or to make it more surprising, old sota with 10x less compute

[Yudkowsky][15:56]

yeah I was about to worry that people wouldn’t bother spending 10x the cost of a large model to settle our bet

[Christiano][15:56]

I’m more surprised if they get the old performance with 10x less compute though, so that way around is better on all fronts

[Yudkowsky][15:57]

one reads papers claiming this all the time, though?

[Christiano][15:57]

like, this view also leads me to predict that if I look at the actual amount of manpower that went into alphafold, it’s going to be pretty big relative to the other people submitting to that protein folding benchmark

[Yudkowsky][15:57]

though typically for the sota of 2 years ago

[Christiano][15:58]

not plausible claims on problems people care about

I think the comparison is to contemporary benchmarks from one of the 99 other startups who didn’t find the bright idea

that’s the relevant thing on your view, right?

[Yudkowsky][15:59]

I would expect AlphaFold and AlphaFold 2 to involve… maybe 20 Deep Learning researchers, and for 1-3 less impressive DL researchers to have been the previous limit, if the field even tried that much; I would not be the least surprised if DM spent 1000x the compute on AlphaFold 2, but I’d be very surprised if the 1-3-person research team could spend that 1000x compute and get anywhere near AlphaFold 2 results.

[Christiano][15:59]

and then I’m predicting that number is already <10 for machine translation and falling (maybe I shouldn’t talk about machine translation or at least not commit to numbers given that I know very little about it, but whatever that’s my estimate), and for other domains it will be <10 by the time they get as crowded as machine translation, and for transformative tasks they will be <2

isn’t there an open-source replication of alphafold?

we could bet about its performance relative to the original

[Yudkowsky][16:00]

it is enormously easier to do what’s already been done

[Christiano][16:00]

I agree

[Yudkowsky][16:00]

I believe the open-source replication was by people who were told roughly what Deepmind had done, possibly more than roughly

on the Yudkowskian view, those 1-3 previous researchers just would not have thought of doing things the way Deepmind did them

[Christiano][16:01]

anyway, my guess is generally that if you are big relative to previous efforts in the area you can make giant improvements, if you are small relative to previous efforts you might get lucky (or just be much smarter) but that gets increasingly unlikely as the field gets bigger

like alexnet and transformers are big wins by groups who are small relative to the rest of the field, but transformers are much smaller than alexnet and future developments will continue to shrink

[Yudkowsky][16:02]

but if you’re the same size as previous efforts and don’t have 100x the compute, you shouldn’t be able to get huge improvements in the Paulverse?

[Christiano][16:03]

I mean, if you are the same size as all the prior effort put together?

I’m not surprised if you can totally dominate in that case, especially if prior efforts aren’t well-coordinated

and for things that are done by hobbyists, I wouldn’t be surprised if you can be a bit bigger than an individual hobbyist and dominate

[Yudkowsky][16:03]

I’m thinking something like, if Deepmind comes out with an innovation such that it duplicates old SOTA on machine translation with 1/10th compute, that still violates the Paulverse because Deepmind is not Paul!Big compared to all MTL efforts

though I am not sure myself how seriously Earth is taking MTL in the first place

[Christiano][16:04]

yeah, I think if DeepMind beats Google Brain by 10x compute next year on translation, that’s a significant strike against Paul

[Yudkowsky][16:05]

I know that Google offers it for free, I expect they at least have 50 mediocre AI people working on it, I don’t know whether or not they have 20 excellent AI people working on it and if they’ve ever tried training a 200B parameter non-MoE model on it

[Christiano][16:05]

I think not that seriously, but more seriously than 2016 and than anything else where you are seeing big swings

and so I’m less surprised than for TAI, but still surprised

[Yudkowsky][16:06]

I am feeling increasingly optimistic that we have some notion of what it means to not be within the Paulverse! I am not feeling that we have solved the problem of having enough signs that enough of them will appear to tell EA how to notice which universe it is inside many years before the actual End Times, but I sure do feel like we are making progress!

things that have happened in the past that you feel shouldn’t happen again are great places to poke for Eliezer-disagreements!

[Christiano][16:07]

I definitely think there’s a big disagreement here about what to expect for pre-end-of-days ML

but lots of concerns about details like what domains are crowded enough to be surprising and how to do comparisons

I mean, to be clear, I think the transformer paper having giant gains is also evidence against paulverse

it’s just that there are really a lot of datapoints, and some of them definitely go against paul’s view

to me it feels like the relevant thing for making the end-of-days forecast is something like “how much of the progress comes from ‘innovations’ that are relatively unpredictable and/or driven by groups that are relatively small, vs scaleup and ‘business as usual’ progress in small pieces?”

 

11.2. Performance leap scenario

 

[Yudkowsky][16:09]

my heuristics tell me to try wargaming out a particular scenario so we can determine in advance which key questions Paul asks

in 2023, Deepmind releases an MTL program which is suuuper impressive. everyone who reads the MTL of, say, a foreign novel, or uses it to conduct a text chat with a contractor in Indonesia, is like, “They’ve basically got it, this is about as good as a human and only makes minor and easily corrected errors.”

[Christiano][16:12]

I mostly want to know how good Google’s translation is at that time; and if DeepMind’s product is expensive or only shows gains for long texts, I want to know whether there is actually an economic niche for it that is large relative to the R&D cost.

like I’m not sure whether anyone works at all on long-text translation, and I’m not sure if it would actually make Google $ to work on it

great text chat with contractor in indonesia almost certainly meets that bar though

[Yudkowsky][16:14]

furthermore, Eliezer and Paul publicized their debate sufficiently to some internal Deepmind people who spoke to the right other people at Deepmind, that Deepmind showed a graph of loss vs. previous-SOTA methods, and Deepmind’s graph shows that their thing crosses the previous-SOTA line while having used 12x less compute for inference training.

(note that this is less… salient?… on the Eliezerverse per se, than it is as an important issue and surprise on the Paulverse, so I am less confident about that part.)

a nitpicker would note that the previous-SOTA metric they used is, however, from 1 year previously, and the new model also uses Sideways Batch Regularization, which the 1-year-previous SOTA graph didn’t use. on the other hand, they got 12x rather than 10x improvement, so there was some error margin there.

[Christiano][16:15]

I’m OK if they don’t have the benchmark graph as long as they have some evaluation that other people were trying at, I think real-time chat probably qualifies

[Yudkowsky][16:15]

but then it’s harder to measure the 10x

[Christiano][16:15]

also I’m saying 10x less training compute, not inference (but 10x less inference compute is harder)

yes

[Yudkowsky][16:15]

or to know that Deepmind didn’t just use a bunch more compute

[Christiano][16:15]

in practice it seems almost certain that it’s going to be harder to evaluate

though I agree there are really clean versions where they actually measured a benchmark other people work on and can compare training compute directly

(like in the transformer paper)

[Yudkowsky][16:16]

literally a pessimal typo, I meant to specify training vs. inference and somehow managed to type “inference” instead

[Christiano][16:16]

I’m more surprised by the clean version

[Yudkowsky][16:17]

I literally don’t know what you’d be surprised by in the unclean version

was GPT-2 beating the field hard enough that it would have been surprising if they’d only used similar amounts of training compute

?

and how would somebody else judge that for a new system?

[Christiano][16:17]

I’d want to look at either human evals or logprob, I think probably not? but it’s possible it was

[Yudkowsky][16:19]

btw I also feel like the Eliezer model is more surprised and impressed by “they beat the old model with 10x less compute” than by “the old model can’t catch up to the new model with 10x more compute”

the Eliezerverse thinks in terms of techniques that saturate

such that you have to find new techniques for new training to go on helping

[Christiano][16:19]

it’s definitely way harder to win at the old task with 10x less compute

[Yudkowsky][16:19]

but for expensive models it seems really genuinely unlikely to me that anyone will give us this data!

[Christiano][16:19]

I think it’s usually the case that if you scale up far enough past previous sota, you will be able to find tons of techniques needed to make it work at the new scale

but I’m expecting it to be less of a big deal because all experiments will be roughly at the frontier of what is feasible

and so the new thing won’t be able to afford to go 10x bigger

unlike today when we are scaling up spending so fast

but this does make it harder for the next few years at least, which is maybe the key period

(it makes it hard if we are both close enough to the edge that “10x cheaper to get old results” seems unlikely but “getting new results that couldn’t be achieved with 10x more compute and old method” seems likely)

what I basically expect is to (i) roughly know how much performance you get from making models 10x bigger, (ii) roughly know how much someone beat the competition, and then you can compare the numbers
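Christiano’s two-step comparison can be sketched as a small calculation. This is a minimal sketch that assumes the old method’s loss follows a simple power law in training compute, loss = a * C^(-b). The fit parameters `a` and `b` and all numbers in the example are hypothetical placeholders, not measurements from any translation benchmark. Under that assumption, inverting the curve gives the compute the old method would need to match a new result, and dividing by the compute the new method actually used gives an effective compute multiplier to set against thresholds like 10x.

```python
"""Sketch: compare a new method against an assumed scaling curve for the old one."""


def compute_needed(loss: float, a: float, b: float) -> float:
    """Training compute the OLD method would need to reach `loss`.

    Assumes (hypothetically) that the old method follows loss = a * C**(-b).
    Inverting gives C = (a / loss) ** (1 / b).
    """
    return (a / loss) ** (1.0 / b)


def effective_compute_multiplier(new_loss: float, new_compute: float,
                                 a: float, b: float) -> float:
    """How many times more compute the old method would need to match the new result."""
    return compute_needed(new_loss, a, b) / new_compute


if __name__ == "__main__":
    a, b = 10.0, 0.05          # hypothetical fit parameters, not real MT data
    old_compute = 1e21         # hypothetical training compute for the previous SOTA
    old_loss = a * old_compute ** (-b)

    # Hypothetical new method: matches the old loss using 12x less training compute.
    multiplier = effective_compute_multiplier(old_loss, old_compute / 12, a, b)
    print(f"effective compute multiplier: {multiplier:.1f}x")
    print("crosses the 10x line" if multiplier >= 10 else "below the 10x line")
```

As Christiano points out a little later, the exponent `b` depends heavily on the regime, so a multiplier computed this way is only as trustworthy as the fitted curve.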

[Yudkowsky][16:22]

well, you could say, not in a big bet-winning sense, but in a mild trend sense, that if the next few years are full of “they spent 100x more on compute in this domain and got much better results” announcements, that is business as usual for the last few years and perfectly on track for the Paulverse; while the Eliezerverse permits but does not mandate that we will also see occasional announcements about brilliant new techniques, from some field where somebody already scaled up to big models and big compute, producing more impressive results than the previous big compute.

[Christiano][16:23]

(but “performance from making models 10x bigger” depends a lot on exactly how big they were and whether you are in a regime with unfavorable scaling)

[Yudkowsky][16:23]

so the Eliezerverse must be putting at least a little less probability mass on business-as-before Paulverse

[Christiano][16:24]

I am also expecting a general scale up in ML training runs over time, though it’s plausible that you also expect that until the end of days and just expect a much earlier end of days

[Yudkowsky][16:24]

I mean, why wouldn’t they?

if they’re purchasing more per unit of compute, they will quite often spend more on total compute (Jevons Paradox)

[Christiano][16:25]

that’s going to kill the “they spent 100x more compute” announcements soon enough

like, that’s easy when “100x more” means $1M, it’s a bit hard when “100x more” means $100M, it’s not going to happen except on the most important tasks when “100x more” means $10B

[Yudkowsky][16:26]

the Eliezerverse is full of weird things that somebody could apply ML to, and doesn’t have that many professionals who will wander down completely unwalked roads; and so is much more friendly to announcements that “we tried putting a lot of work and compute into protein folding, since nobody ever tried doing that seriously with protein folding before, look what came out” continuing for the next decade if the Earth lasts that long

[Christiano][16:27]

I’m not surprised by announcements like protein folding, it’s not that the world overall gets more and more hostile to big wins, it’s that any industry gets more and more hostile as it gets bigger (or across industries, they get more and more hostile as the stakes grow)

[Yudkowsky][16:28]

well, the Eliezerverse has more weird novel profitable things, because it has more weirdness; and more weird novel profitable things, because it has fewer people diligently going around trying all the things that will sound obvious in retrospect; but it also has fewer weird novel profitable things, because it has fewer novel things that are allowed to be profitable.

[Christiano][16:29]

(I mean, the protein folding thing is a datapoint against my view, but it’s not that much evidence and it’s not getting bigger over time)

yeah, but doesn’t your view expect more innovations for any given problem?

like, it’s not just that you think the universe of weird profitable applications is larger, you also think AI progress is just more driven by innovations, right?

otherwise it feels like the whole game is about whether you think that AI-automating-AI-progress is a weird application or something that people will try on

[Yudkowsky][16:30]

the Eliezerverse is more strident about there being lots and lots more stuff like “ReLUs” and “batch normalization” and “transformers” in the design space in principle, and less strident about whether current people are being paid to spend all day looking for them rather than putting their efforts someplace with a nice predictable payoff.

[Christiano][16:31]

yeah, but then don’t you see big wins from the next transformers?

and you think those just keep happening even as fields mature

[Yudkowsky][16:31]

it’s much more permitted in the Eliezerverse than in the Paulverse

[Christiano][16:31]

or you mean that they might slow down because people stop working on them?

[Yudkowsky][16:32]

this civilization has mental problems that I do not understand well enough to predict, when it comes to figuring out how they’ll affect the field of AI as it scales

that said, I don’t see us getting to AGI on Stack More Layers.

there may perhaps be a bunch of stacked layers in an AGI but there will be more ideas to it than that.

such that it would require far, far more than 10X compute to get the same results with a GPT-like architecture if that was literally possible

[Christiano][16:33]

it seems clear that it will be more than 10x relative to GPT

I guess I don’t know what GPT-like architecture means, but from what you say it seems like normal progress would result in a non-GPT-like architecture

so I don’t think I’m disagreeing with that

[Yudkowsky][16:34]

I also don’t think we’re getting there by accumulating a ton of shallow insights; I expect it takes at least one more big one, maybe 2-4 big ones.

[Christiano][16:34]

do you think transformers are a big insight?

(is adding soft attention to LSTMs a big insight?)

[Yudkowsky][16:34]

hard to deliver a verdict of history there

no

[Christiano][16:35]

(I think the intellectual history of transformers is a lot like “take the LSTM out of the LSTM with attention”)

[Yudkowsky][16:35]

“how to train deep gradient descent without activations and gradients blowing up or dying out” was a big insight

[Christiano][16:36]

that really really seems like the accumulation of small insights

[Yudkowsky][16:36]

though the history of that big insight is legit complicated

[Christiano][16:36]

like, residual connections are the single biggest thing

and relus also help

and batch normalization helps

and attention is better than lstms

[Yudkowsky][16:36]

and the inits help (like xavier)

[Christiano][16:36]

you could also call that the accumulation of big insights, but the point is that it’s an accumulation of a lot of stuff

mostly developed in different places

[Yudkowsky][16:37]

but on the Yudkowskian view the biggest insight of all was the one waaaay back at the beginning where they were initing by literally unrolling Restricted Boltzmann Machines

and people began to say: hey if we do this the activations and gradients don’t blow up or die out

it is not a history that strongly distinguishes the Paulverse from Eliezerverse, because that insight took time to manifest

it was not, as I recall, the first thing that people said about RBM-unrolling

and there were many little or not-really-so-little inventions that sustained the insight to deeper and deeper nets

and those little inventions did not correspond to huge capability jumps immediately in the hands of their inventors, with, I think, the possible exception of transformers

though also I think back then people just didn’t do as much SoTA-measuring-and-comparing

[Christiano][16:40]

(I think transformers are a significantly smaller jump than previous improvements)

also a thing we could guess about though

[Yudkowsky][16:40]

right, but did the people who demoed the improvements demo them as big capability jumps?

harder to do when you don’t have a big old well funded field with lots of eyes on SoTA claims

they weren’t dense in SoTA, I think?

anyways, there has not, so far as I know, been an insight of similar size to that last one, since then

[Christiano][16:42]

also 10-100x is still actually surprising to me for transformers

so I guess lesson learned

[Yudkowsky][16:43]

I think if you literally took pre-transformer SoTA, and the transformer paper plus the minimum of later innovations required to make transformers scale at all, then as you tried scaling stuff to GPT-1 scale, the old stuff would probably just flatly not work or asymptote?

[Christiano][16:44]

in general if you take anything developed at scale X and try to scale it way past X I think it won’t work

or like, it will work much worse than something that continues to get tweaked

[Yudkowsky][16:44]

I’m not sure I understand what you mean if you mean “10x-100x on transformers actually happened and therefore actually surprised me”

[Christiano][16:44]

yeah, I mean that given everything I know I am surprised that transformers were as large as a 100x improvement on translation

in that paper

[Yudkowsky][16:45]

though it may not help my own case, I remark that my generic heuristics say to have an assistant go poke a bit at that claim and see if your noticed confusion is because you are being more confused by fiction than by reality.

[Christiano][16:45]

yeah, I am definitely interested to understand a bit better what’s up there

but tentatively I’m sticking to my guns on the original prediction

if you have random 10-20 person teams getting 100x speedups versus prior sota

as we approach TAI

that’s so far from paulverse

[Yudkowsky][16:46]

like, not about this case specially, just sheer reflex from “this assertion in a science paper is surprising” to “go poke at it”. many unsurprising and hence unpoked assertions will also be false, of course, but the surprising ones even more so on average.

[Christiano][16:48]

anyway, seems like a good approach to finding a concrete disagreement

and even looking back at this conversation would be a start for diagnosing who is more right in hindsight

main thing is to say how quickly and in what industries I’m how surprised

[Yudkowsky][16:49]

I suspect you want to attach conditions to that surprise? Like, the domain must be sufficiently explored OR sufficiently economically important, because Paulverse also predicts(?) that as of a few years (3?? 2??? 15????) all the economically important stuff will have been poked with lots of compute already.

and if there’s economically important domains where nobody’s tried throwing $50M at a model yet, that also sounds like not-the-Paulverse?

[Christiano][16:50]

I think the economically important prediction doesn’t really need that much of “within a few years”

like the total stakes have just been low to date

none of the deep learning labs are that close to paying for themselves

so we’re not in the regime where “economic niche > R&D budget”

we are still in the paulverse-consistent regime where investment is driven by the hope of future wins

though paul is surprised that R&D budgets aren’t much larger than the economic value

[Yudkowsky][16:51]

well, it’s a bit of a shame from the Eliezer viewpoint that the Paulverse can’t be falsifiable yet, then, considering that in the Eliezerverse it is allowed (but not mandated) for the world to end while most DL labs haven’t paid for themselves.

albeit I’m not sure that’s true of the present world?

DM had that thing about “we just rejiggered cooling the server rooms for Google and paid back 1/3 of their investment in us” and that was years ago.

[Christiano][16:52]

I’ll register considerable skepticism

[Yudkowsky][16:53]

I don’t claim deep knowledge.

But if the imminence, and hence strength and falsifiability, of Paulverse assertions, depend on how much money all the deep learning labs are making, that seems like something we could ask OpenPhil to measure?

[Christiano][16:55]

it seems easier to just talk about ML tasks that people work on

it seems really hard to arbitrate the “all the important niches are invested in” stuff in a way that’s correlated with takeoff

whereas the “we should be making a big chunk of our progress from insights” seems like it’s easier

though I understand that your view could be disjunctive, of either “AI will have hidden secrets that yield great intelligence,” or “there are hidden secret applications that yield incredible profit”

(sorry that statement is crude / not very faithful)

should follow up on this in the future, off for now though

[Yudkowsky][16:58]

👋

 

Shulman and Yudkowsky on AI progress

 |   |  Analysis, Conversations

 

This post is a transcript of a discussion between Carl Shulman and Eliezer Yudkowsky, following up on a conversation with Paul Christiano and Ajeya Cotra.

 

Color key:

 Chat by Carl and Eliezer   Other chat 

 

9.14. Carl Shulman’s predictions

 

[Shulman][20:30]

I’ll interject some points re the earlier discussion about how animal data relates to the ‘AI scaling to AGI’ thesis.

1. In humans it’s claimed the IQ-job success correlation varies by job. For a scientist or doctor it might be 0.6+, for a low-complexity job more like 0.4, or more like 0.2 for simple repetitive manual labor. That presumably goes down a lot for animals with less in the way of hands, or that are focused on low-density foods, like baleen whales or grazers. If it’s 0.1 for animals like orcas or elephants, or 0.05, then there’s 4-10x less fitness return to smarts.

2. But they outmass humans by more than 4-10x: elephants 40x, orcas 60x+. Metabolically (20 watts divided by the BMR of the animal) the gap is somewhat smaller though, because of metabolic scaling laws (energy use scales with roughly the 3/4 or maybe 2/3 power of body mass).

https://en.wikipedia.org/wiki/Kleiber%27s_law

If dinosaurs were poikilotherms, that’s a 10x difference in energy budget vs a mammal of the same size, although there is debate about their metabolism.
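The scaling argument in point 2 can be made concrete with a few lines of arithmetic. This is a minimal sketch, assuming whole-body metabolic rate is a pure power of body mass (Kleiber’s law with exponent 3/4, or 2/3 as the alternative mentioned). It uses only the 40x and 60x mass ratios given above and ignores every other difference between the species.

```python
"""Sketch: Kleiber-style metabolic scaling for the mass ratios quoted above."""

# Body-mass ratios relative to humans, as given in the text.
MASS_RATIOS = {"elephant": 40.0, "orca": 60.0}

# Candidate exponents for metabolic rate vs body mass.
EXPONENTS = {"3/4": 3 / 4, "2/3": 2 / 3}


def metabolic_ratio(mass_ratio: float, exponent: float) -> float:
    """Whole-body metabolic-rate ratio implied by metabolic rate ∝ mass**exponent."""
    return mass_ratio ** exponent


if __name__ == "__main__":
    for animal, mass_ratio in MASS_RATIOS.items():
        for label, exponent in EXPONENTS.items():
            ratio = metabolic_ratio(mass_ratio, exponent)
            print(f"{animal}: {mass_ratio:.0f}x mass -> ~{ratio:.1f}x metabolic rate "
                  f"(exponent {label})")
```

With these inputs the metabolic gap comes out at roughly 12x to 22x rather than 40x to 60x, which is the sense in which the gap is “somewhat smaller.”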

3. If we’re looking for an innovation in birds and primates, there’s some evidence of ‘hardware’ innovation rather than ‘software.’ Herculano-Houzel reports in The Human Advantage (summarizing much prior neuron-counting work) different observational scaling laws for neuron number with brain mass for different animal lineages.

We were particularly interested in cellular scaling differences that might have arisen in primates. If the same rules relating numbers of neurons to brain size in rodents (6)

The brain of the capuchin monkey, for instance, weighing 52 g, contains >3× more neurons in the cerebral cortex and ≈2× more neurons in the cerebellum than the larger brain of the capybara, weighing 76 g.

[Editor’s Note: Quote source is “Cellular scaling rules for primate brains.”]

In rodents, brain mass scales with neuron count as n^1.6, whereas in primates it’s close to linear (n^1.1). For cortical neurons and cortical mass, the corresponding exponents are 1.7 and 1.0. In general, birds and primates are outliers in how neuron count scales with brain mass.

Note also that bigger brains with lower neuron density have longer communication times from one side of the brain to the other. So primates and birds can have faster clock speeds for integrated thought than a large elephant or whale with similar neuron count.
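The exponents in point 3 are easier to compare as ratios. This sketch assumes brain mass is a pure power of neuron count (mass ∝ n^k, with k = 1.6 for rodents and 1.1 for primates, the whole-brain figures quoted above). It deliberately ignores the lineage-specific constants, so only relative changes within a lineage are meaningful.

```python
"""Sketch: what the quoted neuron-to-brain-mass exponents imply for scaling up."""

RODENT_EXPONENT = 1.6   # whole-brain exponent for rodents, from the text
PRIMATE_EXPONENT = 1.1  # whole-brain exponent for primates, from the text


def mass_multiplier(neuron_factor: float, exponent: float) -> float:
    """Factor by which brain mass grows when neuron count grows by `neuron_factor`."""
    return neuron_factor ** exponent


def neuron_multiplier(mass_factor: float, exponent: float) -> float:
    """Inverse: factor by which neuron count grows when brain mass grows by `mass_factor`."""
    return mass_factor ** (1.0 / exponent)


if __name__ == "__main__":
    for name, k in [("rodent", RODENT_EXPONENT), ("primate", PRIMATE_EXPONENT)]:
        print(f"{name}: 10x neurons -> {mass_multiplier(10, k):.1f}x brain mass; "
              f"10x brain mass -> {neuron_multiplier(10, k):.1f}x neurons")
```

On these rules, a tenfold increase in neurons costs about 40x the brain mass under the rodent exponent but only about 13x under the primate exponent, the kind of difference behind the capuchin/capybara comparison.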

4. Elephants have brain mass ~2.5x human, and 3x the neurons, but 98% of those are in the cerebellum (vs 80% or less in most animals; these are generally the tiniest neurons and seem to do a bunch of fine motor control). Human cerebral cortex has 3x the neurons of the elephant cortex (which has twice the mass). The giant cerebellum seems to be for controlling the very complex trunk.

https://nautil.us/issue/35/boundaries/the-paradox-of-the-elephant-brain

Blue whales get close to human neuron counts with much larger brains.

https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons

5. As Paul mentioned, human brain volume correlation with measures of cognitive function after correcting for measurement error on the cognitive side is in the vicinity of 0.3-0.4 (might go a bit higher after controlling for non-functional brain volume variation, lower from removing confounds). The genetic correlation with cognitive function in this study is 0.24:

https://www.nature.com/articles/s41467-020-19378-5

So it accounts for a minority of genetic influences on cognitive ability. We’d also expect a bunch of genetic variance that’s basically disruptive mutations in mutation-selection balance (e.g. schizophrenia seems to be a result of that, with schizophrenia alleles under negative selection, but a big mutational target, with the standing burden set by the level of fitness penalty for it; in niches with less return to cognition the mutational surface will be cleaned up less frequently and have more standing junk).

Other sources of genetic variance might include allocation of attention/learning (curiosity and thinking about abstractions vs immediate sensory processing/alertness), length of childhood/learning phase, motivation to engage in chains of thought, etc.

Overall I think there’s some question about how to account for the full genetic variance, but mapping it onto the ML experience with model size, experience and reward functions being key looks compatible with the biological evidence. I lean towards it, although it’s not cleanly and conclusively shown.

Regarding economic impact of AGI, I do not buy the ‘regulation strangles all big GDP boosts’ story.

The BEA breaks down US GDP by industry here (page 11):

https://www.bea.gov/sites/default/files/2021-06/gdp1q21_3rd_1.pdf

As I work through sectors and the rollout of past automation I see opportunities for large-scale rollout that is not heavily blocked by regulation. Manufacturing is still trillions of dollars, and robotic factories are permitted and produced under current law, with the limits being more about which tasks the robots can do at low enough cost (e.g. this stopped Tesla’s plans for more completely robotic factories). Also worth noting: manufacturing is mobile, and new factories are sited in friendly jurisdictions.

Software to control agricultural machinery and food processing is also permitted.

Warehouses are also low-regulation environments, with logistics worth hundreds of billions of dollars. See Amazon’s robot-heavy warehouses, which are limited by robotics software.

Driving is hundreds of billions of dollars, and Tesla has been permitted to use Autopilot, and there has been a lot of regulator enthusiasm for permitting self-driving cars with humanlike accident rates. Waymo, it seems, still hasn’t reached that, and is lowering costs.

Restaurants/grocery stores/hotels are around a trillion dollars. Replacing humans in vision/voice tasks to take orders, track inventory (Amazon Go style), etc is worth hundreds of billions there and mostly permitted. Robotics cheap enough to replace low-wage labor there would also be valuable (although a lower priority than high-wage work if compute and development costs are similar).

Software is close to a half trillion dollars and the internals of software development are almost wholly unregulated.

Finance is over a trillion dollars, with room for AI in sales and management.

Sales and marketing are big and fairly unregulated.

In highly regulated and licensed professions like healthcare and legal services, you can still see a licensee mechanically administer the advice of the machine, amplifying their reach and productivity.

Even in housing/construction there are still great profits to be made by improving the efficiency of what construction is allowed (a sector worth hundreds of billions).

If you’re talking about legions of super charismatic AI chatbots, they could be doing sales, coaching human manual labor to effectively upskill it, and providing the variety of activities discussed above. They’re enough to more than double GDP, even with strong Baumol effects/cost disease, I’d say.

Although of course if you have AIs that can do so much the wages of AI and hardware researchers will be super high, and so a lot of that will go into the intelligence explosion, while before that various weaknesses that prevent full automation of AI research will also mess up activity in these other sectors to varying degrees.

Re discontinuity and progress curves, I think Paul is right. AI Impacts went to a lot of effort assembling datasets looking for big jumps on progress plots, and indeed nukes are an extremely high percentile for discontinuity, and were developed by the biggest spending power (yes other powers could have bet more on nukes, but didn’t, and that was related to the US having more to spend and putting more in many bets), with the big gains in military power per $ coming with the hydrogen bomb and over the next decade.

https://aiimpacts.org/category/takeoff-speed/continuity-of-progress/discontinuous-progress-investigation/

For measurable hardware and software progress (Elo in games, loss on defined benchmarks), you have quite continuous hardware progress, and software progress that is in the same ballpark and not drastically jumpy (like 10 years of gains in 1), more so as you get to metrics used by bigger markets/industries.

I also agree with Paul’s description of the prior Go trend, and how DeepMind increased $ spent on Go software enormously. That analysis was a big part of why I bet on AlphaGo winning against Lee Sedol at the time (the rest being extrapolation from the Fan Hui version and models of DeepMind’s process for deciding when to try a match).

[Yudkowsky][21:38]

I’m curious about how much you think these opinions have been arrived at independently by yourself, Paul, and the rest of the OpenPhil complex?

[Cotra][21:44]

Little of Open Phil’s opinions are independent of Carl, the source of all opinions

[Yudkowsky: 😆] [Ngo: 😆]


Biology-Inspired AGI Timelines: The Trick That Never Works

 |   |  Analysis

– 1988 –

Hans Moravec:  Behold my book Mind Children.  Within, I project that, in 2010 or thereabouts, we shall achieve strong AI.  I am not calling it “Artificial General Intelligence” because this term will not be coined for another 15 years or so.

Eliezer (who is not actually on the record as saying this, because the real Eliezer is, in this scenario, 8 years old; this version of Eliezer has all the meta-heuristics of Eliezer from 2021, but none of that Eliezer’s anachronistic knowledge):  Really?  That sounds like a very difficult prediction to make correctly, since it is about the future, which is famously hard to predict.

Imaginary Moravec:  Sounds like a fully general counterargument to me.

Eliezer:  Well, it is, indeed, a fully general counterargument against futurism.  Successfully predicting the unimaginably far future – that is, more than 2 or 3 years out, or sometimes less – is something that human beings seem to be quite bad at, by and large.

Moravec:  I predict that, 4 years from this day, in 1992, the Sun will rise in the east.

Eliezer: Okay, let me qualify that.  Humans seem to be quite bad at predicting the future whenever we need to predict anything at all new and unfamiliar, rather than the Sun continuing to rise every morning until it finally gets eaten.  I’m not saying it’s impossible to ever validly predict something novel!  Why, even if that was impossible, how could I know it for sure?  By extrapolating from my own personal inability to make predictions like that?  Maybe I’m just bad at it myself.  But any time somebody claims that some particular novel aspect of the far future is predictable, they justly have a significant burden of prior skepticism to overcome.

More broadly, we should not expect a good futurist to give us a generally good picture of the future.  We should expect a great futurist to single out a few rare narrow aspects of the future which are, somehow, exceptions to the usual rule about the future not being very predictable.

I do agree with you, for example, that we shall at some point see Artificial General Intelligence.  This seems like a rare predictable fact about the future, even though it is about a novel thing which has not happened before: we keep trying to crack this problem, we make progress albeit slowly, the problem must be solvable in principle because human brains solve it, eventually it will be solved; this is not a logical necessity, but it sure seems like the way to bet.  “AGI eventually” is predictable in a way that it is not predictable that, e.g., the nation of Japan, presently upon the rise, will achieve economic dominance over the next decades – to name something else that present-day storytellers of 1988 are talking about.

But timing the novel development correctly?  That is almost never done, not until things are 2 years out, and often not even then.  Nuclear weapons were called, but not nuclear weapons in 1945; heavier-than-air flight was called, but not flight in 1903.  In both cases, people said two years earlier that it wouldn’t be done for 50 years – or said, decades too early, that it’d be done shortly.  There’s a difference between worrying that we may eventually get a serious global pandemic, worrying that eventually a lab accident may lead to a global pandemic, and forecasting that a global pandemic will start in November of 2019.


Visible Thoughts Project and Bounty Announcement

 |   |  News

Soares, Tallinn, and Yudkowsky discuss AGI cognition

 |   |  Analysis, Conversations, Guest Posts

 

This is a collection of follow-up discussions in the wake of Richard Ngo and Eliezer Yudkowsky’s first three conversations (1 and 2, 3).

 

Color key:

  Chat     Google Doc content     Inline comments  

 

7. Follow-ups to the Ngo/Yudkowsky conversation

 

[Bensinger][1:50]  (Nov. 23 follow-up comment)

Readers who aren’t already familiar with relevant concepts such as ethical injunctions should probably read Ends Don’t Justify Means (Among Humans), along with an introduction to the unilateralist’s curse.

 

7.1. Jaan Tallinn’s commentary

 

[Tallinn]  (Sep. 18 Google Doc)

meta

a few meta notes first:

  • i’m happy with the below comments being shared further without explicit permission – just make sure you respect the sharing constraints of the discussion that they’re based on;
  • there’s a lot of content now in the debate that branches out in multiple directions – i suspect a strong distillation step is needed to make it coherent and publishable;
  • the main purpose of this document is to give a datapoint how the debate is coming across to a reader – it’s very probable that i’ve misunderstood some things, but that’s the point;
  • i’m also largely using my own terms/metaphors – for additional triangulation.

 

pit of generality

it feels to me like the main crux is about the topology of the space of cognitive systems in combination with what it implies about takeoff. here’s the way i understand eliezer’s position:

there’s a “pit of generality” attractor in cognitive systems space: once an AI system gets sufficiently close to the edge (“past the atmospheric turbulence layer”), it’s bound to improve in catastrophic manner;

[Yudkowsky][11:10]  (Sep. 18 comment)

it’s bound to improve in catastrophic manner

I think this is true with quite high probability about an AI that gets high enough, if not otherwise corrigibilized, boosting up to strong superintelligence – this is what it means metaphorically to get “past the atmospheric turbulence layer”.

“High enough” should not be very far above the human level and may be below it; John von Neumann with the ability to run some chains of thought at high serial speed, access to his own source code, and the ability to try branches of himself, seems like he could very likely do this, possibly modulo his concerns about stomping his own utility function making him more cautious.

People noticeably less smart than von Neumann might be able to do it too.

An AI whose components are more modular than a human’s and more locally testable might have an easier time of the whole thing; we can imagine the FOOM getting rolling from something that was in some sense dumber than human.

But the strong prediction is that when you get well above the von Neumann level, why, that is clearly enough, and things take over and go Foom.  The lower you go from that threshold, the less sure I am that it counts as “out of the atmosphere”.  This epistemic humility on my part should not be confused for knowledge of a constraint on the territory that requires AI to go far above humans to Foom.  Just as DL-based AI over the 2010s scaled and generalized much faster and earlier than the picture I argued to Hanson in the Foom debate, reality is allowed to be much more ‘extreme’ than the sure-thing part of this proposition that I defend.

[Tallinn][4:07]  (Sep. 19 comment)

excellent, the first paragraph makes the shape of the edge of the pit much more concrete (plus highlights one constraint that an AI taking off probably needs to navigate — its own version of the alignment problem!)

as for your second point, yeah, you seem to be just reiterating that you have uncertainty about the shape of the edge, but no reason to rule out that it’s very sharp (though, as per my other comment, i think that the human genome ending up teetering right on the edge upper bounds the sharpness)
