Soares, Tallinn, and Yudkowsky discuss AGI cognition
This is a collection of follow-up discussions in the wake of Richard Ngo and Eliezer Yudkowsky’s first three conversations (1 and 2, 3).
Color key:
Chat | Google Doc content | Inline comments
7. Follow-ups to the Ngo/Yudkowsky conversation
Readers who aren’t already familiar with relevant concepts such as ethical injunctions should probably read Ends Don’t Justify Means (Among Humans), along with an introduction to the unilateralist’s curse.
7.1. Jaan Tallinn’s commentary
meta
a few meta notes first:
- i’m happy with the below comments being shared further without explicit permission – just make sure you respect the sharing constraints of the discussion that they’re based on;
- there’s a lot of content now in the debate that branches out in multiple directions – i suspect a strong distillation step is needed to make it coherent and publishable;
- the main purpose of this document is to give a datapoint on how the debate is coming across to a reader – it’s very probable that i’ve misunderstood some things, but that’s the point;
- i’m also largely using my own terms/metaphors – for additional triangulation.
pit of generality
it feels to me like the main crux is about the topology of the space of cognitive systems in combination with what it implies about takeoff. here’s the way i understand eliezer’s position:
there’s a “pit of generality” attractor in cognitive systems space: once an AI system gets sufficiently close to the edge (“past the atmospheric turbulence layer”), it’s bound to improve in a catastrophic manner;
it’s bound to improve in a catastrophic manner
I think this is true with quite high probability: an AI that gets high enough will, if not otherwise corrigibilized, boost up to strong superintelligence – this is what it means, metaphorically, to get “past the atmospheric turbulence layer”.
“High enough” should not be very far above the human level and may be below it; John von Neumann, with the ability to run some chains of thought at high serial speed, access to his own source code, and the ability to try branches of himself, seems like he could very likely do this, possibly modulo his concerns about stomping his own utility function, which would make him more cautious.
People noticeably less smart than von Neumann might be able to do it too.
An AI whose components are more modular than a human’s and more locally testable might have an easier time of the whole thing; we can imagine the FOOM getting rolling from something that was in some sense dumber than human.
But the strong prediction is that when you get well above the von Neumann level, why, that is clearly enough, and things take over and go Foom. The lower you go from that threshold, the less sure I am that it counts as “out of the atmosphere”. This epistemic humility on my part should not be confused with knowledge of a constraint on the territory that requires AI to go far above humans to Foom. Just as DL-based AI over the 2010s scaled and generalized much faster and earlier than the picture I argued to Hanson in the Foom debate, reality is allowed to be much more ‘extreme’ than the sure-thing part of this proposition that I defend.
excellent, the first paragraph makes the shape of the edge of the pit much more concrete (plus highlights one constraint that an AI taking off probably needs to navigate — its own version of the alignment problem!)
as for your second point, yeah, you seem to be just reiterating that you have uncertainty about the shape of the edge, but no reason to rule out that it’s very sharp (though, as per my other comment, i think the fact that the human genome ended up teetering right on the edge puts an upper bound on the sharpness)