Writing down something I’ve found myself repeating in different conversations:
If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots.
Look for places where everyone’s fretting about a problem that some part of you thinks it could obviously just solve.
Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better.
Then do it better.
For a concrete example, consider Devansh. Devansh came to me last year and said something to the effect of, “Hey, wait, it sounds like you think Eliezer does a sort of alignment-idea-generation that nobody else does, and he’s limited here by his unusually low stamina, but I can think of a bunch of medical tests that you haven’t run, are you an idiot or something?” And I was like, “Yes, definitely, please run them, do you need money”.
I’m not particularly hopeful there, but hell, it’s worth a shot! And, importantly, this is the sort of attitude that can lead people to actually trying things at all, rather than assuming that we live in a more adequate world where all the (seemingly) dumb obvious ideas have already been tried.
Or, this is basically my model of how Paul Christiano manages to have a research agenda that seems at least internally coherent to me. From my perspective, he’s like, “I dunno, man, I’m not sure I can solve this, but I also think it’s not clear I can’t, and there’s a bunch of obvious stuff to try, that nobody else is even really looking at, so I’m trying it”. That’s the sort of orientation to the world that I think can be productive.
Or the shard theory folks. I think their idea is basically unworkable, but I appreciate the mindset they are applying to the alignment problem: something like, “Wait, aren’t y’all being idiots, it seems to me like I can just do X and then the thing will be aligned”.
I don’t think we’ll be saved by the shard theory folk; not everyone audaciously trying to save the world will succeed. But if someone does save us, I think there’s a good chance that they’ll go through similar “What the hell, are you all idiots?” phases, where they autonomously pursue a path that strikes them as obviously egregiously neglected, to see if it bears fruit.
Contrast this with, say, reading a bunch of people’s research proposals and explicitly weighing the pros and cons of each approach so that you can work on whichever seems most justified. This has more of a flavor of taking a reasonable-sounding approach based on an argument that sounds vaguely good on paper, and less of a flavor of putting out an obvious fire that for some reason nobody else is reacting to.
I dunno, maybe activities of the vaguely-good-on-paper character will prove useful as well? But I mostly expect the good stuff to come from people working on stuff where a part of them sees some way that everybody else is just totally dropping the ball.
In the version of this mental motion I’m proposing here, you keep your eye out for ways that everyone’s being totally inept and incompetent, ways that maybe you could just do the job correctly if you reached in there and mucked around yourself.
That’s where I predict the good stuff will come from.
And if you don’t see any such ways?
Then don’t sweat it. Maybe you just can’t see something that will help right now. There don’t have to be ways you can help in a sizable way right now.
I don’t see ways to really help in a sizable way right now. I’m keeping my eyes open, and I’m churning through a giant backlog of things that might help a nonzero amount—but I think it’s important not to confuse this with taking meaningful bites out of a core problem the world is facing, and I won’t pretend to be doing the latter when I don’t see how to.
Like, keep your eye out. For sure, keep your eye out. But if nothing in the field is calling to you, and you have no part of you that says you could totally do better if you deconfused yourself some more and then handled things yourself, then it’s totally respectable to do something else with your hours.
If you don’t have an active sense that you could put out some visibly-raging fires yourself (maybe after skilling up a bunch, which you also have an active sense you could do), then I recommend stuff like cultivating your ability to get excited about things, and doing other cool stuff.
Sure, most stuff is lower-impact than saving the world from destruction. But if you can be enthusiastic about all the other cool ways to make the world better off around you, then I’m much more optimistic that you’ll be able to feel properly motivated to combat existential risk if and when an opportunity to do so arises. Because that opportunity, if you get one, probably isn’t going to suddenly unlock every lock on the box your heart hides your enthusiasm in, if your heart is hiding your enthusiasm.
See also Rob Wiblin’s “Don’t pursue a career for impact — think about the world’s most important, tractable and neglected problems and follow your passion.”
Or the Alignment Research Field Guide’s advice to “optimize for your own understanding” and chase the things that feel alive and puzzling to you, as opposed to dutifully memorizing other people’s questions and ideas. “[D]on’t ask “What are the open questions in this field?” Ask: “What are my questions in this field?””
I basically don’t think that big changes come from people who aren’t pursuing a vision that some part of them “believes in”, and I don’t think low-risk, low-reward, modest, incremental help can save us from here.
To be clear, when I say “believe in”, I don’t mean that you necessarily assign high probability to success! Nor do I mean that you’re willing to keep trying in the face of difficulties and uncertainties (though that sure is useful too).
English doesn’t have great words for me to describe what I mean here, but it’s something like: your visualization machinery says that it sees no obstacle to success, such that you anticipate either success or getting a very concrete lesson.
The possibility seems open to you, at a glance; and while you may suspect that there’s some hidden reason that the possibility is not truly open, you have an opportunity here to test whether that’s so, and to potentially learn why this promising-looking idea fails.
(Or maybe it will just work. It’s been known to happen, in many a scenario where external signs and portents would have predicted failure.)