In other news, we received an Ethereum donation worth $230,910 from Vitalik Buterin — the inventor and co-founder of Ethereum, and now our third-largest all-time supporter!
Also worth highlighting, from the Open Philanthropy Project's Claire Zabel and Luke Muehlhauser: there's a pressing need for security professionals in AI safety and biosecurity.
It’s more likely than not that within 10 years, there will be dozens of GCR-focused roles in information security, and some organizations are already looking for candidates that fit their needs (and would hire them now, if they found them).
It’s plausible that some people focused on high-impact careers (as many effective altruists are) would be well-suited to helping meet this need by gaining infosec expertise and experience and then moving into work at the relevant organizations.
Other updates
- Mesa Optimization: What It Is, And Why We Should Care — Rohin Shah's consistently excellent Alignment Newsletter discusses “Risks from Learned Optimization…” and other recent AI safety work.
- MIRI Research Associate Stuart Armstrong releases his Research Agenda v0.9: Synthesising a Human's Preferences into a Utility Function.
- OpenAI and MIRI staff help talk Munich student Connor Leahy out of releasing an attempted replication of OpenAI's GPT-2 model. (LessWrong discussion.) Although Leahy's replication attempt wasn't successful, write-ups like his suggest that OpenAI's careful approach to discussing and releasing GPT-2 is continuing to prompt healthy reassessments of publishing norms within ML. Quoting from Leahy's postmortem:
Sometime in the future we will have reached a point where the consequences of our research are beyond what we can discover in a one-week evaluation cycle. And given my recent experiences with GPT2, we might already be there. The more complex and powerful our technology becomes, the more time we should be willing to spend in evaluating its consequences. And if we have doubts about safety, we should default to caution.
We tend to live in an ever accelerating world. Both the industrial and academic R&D cycles have grown only faster over the decades. Everyone wants “the next big thing” as fast as possible. And with the way our culture is now, it can be hard to resist the pressures to adapt to this accelerating pace. Your career can depend on being the first to publish a result, as can your market share.
We as a community and society need to combat this trend, and create a healthy cultural environment that allows researchers to take their time. They shouldn’t have to fear repercussions or ridicule for delaying release. Postponing a release because of added evaluation should be the norm rather than the exception. We need to make it commonly accepted that we as a community respect others’ safety concerns and don’t penalize them for having such concerns, even if they ultimately turn out to be wrong. If we don’t do this, it will be a race to the bottom in terms of safety precautions.
- From Abram Demski: Selection vs. Control; Does Bayes Beat Goodhart?; and Conceptual Problems with Updateless Decision Theory and Policy Selection.
- Vox's Future Perfect Podcast interviews Jaan Tallinn and discusses MIRI's role in originating and propagating AI safety memes.
- The AI Does Not Hate You, journalist Tom Chivers' well-researched book about the rationality community and AI risk, is out in the UK.
News and links
- Other recent AI safety write-ups: David Krueger's Let's Talk About “Convergent Rationality”; Paul Christiano's Aligning a Toy Model of Optimization; and Owain Evans, William Saunders, and Andreas Stuhlmüller's Machine Learning Projects on Iterated Distillation and Amplification.
- From DeepMind: Vishal Maini puts together an AI reading list, Victoria Krakovna recaps the ICLR Safe ML workshop, and Pushmeet Kohli discusses AI safety on the 80,000 Hours Podcast.
- The EA Foundation is awarding grants for “efforts to reduce risks of astronomical suffering (s-risks) from advanced artificial intelligence”; apply by Aug. 11.
- Additionally, if you're a young AI safety researcher (with a PhD) based at a European university or nonprofit, you may want to apply for ~$60,000 in funding from the Bosch Center for AI.