Holden Karnofsky on Transparent Research Analyses


Holden Karnofsky is the co-founder of GiveWell, which finds outstanding giving opportunities and publishes the full details of its analysis to help donors decide where to give. GiveWell tracked ~$9.6 million in donations made on the basis of its recommendations in 2012. It has historically sought proven, cost-effective, scalable giving opportunities, but its new initiative, GiveWell Labs, is more broadly researching the question of how to give as well as possible.

Luke Muehlhauser: GiveWell has gained respect for its high-quality analyses of some difficult-to-quantify phenomena: the impacts of particular philanthropic interventions. You’ve written about your methods for facing this challenge in several blog posts, for example (1) Futility of standardized metrics: an example, (2) In defense of the streetlight effect, (3) Why we can’t take expected value estimates literally, (4) What it takes to evaluate impact, (5) Some considerations against more investment in cost-effectiveness estimates, (6) Maximizing cost-effectiveness via critical inquiry, (7) Some history behind our shifting approach to research, (8) Our principles for assessing research, (9) Surveying the research on a topic, (10) How we evaluate a study, and (11) Passive vs. rational vs. quantified.

In my first question I’d like to ask about one particular thing you’ve done to solve one particular problem with analyses of difficult-to-quantify phenomena. The problem I have in mind is that it’s often difficult for readers to know how much they should trust a given analysis of a difficult-to-quantify phenomenon. In mathematics research it’s often pretty straightforward for other mathematicians to tell what’s good and what’s not. But what about analyses that combine intuitions, expert opinion, multiple somewhat-conflicting scientific studies, general research in a variety of “soft” sciences, and so on? In such cases it can be difficult for readers to distinguish high-quality and low-quality analyses, and it can be hard for readers to tell whether the analysis is biased in particular ways.

One thing GiveWell has done to address this problem is to strive for an unusually high degree of transparency in its analyses. For example, your analyses often include:

  • Good summaries and bolding to make analyses more skimmable
  • External evaluations
  • Public conversation notes with credentialed experts
  • Detailed discussion of what you think and why
  • Lots of citations, direct quotes from studies, and footnotes
  • Archived copies of cited websites and papers

Do you agree with this interpretation of what you’re doing? What are some other things you do to make it easier for readers to tell how they should update their views in response to your analyses?

Holden Karnofsky: Yes, that’s a good characterization of what we’re trying to do.

I’d say that our transparency efforts fall into two categories: “support” (making sure that all the relevant information about our decision-making processes is available to those who are seeking it) and “outreach” (proactively raising topics and views and inviting people to question us about them). We’ve found that “support” is often not enough: in order to get meaningful engagement on our work and help people become confident in it, we need to have high-level conversations in which we invite back-and-forth. Otherwise, people often feel that they don’t have time to evaluate everything we’ve written, and don’t engage with it.

Charity reviews, and most other pages on our main website (not the blog), tend to be in a “support” framework. The core of this framework is that for every statement we make, it should be clear what the basis of the statement is. If the basis of the statement is simply “our guess,” that should be clear (so we often use language like “We’d guess that ___” rather than simply stating our beliefs without such a qualifier). If the basis of the statement is a more complex chain of reasoning that doesn’t conceptually fit in the space, then such reasoning should be laid out elsewhere and linked to in the context of our statement. And if the basis of our statement is a document, website, or conversation, we use a footnote. Our conventions for footnotes go beyond what is common in academic papers: we generally aim to include the key quote (not just a citation) in a footnote, and we take steps (such as WebCite) to ensure that the original document and full context can be accessed by the reader even in the event that the original host of the source takes it down or changes it. We also try to structure charity reviews in logical ways and provide summary content at different levels of detail, so that it’s always relatively quick to find out what we think on a given topic and determine how to drill down on a particular piece of it.

Most of the actual engagement we get with our views comes via “outreach” methods, particularly the blog and conference calls and in-person events (as well as one-on-ones with major donors to GiveWell and our top charities). The blog is probably the single most effective way of getting people to engage with, understand, and trust our research. Via the discussions these mediums create, we gain better understandings of where people’s biggest questions and disagreements are, and we try to address these in further blog posts and FAQs. “Outreach” methods are organized around “What people want to know & what we want to tell them” rather than “What we think about charity X or intervention Y” and are often more informal, though they will link back to “support” content as appropriate.

In both frameworks, we rarely publish a piece of content without going over it and asking, “What are the key points here? Can they be summarized at the top, and/or bolded in the body?”

Luke: You identify GiveWell as part of the effective altruism movement, and you also write that “Effective altruism is unusual and controversial” — a contrarian position, we might say.

Robin Hanson notes that “On average, contrarian views are less accurate than standard views… Honest contrarians who expect reasonable outsiders to give their contrarian view more than normal credence should point to strong outside indicators that correlate enough with contrarians tending more to be right.”

Do you agree with Hanson’s analysis in that post? If so, then do you think GiveWell typically expects reasonable outsiders to give your contrarian views higher than normal credence, and are you able to point to “strong outside indicators that correlate enough with contrarians tending more to be right”?

Holden: I don’t fully agree with that post. It implies that contrarian arguments need “strong outside indicators”; I think of a good argument as a combination of inside- and outside-view arguments, and enough strength in one area can make up for serious weakness in another. I’d frame it differently. I’d say that anyone who is trying to change minds through rational persuasion needs to think through whose minds they’re trying to change and how much mental energy such people can reasonably be expected to put into evaluating their arguments (bearing in mind that reasonably strong low-detail arguments can create enough intrigue to get the audience to increase their mental energy investment and understand higher-detail arguments). What I agree with in Robin’s post is that there is a non-trivial hurdle to overcome when espousing contrarian views, and (at least initially) a limited amount of mental energy one can expect the audience to invest in one’s argument.

In our case, we’ve so far largely targeted people who already buy into effective altruism. So we don’t deal with much of a “contrarian” dynamic for that specifically. Where I think we do have to “overcome the hurdles to contrarianism” is where we recommend charities that people haven’t necessarily heard of, as opposed to charities that elite conventional wisdom recommends. In making the case for such charities, we do have to think about how to put a low-detail case up front rather than just refer people to our lengthy body of research as a whole. We try to follow the principle of “summarize our main points, and make it clear how to drill down on any particular one of them.” That allows people to gain confidence in us via spot-check rather than by going through everything we’ve written (which I believe very few people have done).

In addition, tools like the blog can convince people over time that we’re generally trustworthy and intelligent on the relevant issues, which can allow them to buy into our claims without evaluating all the details of them. (I perceive many people as having followed a similar path to supporting MIRI.)

But even with regard to recommending not-widely-known charities, we have a lot of wind at our backs by virtue of whom we’re targeting. Much of our audience consists of people who already buy into effective altruism and already feel that the quality of dialogue around where to give is extraordinarily low. They also often come to us via referrals from trusted friends or media sources. So they arrive at our site very ready to believe that our claims are plausible.

As a side point, I think effective altruism falls somewhere on the spectrum between “contrarian view” and “unusual taste.” My commitment to effective altruism is probably better characterized as “wanting/choosing to be an effective altruist” than as “believing that effective altruism is correct.” I think that relieves some of the burden of having to “evaluate” effective altruism, though certainly not all of it.

Luke: Recently you’ve begun a new program within GiveWell called GiveWell Labs, which aims to investigate the effectiveness of causes that are even more difficult to analyze than e.g. global health interventions. In global health, you often get to learn from medical science, multiple randomized controlled trials for specific interventions, and so on. But it’s much harder to investigate the cost effectiveness of interventions that aim to improve science, improve political or economic policy, reduce catastrophic risks, etc.

So I would imagine that developing GiveWell Labs has forced you to develop additional tools for analyzing complex phenomena, and for communicating your analyses to others. What are some things you’ve learned so far in the process of developing GiveWell Labs?

Holden: We’re certainly developing new methods of analysis and evaluation. Our working framework for shallow investigations replaces “proven, cost-effective, scalable charities” with “important, tractable, non-crowded causes” in terms of what we’re looking for. Much of our work so far has been more qualitative in nature, aiming to clarify and understand the basic landscape of causes rather than assess the extent to which approaches are “proven.” And we’ve also been doing a fair amount of “immersion” – trying to learn broadly about a field without pre-choosing our set of critical questions and goals (for example, as I’ve started to investigate scientific research, I’ve read about half of a biology textbook and taken multiple chances to attend meetings of scientists). Relative to our traditional work, more of our learning so far on the Labs front comes from conversations as opposed to studies, and as a result our emphasis on (and volume of) conversation notes has increased dramatically.

With that said, it’s still very early. I think we’re fairly far from having concrete recommendations, and it’s when we have concrete recommendations that it becomes much easier and more productive to work on engaging and convincing outsiders. We have certainly made attempts to put out what we’ve learned, using the same basic tools I mentioned before: “support” oriented pages (e.g., shallow investigation writeups) and “outreach” oriented communications (e.g., Labs-focused blog posts and Labs-focused in-person Q&A’s).

Luke: I think GiveWell provides a helpful model of how to do valuable analytic research on difficult-to-quantify phenomena. Are there other groups doing a similar thing (but perhaps on other subjects) that you admire, or that you think are doing some important things right, or that you think are worth imitating in certain ways? You’ve praised the Cochrane Collaboration in the past… are there others?

Holden:

  • We’re definitely fans of Cochrane. Their work is consistently “transparent” in the sense that one can read a review, get a sense of the big picture, know where to drill down on anything one wants to examine, and ultimately – if one wants – answer just about any question about where their conclusions are coming from.
  • We’ve been impressed with the usefulness and, often, transparency of the Center for Global Development’s work (more at this blog post).
  • I’m generally a fan of Nate Silver. While I wish he disclosed the full details of his models, I usually feel like I have a pretty good idea of what he’s conceptually doing, without having to put much proactive effort into understanding it.
  • While there are many others whose work we use and/or enjoy, those are the only people that jump to mind in terms of “consistently providing transparent, convincing, reasonable analysis of difficult-to-analyze topics and/or topics relevant to GiveWell’s work.” I often see isolated cases of such work, such as particularly good academic papers.

Luke: Why do you think the sort of transparency you’re providing is relatively rare, given that it seems to have some credibility benefits?

Holden: The sort of transparency we’re providing takes a lot of work. Some of this is because our approach is relatively rare – for example, we’ve had to iterate a lot (and deal with some awkward situations) on being able to share notes from conversations. We’ve spoken to some organizations that seem potentially interested in increasing transparency, but with the processes and relationships they’ve built up, getting all the third parties they work with to buy in would be a huge project and struggle. It’s easier for us because we’ve put this goal in our DNA from the beginning, but even for us the challenge of getting third parties to be comfortable with what we’re doing can be significant.

Even if we weren’t dealing with resistance from third parties, the time we put into writing up our thoughts would be a significant chunk of the total time we spend on research. Maybe half. So in some sense it seems like we could learn twice as much if we didn’t feel the need to share our learning widely.

With that said, I think the benefits are significant as well, and I think they go beyond the credibility and influence boosts and feedback we get. Just the process of writing up and supporting our thinking often makes us think more clearly. We used to wait until our research felt “done” to write it up, but we found that the process of writing it up constantly caused us to rethink our conclusions and reopen investigations, as issues that had faded into the backgrounds of our minds re-emerged while we were trying to make our case. Now, we try to write things up “as” we research them, for exactly this reason. Writing things up for a general audience also means that our old material is easier for new employees (and long-running employees with not-so-great memories) to absorb, and so we lose less knowledge internally.

Even if we were allocating all of the “money moved” ourselves (rather than making giving recommendations to others) and even if no outsiders could ever see what we wrote, I’d still want to put a great deal of time – maybe almost as much as we do now – into creating clear, supported writeups and summaries.

Luke: Thanks, Holden!