In 2019, we wrote a blog post about how we think about the “bar” for our giving and how we compare different kinds of interventions to each other using back-of-the-envelope calculations, all within the realm of what we now call Global Health and Wellbeing (GHW). This post updates that one and:
- Explains how we previously compared health and income gains in comparable units. In short, we use a logarithmic model of the utility of income, so a 1% change in income is worth the same to everyone, and a dollar of income is worth 100x more to someone who has 100x less. We measure philanthropic impact in units of the welfare gained by giving a dollar to someone with an annual income of $50,000, which was roughly US GDP per capita when we adopted this framework. Under the logarithmic model, this means we value increasing 100 people’s income by 1% (i.e., a total increase of roughly 1 natural-log unit of income) at $50,000. We have previously also valued averting a disability-adjusted life year (DALY; roughly, a year of healthy life lost) at $50,000, so we valued increasing income by one natural-log unit as equal to averting 1 DALY. This would imply that a charity that could avert a DALY for $50 would have a “1,000x” return because the benefits would be $50,000 relative to the costs of $50.
- Reviews our previous “bar” for what level of cost-effectiveness a grant needed to hit to be worth making. Overall, having a single “bar” across multiple very different programs and outcome measures is an attractive feature because equalizing marginal returns across different programs is a requirement for optimizing the overall allocation of resources,[1] and we are devoted to doing the most good possible with our giving. Prior to 2019, we used a “100x” bar based on the units above, the scalability of direct cash transfers to the global poor, and the roughly 100x ratio of high-income country income to GiveDirectly recipient income. As of 2019, we tentatively switched to thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the top charities recommended by GiveWell (which we used to be part of and remain closely affiliated with), and we thought we would be able to find enough other opportunities at that cost-effectiveness to hit our overall spending targets.
- Updates our ethical framework to increase the weight on life expectancy gains relative to income gains. We’re continuing to use the log income utility model, but, after reviewing several lines of evidence, we’re doubling the weight on health relative to income in low-income settings, so we will now value a DALY at 2 natural log units of income or $100,000. We’re also updating how we measure the DALY burden of a death; our new approach will accord with GiveWell’s moral weights, which value preventing deaths at very young ages differently than implied by a DALY framework.
- Articulates our tentative “bar” for giving going forward, of roughly 1,000x (which is ~20% lower than our old bar given new units – explanation in footnote [2]). The (offsetting) changes in the bar come from new units, our available assets growing, more sophisticated modeling of how we expect cost-effectiveness and asset returns to interact over time, growth in GiveWell’s other funding sources, and slightly increased skepticism about our ability to spend as much as needed at much higher levels of cost-effectiveness. Due to the increased assets and lower bar, we’re planning to substantially increase our funding for GiveWell’s recommended charities, which we will write more about next week. However, we still expect most of our medium-term growth in GHW to be in new causes that can take advantage of the leveraged returns to research and advocacy, and could imagine that we’ll eventually find enough room for more funding in those interventions that we will need to raise the bar again.
This post focuses exclusively on how we value different outcomes for humans within Global Health and Wellbeing; when it comes to other outcomes like farm animal welfare or the far future, we practice worldview diversification instead of trying to have a single unified framework for cost-effectiveness analysis. We think it’s an open question whether we should diversify across more internal “worldviews” within the broad Global Health and Wellbeing remit (vs. slotting everything into a unified framework as in this post).
This post is unusually technical relative to our others, and we expect it may make sense for most of our usual blog readers to skip it.
1. How we previously compared health and income
We often use “marginal value of a dollar of income to someone with baseline income of $50K” as our unified outcome variable, so by definition giving a dollar to someone with $50k of annual income has a cost-effectiveness of 1x.[3] We value income using a logarithmic utility function, so $100 in extra income for a rich person generates as much utility as $1 in extra income for a person with 1/100th the income of that rich person. In order for a grant’s cost-effectiveness to be, say, 1000x, it must be 1000 times more cost-effective than giving a dollar to someone with $50k of annual income.
This can be confusing because GiveWell, which we used to be part of and remain closely affiliated with, uses contributions to GiveDirectly as their baseline unit of “1x.” In our framework, those contributions are very roughly worth 100x, because GiveDirectly recipients are roughly 100x poorer (after overhead) than our $50K income benchmark. This section of our 2019 blog post reviews how our logarithmic income model works and the “100x” calculation.
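To make the logarithmic model concrete, here is a minimal Python sketch, assuming utility is exactly the natural log of annual income and taking $50K as the benchmark. The function names are our own illustrations, not part of any published model, and the $500 income is a round stand-in for someone roughly 100x poorer than the benchmark.

```python
import math

BENCHMARK_INCOME = 50_000  # the marginal dollar at this income defines "1x"


def value_in_benchmark_dollars(gain: float, baseline_income: float) -> float:
    """Value of raising one person's income by `gain`, in units of marginal
    dollars to someone earning $50K, assuming utility = ln(income)."""
    log_units = math.log((baseline_income + gain) / baseline_income)
    return log_units * BENCHMARK_INCOME


def marginal_multiplier(baseline_income: float) -> float:
    """How many times more a marginal dollar is worth to this person than to
    someone earning $50K (marginal utility of log income is 1 / income)."""
    return BENCHMARK_INCOME / baseline_income


# A dollar to someone 100x poorer than the benchmark is worth ~100x.
print(marginal_multiplier(500))  # 100.0

# Raising 100 people's incomes by 1% is worth about one log unit, i.e. ~$50K.
print(round(100 * value_in_benchmark_dollars(0.01 * 500, 500)))  # 49752
```

A 1% raise is slightly less than 0.01 log units (ln 1.01 ≈ 0.00995), which is why the second figure lands just under $50,000.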
We quantify health outcomes using disability-adjusted life years (DALYs). The DALY burden of a disease is the sum of the years of life lost (YLL) due to the disease, and the weighted years lived with disability (YLD) due to the disease. If you save the life of someone who goes on to live for 10 years, your intervention averted 10 YLLs. If you prevent an illness that would have caused someone to live in a condition 20% as bad as death for 10 years, your intervention averted 20% × 10 = 2 YLDs. (We don’t necessarily endorse the disability weights used to measure YLDs, and in principle we might prefer other methodologies, such as quality-adjusted life years (QALYs) or just focusing on YLLs. We use DALYs because global-health data is much more widely available in DALYs than in other frameworks, especially from the Global Burden of Disease project.)
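The YLL/YLD arithmetic above fits in a one-line function. This is a sketch with hypothetical parameter names that reproduces the paragraph's two examples.

```python
def dalys_averted(years_of_life_gained: float = 0.0,
                  years_with_disability_avoided: float = 0.0,
                  disability_weight: float = 0.0) -> float:
    """DALYs averted = YLLs averted + disability-weighted YLDs averted."""
    yll = years_of_life_gained
    yld = disability_weight * years_with_disability_avoided
    return yll + yld


print(dalys_averted(years_of_life_gained=10))  # 10.0 (10 YLLs)
print(dalys_averted(years_with_disability_avoided=10, disability_weight=0.2))  # 2.0 (2 YLDs)
```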
The health interventions that we support are primarily lifesaving interventions (with the exception of deworming, where the modeled impacts run entirely through economic outcomes) – so although we talk in terms of DALYs, most of our health impact is in YLLs. (For instance, according to the GBD, ~80% of all DALYs in Sub-Saharan Africa are from YLLs, and amongst under-5 children the same figure is 97%. For malaria DALYs the share from YLLs is even higher, at 94% overall and 98% amongst under-5 children.) We also sometimes use quasi-disability-weights for valuing other outcomes (e.g., the harm of a year of prison).
There are different approaches to calculating the YLLs forgone with a death at a given age. For instance, the life-expectancy tables in Kenya suggest that an average child malaria death shortens the child’s life by 68 years (i.e., that is the average remaining life expectancy of a 0-5 year old in Kenya).[4] This is not a floor on plausible YLLs – one could imagine that those prevented from dying by some intervention are unusually sick relative to the broader population, and accordingly not likely to live as long as a population life table would suggest – and as explained below, GiveWell uses moral weights for child deaths that would be consistent with assuming 51 years of forgone life in the DALY framework (though that is not how they reach the conclusion). On the other extreme, the GBD takes a normative approach to life expectancy, saying in effect that everyone’s life expectancy at birth should be 88 years.[5] Therefore any death under age 5 has a DALY burden of at least 84 years under the GBD approach.[6] We have previously been inconsistent in this regard – following the methodology of the GBD in much of our own analysis, while deferring to GiveWell’s approach (and moral weights) on their recommendations. (In the analysis further below of our current bar, we assume for simplicity that our status quo approach splits the difference between the GiveWell and GBD approaches, and uses expected life years remaining for someone who lives to age 5 based on national-level life tables for Kenya.) Below, we explain our plan to follow GiveWell’s approach more closely going forward.
Historically we haven’t written much about how we came to value a DALY at $50,000 (or, equivalently, at one unit of log-income), which is less than many other actors would suggest. We don’t have a well-documented account of this historical choice, but the recollection of one of us (Alexander) is:
- We had seen numbers like $50K cited in the (older) health economics literature. It was also close to the British government’s widely-cited cost-effectiveness threshold of £20k-£30k (roughly $50k as of 2007; less now) per QALY (though that is more based on an opportunity cost frame than a valuation frame).
- A $50K DALY valuation roughly reconciled our relative valuation of the life-saving GiveWell top charities against GiveDirectly with GiveWell’s. We had GiveDirectly at 100x in our units and a cost per DALY averted of ~$50 for GiveWell top charities, which at $50K/DALY implied a 1,000x ROI, or a ~10x difference with GiveDirectly; GiveWell thought their life-saving charities were ~10x as cost-effective as GiveDirectly at the time.[7]
- Given that we were already prioritizing health heavily relative to income in terms of the total set of interventions we were supporting, opting for a lower valuation than some other parts of the literature seemed conservative.
- Using a number that was significantly larger than GDP per capita (which was ~$50K in the US at the time we adopted this valuation) implied there was more value at stake than total resources in the economy, and that seemed wrong to me at the time. We now think that reasoning was mistaken: there’s no in-principle reason there can’t be substantially more value at stake than the sum of the world economy.
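The reconciliation in the second bullet is simple division. This sketch uses the rounded figures quoted there ($50 per DALY averted, a $50K DALY value, GiveDirectly at ~100x); the names are illustrative.

```python
OLD_DALY_VALUE = 50_000  # $ of marginal income at $50K per DALY (old weights)
GIVEDIRECTLY_MULTIPLE = 100  # GiveDirectly's rough rating in these units


def roi_multiple(cost_per_daly: float, daly_value: float = OLD_DALY_VALUE) -> float:
    """Benefits (in $50K-dollars) per dollar spent on averting DALYs."""
    return daly_value / cost_per_daly


top_charity = roi_multiple(50)  # ~$50 per DALY averted
print(top_charity)  # 1000.0
print(top_charity / GIVEDIRECTLY_MULTIPLE)  # 10.0, matching GiveWell's ~10x
```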
2. Our previous “bar”
It is useful for us to have a single “bar” across multiple very different programs, years, and outcome measures, because equalizing marginal returns across different programs and across time is a requirement for optimizing the overall allocation of resources, and our mission is to give as effectively as we can. The basic idea of this “bar” is that it tells us what level of cost-effectiveness of grant is “good enough” to justify spending on some specific grant or program vs. saving for future years or allocating more funding to another program instead. If instead we set different bars across programs or years (in present value terms, i.e., after adjusting for investment returns), then that would mean we could have more impact by changing the allocation of resources (to put more into the program with the higher bar and less into the program with the lower bar), and we’d, in principle, want to do that. (However, in practice, we often see a case for practicing worldview diversification.)
Prior to 2019, we often used a “100x” bar, based on the units above and the very roughly 100x ratio of $50K to GiveDirectly recipient income (net of transaction costs). We thought “that such giving was quite cost-effective and likely extremely scalable and persistently available, so we should not generally make grants that we expected to achieve less benefit per dollar than that.”
As of 2019, we switched to tentatively thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the GiveWell top charities, and we thought we would be able to find enough other opportunities at that cost-effectiveness to hit our overall spending targets. We wrote: “Overall, given that GiveWell’s numbers imply something more like “1,000x” than “100x” for their current unfunded opportunities, that those numbers seem plausible (though by no means ironclad), and that they may find yet-more-cost-effective opportunities in the future, it looks like the relevant “bar to beat” going forward may be more like 1,000x than 100x.” However, that bar was rough, and we never made it very precise in large part because we don’t expect to be able to make back-of-the-envelope calculations that could reliably distinguish between, say, 800x and 1,000x.
3. New moral weights
We now think our previous approach to valuing health placed too little value on lifesaving interventions relative to income interventions. Our new approach values a DALY averted twice as highly, equal to a 2-unit (rather than 1-unit) increase in the natural log of any individual’s income. (This is equivalent to increasing 200 people’s incomes by 1% – i.e., in our favored units, equal to $100K in units of marginal dollars to individuals making $50K.) This updated value is more consistent with empirical estimates of beneficiary preferences, the subjective wellbeing literature, and the practice of other actors in the field.
These are all judgment calls subject to uncertainty, and we could readily imagine revising our weights again in the future based on further argument.
3.1 Beneficiary preferences
There is a large body of research on how people tend to trade off mortality risks against income gains, in particular the Value of a Statistical Life (VSL) literature. Some analyses in this literature use the stated preferences of individuals – i.e. responses to survey questions like, “would you rather reduce your risk of dying this year by 1%, or increase your income this year by $500?” Others use the revealed preferences of individuals: are the study subjects, on average, willing to take a job with a 1% higher rate of annual mortality, in exchange for $500 higher annual income?[8]
The research findings are clearest in high-income countries, where they tend to find that respondents value a year of life expectancy at 2.5 to 4 times annual income.[9] (Since these valuations are mostly based on marginal tradeoffs, and since we model utility as a logarithmic function of income, we can interpret these findings to say that “respondents value a year of life expectancy as much as the utility gained from increasing income by 2.5 to 4 natural-log units.”)
In low-income countries, the evidence is sparser and the findings vary widely.[10] For example, in this chart we plot all the estimates found by the literature search in Robinson et al. 2019’s meta-analysis – they searched for all VSL analyses, whether stated or revealed, in any country that had been classified as low- or middle-income in the last 20 years.[11] (We divide the VSLs by adult[12] life expectancy to get the VSLY – value of a statistical life-year. We express this in terms of local income, which tells us how many units of log-income are worth as much to the respondents as an extra year of life expectancy. We plot stated-preference studies as circles, and revealed-preference studies as diamonds.)
As you can see, the results vary extremely widely. One paper (Mahmud 2009) finds that subjects in rural Bangladesh traded off mortality risk against income in a way that suggests they valued a year of life expectancy at, very roughly, just 62% of annual income. Another paper (Qin et al. 2013) finds that subjects in China (who reported only $840 in annual per-capita income) valued a year of life expectancy at roughly 7x annual income.
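The conversion used for the chart (dividing each VSL by adult life expectancy and expressing the result in local income) can be sketched as follows, assuming log utility so that a marginal dollar at income y is worth 1/y log units. The function name and the example figures are hypothetical and do not come from any study in the chart.

```python
def vsly_in_log_units(vsl: float, remaining_life_years: float,
                      annual_income: float) -> float:
    """Express a VSL estimate as log-income units per year of life.

    VSLY = VSL / remaining life expectancy. Under log utility a marginal
    dollar at income y is worth 1/y log units, so VSLY / income is the
    number of log-income units one life-year is worth to the respondent.
    """
    vsly = vsl / remaining_life_years
    return vsly / annual_income


# Hypothetical inputs, not taken from any study in the chart.
print(vsly_in_log_units(vsl=100_000, remaining_life_years=40, annual_income=1_000))  # 2.5
```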
In the face of this huge variation, some sources[13] recommend estimating LMIC preferences via an “elasticity” approach: statistically estimate a function mapping VSL to income, anchoring it off the better-validated VSL figures in high-income countries. This elasticity approach is reviewed in Appendix A. Mainstream versions of it predict that individuals at the global poverty line would trade off 1 year of life expectancy for anywhere between 0.5x-4x of a year’s income[14] (which we can interpret as 0.5 to 4 units of log-income).
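A sketch of the elasticity approach, under the simplifying assumption that the value of a statistical life-year (rather than the full VSL) is proportional to income raised to a constant elasticity, so the VSLY-to-income ratio scales with income to the power (elasticity − 1). The base ratio of 3 (inside the 2.5 to 4 range cited for high-income countries), the two incomes, and the elasticities are hypothetical inputs chosen only for illustration.

```python
def transfer_vsly_ratio(base_ratio: float, base_income: float,
                        target_income: float, elasticity: float) -> float:
    """Extrapolate a VSLY-to-income ratio from one income level to another.

    Assumes VSLY is proportional to income ** elasticity, so the ratio
    VSLY / income is proportional to income ** (elasticity - 1).
    """
    return base_ratio * (target_income / base_income) ** (elasticity - 1)


# Hypothetical inputs: a high-income ratio of 3 and a target income 100x lower.
for elasticity in (1.0, 1.2, 1.5):
    ratio = transfer_vsly_ratio(3.0, 60_000, 600, elasticity)
    print(f"elasticity {elasticity}: {ratio:.2f} log units per life-year")
```

An elasticity of exactly 1 leaves the ratio unchanged at every income level; elasticities above 1 make poorer respondents value a life-year at fewer multiples of their income, the direction of the systematic difference noted below.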
Our beneficiaries (especially for lifesaving interventions) are mostly in low and lower middle income countries, and for now we’re focusing on this context in setting our moral weights. (As we’ll discuss in Appendix A, there are empirical and theoretical reasons to think that the exchange rate at which people trade off mortality risks against income gains differs systematically across income levels, with richer people valuing mortality more relative to income.)
Rather than estimate beneficiary preferences based on the academic literature, why not ask beneficiaries directly? In 2019, GiveWell commissioned a survey among individuals who are demographically similar to their beneficiaries: families in rural Kenya and Ghana, with annual consumption of roughly $300 per capita.[15] IDInsight conducted this survey, and their analysis suggested that these respondents would value an extra year of life expectancy as much as 2.1-3.8 years of income, which we can interpret as 2.1-3.8 units of log-income. (See GiveWell’s post for a discussion of the limitations of this study.)[16]
3.2 Subjective wellbeing
Research on subjective wellbeing suggests that we can improve life satisfaction by increasing incomes, but it also seems to indicate that such gains are small compared to the baseline wellbeing experienced by individuals even at low levels of income. We interpret this as suggestive evidence that a year of life expectancy is worth substantially more than 1 unit of log-income.
For example, this chart from Stevenson and Wolfers 2013[17] attempts to harmonize different surveys into one measure of “life satisfaction”. It suggests that you’d have to increase income by a factor of at least 64 in order to double reported life satisfaction on the harmonized scale (starting from the baseline level of life satisfaction experienced by the average individual at the global poverty line).[18] Note, of course, that the units of this scale are somewhat arbitrary, so you have to add some strong assumptions to think that a doubling on this scale indicates a doubling in actual subjective wellbeing.[19]
We could use these “life satisfaction” units to estimate an exchange rate between increases in life expectancy and increases in income. The chart suggests that, for someone near the global poverty line, you’d have to increase income by roughly 64x in order to get twice as much life satisfaction at any moment. You could conclude that a 64x increase in income (for one year) is worth as much life satisfaction as an extra year of life. In that case, a logarithmic utility function suggests valuing a DALY at roughly 4 units of log-income (ln 64 ≈ 4.16).[20]
We don’t want to over-interpret this – especially since the measure of “life satisfaction” uses a somewhat arbitrary scale and you could get different results depending on your assumptions about how reported satisfaction translates to actual subjective wellbeing.[21] But it does seem like suggestive evidence that the old weights placed too much weight on income, and too little value on saving lives.
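The life-satisfaction exchange rate in this section is just the natural log of the income multiple. A minimal sketch, assuming (as the text does) that doubling life satisfaction for one year is worth as much as one extra year of life at baseline satisfaction:

```python
import math

# Assumption from the text: doubling life satisfaction for one year is worth
# as much as one extra year of life at baseline satisfaction.
income_multiple = 64  # chart: "at least 64x" to double life satisfaction
log_units = math.log(income_multiple)
print(round(log_units, 2))  # 4.16

# Because 64x is a lower bound, larger multiples imply more log units.
print(round(math.log(128), 2))  # 4.85
```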
3.3 Other actors’ values
Doubling the weight we place on health also brings us into better alignment with other actors in the field. For example, GiveWell’s implied valuation of a YLL from preventing an adult death from malaria is 1.6 units of marginal log-income.[22] The Lancet Commission on Investing in Health recommended valuing DALYs in LMICs at 2.3 times per-capita GDP.[23] Until recently, the World Health Organization recommended that health interventions be considered “cost-effective” if the cost of averting a DALY was less than 3x per-capita GDP.[24] Other actors in high-income countries use widely-ranging methodologies; some would give numbers as high as 4x per-capita income[25] or as low as 0.7x per-capita income.[26]
In particular, we don’t want our moral weights to stray too far from GiveWell’s. This is substantially for pragmatic reasons: most of our lifesaving interventions run through them, and a statement of our valuations that was out of line with our spending on the GiveWell recommendations doesn’t seem like it would be accurate. But it is also partially due to some epistemic deference: GiveWell’s moral weights are the result of thoughtful deliberative processes, and we share some meta-ethical and epistemic approaches with them. While we are generally aware of each other’s reasoning, if we invested more time in understanding each other’s thinking, we expect we would likely come closer to agreeing on moral weights. Thus, some preemptive deference to their moral weights makes sense to us.
3.4 Aggregating these considerations
The literature on subjective wellbeing, and our attempts to estimate beneficiary preferences, both suggest to us that we should rate lifesaving impacts significantly more highly than under our old weights. On the other hand, pushing in favor of a less aggressive increase:
- We don’t want to be too far off from GiveWell’s current valuation of 1.6 units of log-income.
- The theoretical arguments in Appendix A suggest that high DALY valuations in low-income settings would be inconsistent with developed-world VSL evidence.
- We don’t want to change our moral weights too much in any one year, to avoid “whipsawing” if we later determine that this change was mistaken.
- Given that we already prioritize health interventions more highly than other philanthropists, erring towards a lower valuation seems “conservative.”
There is huge uncertainty/disagreement across and between lines of evidence – including between us and on the broader GHW cause prioritization research team – and any given choice of ultimate valuations seems fairly arbitrary, so we also prefer a visibly rough/round number that reflects the arbitrariness/uncertainty.
Taking all of these considerations together, we’re doubling our DALY valuation to $100,000, i.e. 2 units of log-income. We expect to continue to revisit this in future years and could readily envision major further updates.
3.5 Measurement of DALYs
We’re also switching our approach to be more consistent with GiveWell’s framework in how we translate deaths into DALYs. GiveWell assigns moral weights to deaths at various ages, rather than to DALYs. But we can use their moral weights to derive a mapping of deaths to DALYs, by dividing GiveWell’s moral weight for each death by GiveWell’s moral weight for a year lived with disability (which is defined by WHO so as to be equivalent to a DALY).[27]
The resulting model cares about child mortality more than adult mortality, but not by as much as remaining-population-life-expectancy would suggest. For example, GiveWell places 60% more weight on a child malaria death than on an adult death, and we can fairly straightforwardly interpret their process as counting an average of 32 DALYs per adult malaria death,[28] so the GiveWell-based DALY model would implicitly count 32 × 160% ≈ 51 DALYs for an under-5 malaria death. In contrast, a direct remaining-population-life-expectancy approach in Kenya would count 68 DALYs for an under-5 malaria death,[29] and the Global Burden of Disease approach (explained above) would count more than 84 DALYs.
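The mapping from GiveWell’s moral weights to DALYs can be sketched as a ratio of weights. GiveWell’s actual weights aren’t reproduced here, so this sketch normalizes the weight of one YLD to 1 (an assumption for illustration) and builds the death weights from the 32-DALY and 60% figures above; the 68 and 84 figures are the Kenya life-table and GBD numbers quoted in the text.

```python
def dalys_per_death(death_weight: float, yld_weight: float) -> float:
    """Implied DALYs per death: the moral weight of a death divided by the
    moral weight of one year lived with disability (i.e. one DALY)."""
    return death_weight / yld_weight


yld_weight = 1.0  # normalization assumed for illustration
adult_death_weight = 32 * yld_weight  # ~32 DALYs per adult malaria death
child_death_weight = 1.6 * adult_death_weight  # 60% more than an adult death

approaches = {
    "GiveWell-implied": dalys_per_death(child_death_weight, yld_weight),
    "Kenya national life table": 68,
    "GBD normative standard (at least)": 84,
}
for name, dalys in approaches.items():
    print(f"{name}: {dalys:.0f} DALYs per under-5 malaria death")
```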
GiveWell has written about the process for reaching their current moral weights here.
We see a normative case for the Global Burden of Disease’s uniform global approach to DALY attribution, but given our commitment to maximizing (expected) counterfactual impact, we think the national life table approach represents a plausible upper bound on attributable DALYs, and even those seem aggressive as an estimate of counterfactual lifespan for children whose lives are saved on the margin (who are presumably less advantaged and less healthy than the national average). Overall, we’re not sure where precisely this consideration should leave us, but it seems to argue for lower numbers.
We also haven’t reached any settled thoughts on the impact of population ethics or the second-order consequences of saving a life (e.g., on economic or population growth) on how to translate between deaths and DALYs.
For now, in order to be more consistent in our practices, we’re going to defer to GiveWell and start to use the number of DALYs that would be implied by extrapolating their moral weights. (In practice, we already defer to GiveWell for their own recommendations, so this would mainly change how we use GBD figures in our BOTECs for other grants, especially in science. In the section immediately below comparing our new values to old values, we assume for simplicity that we were using the national life table approach, which splits the difference between the GiveWell approach and the GBD in describing our status quo practices.) This means fewer DALYs averted per child death averted, which offsets some of the apparent gains from doubling our value on health. We expect to revisit this and try to form a more confident independent view about the balance of all these considerations in the future.
4. Expanding our spending, and modestly lowering our cost-effectiveness bar
We think there are four major buckets of updates that affect our “bar” going forward:
- The change to our weight on health, described just above.
- Other secular changes to GiveWell’s expected cost-effectiveness.
- Cross-cutting changes to our estimate of future available assets.
- Updates to our estimate of the likely “last dollar” cost-effectiveness of our non-GiveWell spending.
We walk through more detail below, but overall these factors leave us with a bar of very roughly 1,000x going forward for now.
4.1 Increased value on health
Overall, our new moral weights put more emphasis on health than we did before, which in some sense increases the amount of value at stake according to our moral framework, and should raise our cost-effectiveness bar, at least expressed in terms of marginal dollars to someone making $50K.
This table shows how we would rate the impact of various interventions, according to our new moral weights. (You can see the calculations in Google Sheets here.) Here’s how to read the first column:
- GiveWell estimates that HKI’s vitamin A supplementation program in Kenya averts 14 child deaths per $100k granted, and also increases incomes by 500 units of log-income.[30]
- Per GiveWell’s moral weights, this is 7 times more cost-effective than cash transfers to the global poor.
- Open Philanthropy’s old moral weights would have rated this intervention as 747 times more cost-effective than giving a dollar to someone with $50K of income.
- Under our new moral weights, we would rate this intervention as 977 times more cost-effective than giving a dollar to someone making $50K.
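A rough reconstruction of the first column’s arithmetic, assuming the old weights counted 68 DALYs per child death (the national life table approach we use to describe our status quo) and the new weights count ~51. Because the inputs quoted in the text (14 deaths, 500 log units) are rounded, the sketch gives about 726x and 964x rather than reproducing the table’s 747x and 977x exactly; it illustrates the method, not the spreadsheet.

```python
BENCHMARK = 50_000  # $ value of one log-income unit (marginal dollars at $50K)


def cost_effectiveness(child_deaths: float, log_income_units: float,
                       spend: float, dalys_per_child_death: float,
                       log_units_per_daly: float) -> float:
    """Benefits, in marginal dollars to someone earning $50K, per $ spent."""
    health_log_units = child_deaths * dalys_per_child_death * log_units_per_daly
    return (health_log_units + log_income_units) * BENCHMARK / spend


# Rounded figures from the text: 14 child deaths and 500 log-income units per $100k.
old = cost_effectiveness(14, 500, 100_000, dalys_per_child_death=68, log_units_per_daly=1)
new = cost_effectiveness(14, 500, 100_000, dalys_per_child_death=51, log_units_per_daly=2)
print(round(old), round(new))  # 726 964
```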
Note that the update in cost-effectiveness depends on the mix of beneficial outcomes an intervention generates. A charity that just increases income (as in the second column of this table) will have the same cost-effectiveness under our new or old moral framework. An intervention that simply averted DALYs (say, averting 1 DALY for every $100 spent) would be twice as cost-effective under our new moral weights. Because of the change in how we measure DALYs, an intervention that averts child deaths at a fixed rate (say, averting 1 child death for every $5000 spent) would only be roughly 1.5x as cost-effective under our new moral weights as under the old framework (2 × 51/68 ≈ 1.5).[31] The program in column 1 gets some of its moral impact from income gains and some from averting child deaths, so its cost-effectiveness changes by a weighted average of 1x and 1.5x.
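The per-outcome multipliers in this paragraph, and the weighted average for mixed programs, can be sketched as below. It assumes 68 DALYs per child death under the old approach and 51 under the new one; `new_to_old_ratio` is a hypothetical helper whose shares are each outcome’s fraction of a program’s value under the old weights.

```python
OLD_CHILD_DALYS, NEW_CHILD_DALYS = 68, 51  # DALYs counted per under-5 death
OLD_DALY_VALUE, NEW_DALY_VALUE = 1, 2  # log-income units per DALY

# Ratio of new to old value for each type of outcome.
multipliers = {
    "income": 1.0,
    "dalys": NEW_DALY_VALUE / OLD_DALY_VALUE,
    "child_deaths": (NEW_DALY_VALUE * NEW_CHILD_DALYS)
    / (OLD_DALY_VALUE * OLD_CHILD_DALYS),
}


def new_to_old_ratio(old_value_shares: dict) -> float:
    """Weighted average of the multipliers, where each share is that
    outcome's fraction of the program's value under the OLD weights."""
    return sum(share * multipliers[outcome]
               for outcome, share in old_value_shares.items())


print(multipliers["dalys"], multipliers["child_deaths"])  # 2.0 1.5
print(new_to_old_ratio({"income": 0.5, "child_deaths": 0.5}))  # 1.25
```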
The third and fourth columns show the mix of outcomes that could be achieved by a typical dollar to GiveWell top charities – that is, roughly 50% of its impact from income effects and 50% from mortality effects.[32] Our new weights rate the cost-effectiveness of this mix of outcomes ~25% higher than our old weights did.[33] (To foreshadow a bit, this means that if we lower the bar by ~20%, simultaneously with changing our moral weights, then our nominal bar will not change.)
For the past few years GiveWell’s rough margin has been interventions that they rate as 10x more cost-effective than cash transfers to the global poor – the third column shows how a dollar could be spent at this margin. This was our prior bar, and you can see that in our old units it was ~1000x. If the only updates were to our valuation on DALYs (and our framework for translating deaths into DALYs), our bar would go to roughly 1300x (our rough old bar expressed in new units).
However, this change to our valuations is not the only change here; below we address others which, taken together, lower the expected cost-effectiveness of our last dollar (assuming the GiveWell mix of outcomes) by roughly 20%. This means our nominal bar is staying roughly constant.
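Putting the last few paragraphs together, the reasoning behind a roughly unchanged nominal bar reduces to two multiplications. This is a sketch using the rounded figures from the text, not a precise model.

```python
old_bar = 1000  # ~10x GiveDirectly in GiveWell's units, in our old units
reweighting = 0.5 * 1.0 + 0.5 * 1.5  # 50/50 income/mortality mix -> 1.25
other_updates = 0.8  # ~20% lower expected last-dollar cost-effectiveness

bar_in_new_units = old_bar * reweighting  # 1250, which the text rounds to ~1300x
new_bar = bar_in_new_units * other_updates
print(bar_in_new_units, round(new_bar))  # 1250.0 1000
```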
4.2 Changes to GiveWell’s expected cost-effectiveness
Our expectation of future funding for GiveWell top charities, including both our support and that from others, has grown much faster than we would have expected in 2019. We didn’t have a precise model of expected future funding to GiveWell at that point, but very roughly we think it’s reasonable to model expected future funding for GiveWell’s recommendations as having doubled relative to our 2019 expectations. We currently model the GiveWell opportunity set as isoelastic with eta = 0.375,[34] which implies that a doubling of expected funding should reduce marginal cost-effectiveness (and the bar) by 23% (1 - 2^(-0.375) ≈ 0.23).
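The 23% figure follows from the isoelastic assumption: if marginal cost-effectiveness is proportional to funding raised to the power −eta, scaling funding by a factor k scales marginal cost-effectiveness by k^(−eta). A minimal sketch, with eta = 0.375 as stated above and an illustrative function name:

```python
def marginal_ce_ratio(funding_multiple: float, eta: float = 0.375) -> float:
    """Marginal cost-effectiveness after funding scales by `funding_multiple`,
    relative to before, when marginal returns fall as funding ** (-eta)."""
    return funding_multiple ** (-eta)


drop = 1 - marginal_ce_ratio(2.0)
print(f"{drop:.0%}")  # 23%
```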
On the other hand, GiveWell has found slightly more cost-effective opportunities over the last year than we would have expected them to. This year, they’re expecting roughly $400M of spending capacity at least ~8x as cost-effective as GiveDirectly according to their modeling. This is roughly as much as we would have expected if they had already explored the space of “8x” opportunities as thoroughly as they’ve explored the opportunities that are ~10x as cost-effective as GiveDirectly.[35] Given that they have only recently focused on exploring this space of slightly-less-cost-effective interventions, this is a very promising amount of spending capacity, and suggests the potential for even more capacity in the near future. This should marginally raise our bar.
We’ve also done more sophisticated modeling work on how we expect the cost-effectiveness of direct global health aid and asset returns to interact over time, and how we should optimally spread spending across time to maximize expected impact while hitting the stated goal of Cari and Dustin (our main funders) of spending down within their lifetimes. We’re still hoping to share that analysis and code, but the top-level conclusion is that for opportunities like the GiveWell top charities and with a ~50-year spenddown target, it’s optimal to spend something like 9% of assets per year. That would imply a significantly faster pace of spending for the assets we expect to recommend to the GiveWell top charities than we’ve reached in the past, which would in turn imply a lowering of the bar. Two other interesting implications of the model for GiveWell spending are that: (a) we should be trying to get to our optimal spending and then spending down with a decreasing dollar amount each year (which may be more a flaw/simplification of the model than an accurate conclusion); and (b) we should expect our bar to fall by roughly 4% per year in nominal terms (largely reflecting asset returns – this helps equalize our “real”[36] bar over time).
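The analysis behind this paragraph hasn’t been shared yet, so the following is not that model; it is a toy sketch of its two stated implications under simple assumptions. The 9% spend rate and the 4% annual decline in the nominal bar come from the text; the 5% annual return and both function names are hypothetical placeholders.

```python
def nominal_bar(initial_bar: float, years: int, annual_decline: float = 0.04) -> float:
    """Bar after `years` if it falls by a constant fraction each year."""
    return initial_bar * (1 - annual_decline) ** years


def spending_path(assets: float, years: int, spend_rate: float = 0.09,
                  annual_return: float = 0.05) -> list:
    """Dollars spent each year when spending a fixed share of what remains.

    `annual_return` is a hypothetical placeholder, not a figure from the post.
    """
    spent = []
    for _ in range(years):
        outlay = spend_rate * assets
        spent.append(outlay)
        assets = (assets - outlay) * (1 + annual_return)
    return spent


print(round(nominal_bar(1000, 10)))  # 665
print([round(x, 2) for x in spending_path(100.0, 3)])
```

With these inputs each year’s outlay is 0.91 × 1.05 ≈ 0.96 times the previous year’s, so dollar spending declines; only a return above roughly 10% (1/0.91 − 1) would reverse that.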
Overall, we currently expect GiveWell’s marginal cost-effectiveness to end up around 7-8x GiveDirectly (in their units), which, assuming their current distribution across health and income benefits, translates to ~900-1,100x in our new units,[37] though our understanding is that GiveWell does not necessarily endorse this extrapolation. Assuming that we continue to support the GiveWell recommendations and have a correctly-implemented uniform bar, that implies a similar bar across all our other work, though it could turn out to be too low if we’re able to find many more cost-effective opportunities in other work.
One complication for extrapolating from the GiveWell bar to our other work is that GiveWell is much more thorough in their cost-effectiveness calculations than we typically are in our back-of-the-envelope calculations, which might mean that the results aren’t really comparable. We linked to some examples of our back-of-the-envelope calculations from our 2019 post, and they compare very unfavorably to the thoroughness of GiveWell’s cost-effectiveness analyses. That said, GiveWell also counts some second-order benefits (e.g., the expected income benefits of health interventions) that we typically don’t, so it isn’t totally obvious which direction this adjustment would end up pointing on net. (It’s also not clear how we would want to make the appropriate adjustment even in principle since there’s some division of labor going on where GiveWell has more conservative/skeptical epistemics, but we intentionally don’t consistently apply those epistemics across our work.) Overall, we think we should probably use a somewhat higher bar for our other BOTECs rather than just applying the same bar from GiveWell, but we’re not currently making a mechanical adjustment for this and don’t have a good sense of how big it should be.
4.3 Increases to our estimate of future available assets
Our available funding has increased significantly as a result of stock market moves over the last few years. (This is not independent of the assumption above of GiveWell’s available resources doubling, since that assumes a substantial increase in our giving to their top charities.)
We’ve also become more optimistic about future funders contributing to highly-effective opportunities of the sort we may recommend, which would also lower our current bar on the margin. Some of this is driven by the emergence of other billionaires self-identifying with effective altruism, but it also reflects GiveWell’s increased funding from other donors, increasingly concentrated wealth at the top end of the global distribution, and the cryptocurrency boom. (Yes, that is weird, we know.)
By the same logic as above, increasing expected resources should lower the bar, but we don’t have as good a model for how cost-effectiveness scales with resources in other Global Health and Wellbeing causes as we do for GiveWell recommendations, especially not for causes that we haven’t identified yet.
If the only thing that changed were GiveWell autonomously lowering its bar and accordingly having less cost-effective marginal recommendations, we should in principle marginally reallocate away from GiveWell and to other opportunities. But GiveWell isn’t independently lowering its bar; our overall plans and assessment of our bar contribute to the update. And given the composition of GiveWell’s top charities, made up of scalable, commodities-driven global health interventions, we expect them to have a lower eta (i.e., decline less in cost-effectiveness with more funding) than opportunities like R&D or advocacy that are more people-intensive (where we have a prior that returns tend to be more like logarithmic, which is more steeply declining than our model of the GiveWell top charities). That should mean that as resources rise, a larger portion of the total should flow to GiveWell. And that is reflected in our most recent plans: we previously wrote that we expected something like 10% of Open Phil assets/spending to go to “straightforward charity” exemplified by the GiveWell top charities, but now anticipate likely giving a modestly higher proportion (which, combined with asset increases, will mean substantially increasing our support for them in dollar terms).
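To make the eta intuition concrete, here is a minimal sketch assuming isoelastic returns: a program's marginal cost-effectiveness at cumulative spending x is a * x**(-eta), where eta = 1 corresponds to logarithmic total returns. Equalizing marginal returns means finding a common bar and funding each program until its marginal return falls to that bar. The scale parameters and the eta of 0.5 for the lower-eta program are hypothetical, chosen only to show the direction of the effect: as the budget grows, the bar falls and the lower-eta program absorbs a growing share.

```python
import math


def allocate(budget, programs, iters=200):
    """Split a budget so every program's marginal return equals a common bar.

    programs: list of (a, eta) pairs; marginal return at spending x is a * x**(-eta).
    Returns (bar, spending per program).
    """
    def spending_at(bar):
        # Invert a * x**(-eta) = bar  ->  x = (a / bar) ** (1 / eta)
        return [(a / bar) ** (1.0 / eta) for a, eta in programs]

    lo, hi = 1e-12, 1e12  # bracket for the bar
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if sum(spending_at(mid)) > budget:
            lo = mid  # bar too low: total spending would exceed the budget
        else:
            hi = mid
    bar = math.sqrt(lo * hi)
    return bar, spending_at(bar)


# Hypothetical programs: a scalable, commodity-driven one (eta = 0.5)
# and a people-intensive one with logarithmic returns (eta = 1.0).
programs = [(1.0, 0.5), (1.0, 1.0)]
for budget in (2.0, 12.0, 110.0):
    bar, (x_low_eta, x_log) = allocate(budget, programs)
    print(budget, round(bar, 3), round(x_low_eta / budget, 3))
# The bar falls (1.0, 0.333, 0.1) while the eta = 0.5 share rises (0.5, 0.75, 0.909).
```

The budgets are picked so the answers come out round: with these parameters total spending is u² + u for u = 1/bar, so budgets of 2, 12, and 110 correspond to u = 1, 3, and 10.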
4.4 Updates to our estimate of the likely “last dollar” cost-effectiveness of our non-GiveWell spending
Above, we argued that the GiveWell bar should go down in terms of our old weights and stay roughly nominally flat (at ~1,000x) in our new units. While the first-order implication is that the bar should uniformly decline across all of our Global Health and Wellbeing work, there are some complicating considerations.
Our new higher valuation on health and the fall in the GiveWell bar make non-GiveWell global health R&D and advocacy opportunities look more promising than before, and we really don’t know how many of these opportunities we could find. If, hypothetically, we could find billions of dollars a year in global health R&D opportunities with marginal cost-effectiveness above the new GiveWell bar, then that should be the bar instead (and, by implication, we shouldn’t fund the GiveWell recommendations going forward). That said, it seems unlikely a priori that marginal cost-effectiveness for billions of dollars of global health R&D or advocacy spending would land right between the old and new GiveWell bars (the bar falls by one third for child health interventions and by 50% for adult health interventions38). So the more plausible implications are either that (less likely) our bar has always been too low and we should always have been funding this hypothetical global health R&D or advocacy instead of supporting GiveWell top charities, or that (more likely) we will find many good opportunities better than the marginal GiveWell dollar, but not enough of them to independently drive our marginal dollar. Our new valuation on health (relative to income) is also a little higher than GiveWell’s, which in principle means that there could be things above our bar but below theirs, though in practice the valuations are close enough that we don’t think this is likely to be a big deal.
More concretely, we’ve been continuing to explore new areas that we think might be more leveraged than the GiveWell top charities, including recently making hires (announcements coming soon!) to lead new programs in global aid policy and South Asian air quality. The basic thesis on these new causes is to try to “multiply sources of leverage for philanthropic impact (e.g., advocacy, scientific research, helping the global poor) to get more humanitarian impact per dollar (for instance via advocacy around scientific research funding or policies, or scientific research around global health interventions, or policy around global health and development).” With our first hires in these new causes starting next year, it’s too soon for us to have a major update on the marginal cost-effectiveness of spending far out the curve from where we are now, but a few modest updates:
- The a priori case for the existence of large-scale leveraged interventions more cost-effective than the GiveWell margin continues to seem compelling to us. The Bill and Melinda Gates Foundation spends well over half a billion dollars a year on global health research and development39 and we find it plausible (though far from obvious, and we haven’t been able to get great data on the question) that the marginal dollar there is better than the GiveWell margin.
- But we haven’t found anything obviously more cost-effective than the GiveWell margin and scalable to billions of dollars a year. We’re far from being done looking, and have only covered a small part of the conceptual space, but we’re also not prepared to bet that we will succeed at that scale in the future.
- For a more pessimistic prior, consider that at GiveWell’s bar of 8x GiveDirectly, the cost per outcome-as-good-as-averting-an-under-5 death is about $4000-$4500, and the cost per actual under-5 life saved (ignoring other benefits) for a charity focused on saving kids’ lives is about $6000-$7000.40 There are about 5 million under-5 deaths per year, which implies that they could all be eliminated for $20-35B/year if GiveWell’s cost-effectiveness level could be maintained at arbitrary scale. Total development assistance for health is more than that. If there were more than $3B/year of room for more funding an order of magnitude more cost-effective than GiveWell’s margin, it would need to be as effective as eliminating all child mortality (since $3B at 10x the margin does about as much good as $30B at the margin). This argument isn’t decisive by any means but can help give a sense of how hard it would be to beat the GiveWell margin by a lot at massive scale: there just aren’t that many orders of magnitude to work with.
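The back-of-the-envelope arithmetic in the last bullet can be checked in a few lines. The inputs are the figures quoted there; the only added assumption is reading "an order of magnitude more cost-effective" as exactly 10x the GiveWell margin.

```python
UNDER5_DEATHS_PER_YEAR = 5_000_000
COST_PER_DEATH_EQUIVALENT = 4_000  # low end: outcome as good as averting an under-5 death
COST_PER_LIFE_SAVED = 7_000        # high end: actual under-5 life saved, other benefits ignored

low = UNDER5_DEATHS_PER_YEAR * COST_PER_DEATH_EQUIVALENT
high = UNDER5_DEATHS_PER_YEAR * COST_PER_LIFE_SAVED
print(f"${low / 1e9:.0f}B-${high / 1e9:.0f}B per year")  # $20B-$35B per year

# Good done by $3B/year at 10x the margin, expressed in margin-dollars.
equivalent_at_margin = 3e9 * 10
print(low <= equivalent_at_margin <= high)  # True
```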
- We still don’t think all of our existing grantmaking necessarily hits this bar, and are continuing to try to move flexible portfolios towards higher expected-return areas while learning more about our returns and considering reducing our spending in other areas.
Our current best guess is that we won’t be able to spend as much as we need to within the Global Health and Wellbeing frame at a significantly higher marginal cost-effectiveness rate (though average might be higher) than the GiveWell top charities, so the marginal cost-effectiveness of the GiveWell recommendations continues to be a relevant metric for our overall bar. However, we still expect most of our medium-term growth in GHW to be in new causes that can take advantage of the leveraged returns to research and advocacy, and could imagine that we’ll eventually find enough room for more funding in those interventions that we will need to raise the bar again.
We haven’t done a thorough analysis of the costs of over- vs under-shooting “the bar” for all of our causes, but one important takeaway from our analysis of that for the GiveWell setting, which may or may not properly extrapolate, is that saving and collecting investment returns isn’t “free” (or positive) from an impact perspective. That is because, very roughly, the world is getting better (and accordingly opportunities to improve the world are getting worse) over time, and saved funding doesn’t have that long to compound and has to be spent later at a higher rate given the “spend down within our primary donors’ lifetimes” constraint (which in turn likely means it will be spent at a lower cost-effectiveness further out the annual spending curve). We also think we will be in a much better place to raise more funding from other donors if we’re spending down our existing resources, and the expected benefits of that, along with the ex ante possibility of raising a lot more money, make it theoretically ambiguous whether the costs of under- or over-estimating the “true” bar are higher. Accordingly, we’re just going with our very rough and round current best guess for the bar for now, rather than doing a full expected-value calculation, and we will revisit in the future as we learn more.
4.5 Bottom line
We are now treating our bar as “roughly 1,000x” (with our new weights on health) for the GiveWell top charities and in our new cause selection and grantmaking, though we retain considerable uncertainty and expect to continue to revisit that over the coming years. For the typical mix of GiveWell interventions, this bar is about 20% lower given our new moral weights.41
We think it’s important to note that the bar is very rough – we aren’t very confident that it, or the BOTECs we consider against it, are even within a factor of 2 of correct – and we will continue to put considerable weight on factors not included in our rough back-of-the-envelope calculations in making major decisions.
Due to this analysis and the lower forward-looking bar, we’re planning to give more to the GiveWell top charities this year and going forward – more on that next week.
5. Appendix A
The available direct service interventions in health, like the ones GiveWell recommends, are far more cost-effective in low-income countries than high-income countries, so the discussion above focuses on what value we should place on lifesaving interventions in low-income countries. If we were focused instead on saving lives in the developed world, likely via advocacy of some sort, we might trade off differently between lifesaving vs income-enhancing interventions – we are uncertain over a range of rich-world DALY valuations between 2 and 6 units of log-income.
There are theoretical and empirical reasons to think that the exchange rate at which people trade off mortality risks against income gains differs systematically across income levels, with richer people valuing mortality more relative to income.
5.1 VSL elasticity to income
The main empirical evidence is from the VSL literature described above. As discussed, economists often attempt to statistically estimate a function mapping VSL to income, anchoring it off the better-validated VSL figures in high-income countries.
This literature generally finds that individuals’ willingness to pay for life expectancy increases with income, which is unsurprising – a dollar matters a lot less to a rich person, so if everyone valued a DALY equally then VSLYs would increase linearly with income. Most reviews also find that this willingness to pay increases at a faster pace than income. For example, Robinson et al. 2019 review the mainstream literature, which finds that the elasticity of VSL to income is between 1.0 and 1.2 across LICs (and a bit below 1 across the developed world), though Robinson’s own analysis suggests an elasticity of 1.5 for extrapolating to LMICs.42 An elasticity of 1.2 would mean that if two individuals’ incomes differ by 10%, then on average the dollar value they place on a year of life expectancy will differ by roughly 12%.
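Under the constant-elasticity form commonly used for transferring VSL estimates across income levels, VSL scales with income raised to the elasticity, so "10% more income, 12% higher VSL" is a small-change approximation. A quick sketch (the function name is ours, purely illustrative) shows the exact figure and how much the choice of elasticity matters once the income gap is 100x rather than 10%:

```python
def vsl_ratio(income_ratio, elasticity):
    """Ratio of two people's VSLs when their incomes differ by income_ratio,
    assuming VSL is proportional to income ** elasticity."""
    return income_ratio ** elasticity


print(round(vsl_ratio(1.10, 1.2), 3))  # 1.121: about 12% higher for 10% more income
print(round(vsl_ratio(100, 1.0), 1))   # 100.0
print(round(vsl_ratio(100, 1.2), 1))   # 251.2
```

A 0.2 difference in elasticity barely matters between people with similar incomes but changes the extrapolated VSL ratio by a factor of about 2.5 across a 100x income gap, which is why the choice of elasticity dominates LMIC extrapolations.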
Lisa Robinson chaired a commission, sponsored by the Gates Foundation, that recommended the following ensemble approach:
- VSL is anchored to US at 160x income, with an income elasticity of 1.5, and a lower bound of 20x income.
- VSL is anchored to US at 160x income, with an income elasticity of 1.0
- VSL is anchored to OECD at 100x income, with an income elasticity of 1.0
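Each approach in the list extrapolates as VSL_target = VSL_anchor × (Y_target / Y_anchor)^elasticity. Expressed in multiples of the target's own income, that simplifies to anchor_multiple × (Y_target / Y_anchor)^(elasticity - 1). The sketch below applies this to someone at 1/100th of the anchor's income. Note that it yields VSL multiples, not VSLY figures: converting VSL to VSLY requires assumptions about remaining life expectancy and discounting that we don't reproduce here. For the OECD-anchored approach the relevant ratio is to OECD rather than US income, but with an elasticity of 1.0 the ratio drops out.

```python
def vsl_multiple(income_ratio, anchor_multiple, elasticity, floor=None):
    """VSL as a multiple of the target person's own income.

    income_ratio: target income / anchor-country income.
    anchor_multiple: VSL as a multiple of income in the anchor country.
    floor: optional lower bound on the multiple.
    """
    multiple = anchor_multiple * income_ratio ** (elasticity - 1.0)
    return max(multiple, floor) if floor is not None else multiple


ratio = 0.01  # target income is 1/100th of the anchor's
approaches = {
    "US anchor, elasticity 1.5, floor 20x": vsl_multiple(ratio, 160, 1.5, floor=20),
    "US anchor, elasticity 1.0": vsl_multiple(ratio, 160, 1.0),
    "OECD anchor, elasticity 1.0": vsl_multiple(ratio, 100, 1.0),
}
for name, m in approaches.items():
    print(f"{name}: {m:.0f}x income")
# 20x (the unfloored value would be 16x), 160x, 100x
```

With an elasticity of 1.5, the first approach would give 160 × 0.01^0.5 = 16x income, so its 20x lower bound binds at this income level.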
These yield the following VSLY estimates for someone at an income 1/100th that of the US43:
Here again is the chart from our discussion of LIC VSLYs above. As noted above, in this chart we plot all the estimates found by the literature search in Robinson et al. 2019’s meta-analysis – they searched for all VSL analyses, whether stated or revealed, in any country that had been classified as low- or middle-income in the last 20 years.47
I’ve added a few lines to show various elasticities you could use to predict LMIC VSLYs based on US VSLYs. An elasticity of 1 would say that willingness to pay for a year of life expectancy varies 1:1 with income – combining that with the US VSLY of 4 years of income, we’d predict that individuals in any country are willing, on average, to trade 4 units of log-income for 1 year of life expectancy. If instead we combined an elasticity of 1.1 with the US VSLY, we’d predict that individuals at the global poverty line would be willing to trade roughly 2.5 years of income for a year of life expectancy – this is roughly in line with the VSLYs from IDInsight’s surveys of communities demographically similar to GiveWell beneficiaries.
It seems to me that any elasticity between, say, 0.9 and 1.3 is potentially compatible with this data.
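The lines on the chart follow the same constant-elasticity extrapolation, applied to the US VSLY of 4 years of income. A minimal sketch, assuming that income at the global poverty line is about 1/100th of US income (our rounding, matching the ratio used for the ensemble estimates above, not a figure read off the chart):

```python
US_VSLY_IN_YEARS_OF_INCOME = 4.0  # from the text
POVERTY_TO_US_INCOME = 0.01       # assumed ratio, roughly 1/100


def vsly_years_of_income(elasticity, income_ratio=POVERTY_TO_US_INCOME,
                         us_vsly=US_VSLY_IN_YEARS_OF_INCOME):
    """Predicted VSLY in years of the person's own income (equivalently,
    units of log-income), extrapolated from the US with a constant elasticity."""
    return us_vsly * income_ratio ** (elasticity - 1.0)


for e in (0.9, 1.0, 1.1, 1.3):
    print(e, round(vsly_years_of_income(e), 2))
# 0.9 -> 6.34, 1.0 -> 4.0, 1.1 -> 2.52, 1.3 -> 1.0
```

Under this assumed income ratio, the 0.9-1.3 range of elasticities corresponds to poverty-line VSLYs of roughly 1 to 6 years of income, and an elasticity of 1.1 reproduces the ~2.5 figure mentioned above.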
Analysts are often interested in the relationship between VSL and national-level income statistics like per-capita GDP, even when they have measures of respondents’ own incomes, because heuristics based on national-level statistics are more practical to apply, especially when deriving VSL estimates to inform national policies. When we analyze VSLY measured in multiples of GNI per capita, rather than respondents’ income, we see a stronger relationship between income and VSLY – this data could be compatible with elasticities between, say, 1.0 and 1.5. This seems to be because the respondents in LMIC VSL studies report lower average incomes than their respective national averages – presumably because the social scientists performing these studies are interested in somewhat poorer populations.48 We suspect that comparing LMIC VSL estimates to national-level income averages, rather than to respondents’ (lower) incomes, biases analyses like Robinson et al. toward finding higher elasticities of VSL to income.
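A stylized two-point example shows the direction of this suspected bias. All numbers are hypothetical: US respondents are assumed to earn the national average, LMIC respondents half of their national average, and the true elasticity with respect to respondents' own income is set to exactly 1.0. Measuring the LMIC data point against national income instead of respondents' income makes the same data imply a higher elasticity.

```python
import math


def two_point_elasticity(vsl_a, income_a, vsl_b, income_b):
    """Slope of log VSL against log income between two observations."""
    return math.log(vsl_a / vsl_b) / math.log(income_a / income_b)


US_INCOME = 100.0             # hypothetical; respondents assumed to earn the national average
US_VSL = 160 * US_INCOME      # anchor VSL at 160x income
LMIC_GNI_PER_CAPITA = 1.0     # national average, 1/100th of the US
LMIC_RESPONDENT_INCOME = 0.5  # hypothetical: respondents earn half the national average
TRUE_ELASTICITY = 1.0

lmic_vsl = US_VSL * (LMIC_RESPONDENT_INCOME / US_INCOME) ** TRUE_ELASTICITY

vs_respondents = two_point_elasticity(US_VSL, US_INCOME, lmic_vsl, LMIC_RESPONDENT_INCOME)
vs_gni = two_point_elasticity(US_VSL, US_INCOME, lmic_vsl, LMIC_GNI_PER_CAPITA)
print(round(vs_respondents, 2), round(vs_gni, 2))  # 1.0 1.15
```

The log VSL gap is the same in both calculations, but the log income gap shrinks from ln 200 to ln 100 when national income is used, inflating the estimated slope. The size of the effect here depends entirely on the assumed respondent-to-national income ratio.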
5.2 Theoretical arguments for DALY valuations to vary by income
One theoretical argument would go like this: If you think we can improve individuals’ lives by improving their incomes, and you also think the moral impact of saving a life varies somewhat with the quality of that life (i.e. it’s better to extend a happy life than a miserable life), then it follows that it is more valuable in theory to extend the life of a typical high-income country resident than that of a typical person at the global poverty line.49 Many people (including us) find this a deeply concerning line of reasoning – and critically, this theoretical dynamic is in reality swamped by the fact that it is dramatically less expensive to extend the life of someone at the global poverty line, which is why the overwhelming majority of our GHW portfolio is focused on extending lives and increasing incomes in low- and lower-middle-income countries.
5.3 Bringing these considerations together
We’ve been unsettled about how to aggregate these lines of argument, but ended up concluding that we didn’t need to reach a resolution on this because the expected cost of mistakes if we were wrong (i.e., if we assumed 2x and the true answer were 6x, or vice versa) was low.50
For now, we haven’t decided specifically how to weigh lives-vs-income tradeoffs in high-income countries, and when we face decisions that might depend on the specifics, will test a range of values between 2 and 6 units of log-income.