TL;DR
I estimate the expected impact of an additional early-career AI safety researcher by combining assumptions about AI risk, tractability, counterfactual replaceability, and population at stake, expressing the result in GiveWell-equivalent terms. Under what I believe are very conservative inputs, the estimate is on the order of a few million dollars per year in equivalent donations. The results are very sensitive to some unknown parameters, though. Access the calculator/model used here (tweakable to a wide range of beliefs and judgements).
Written for readers not necessarily familiar with EA, so some basic concepts and orgs are explained in more detail than is typical on the Forum.
The Question (Almost) Nobody Calculates
“AI safety is important” is something EAs say a lot. But how important, exactly? Like, in dollars? That’s the question I faced a month ago, and I’ve just finished answering it (at least enough to convince myself).
Background: I’m a freshman at Penn. I have to decide what to do with my career in the next few years. “Work on AI safety” kept coming up, but I couldn’t find an expected value calculation for how much good one more safety researcher actually generates.* Probably because it’s too hard and any attempt ends up wishy-washy. Nevertheless, I sat down with Opus 4.6 and tried to actually calculate it.
I made every assumption as modest as possible. I anchored to the most skeptical credible forecasters I could find. I pressure-tested (nearly) every parameter. Even after all that, the answer was: $5M in (GiveWell) donations per year.
I built a calculator[1] where you can plug in your own assumptions and see what you get. The rest of this post walks through the logic behind it.
*The one prior attempt I found is @Jordan Taylor's “Expected impact of a career in AI safety under different opinions” (2022). Taylor’s estimates are more optimistic (they count a lot of future people). Mine is deliberately skeptical, trying to (1) establish a floor for undergrads / early-career EAs considering AI safety, and (2) make the case to people who aren’t already convinced (most of my college friends). The idea for (1) and this entire project was inspired by William MacAskill’s description of “lower-bound reasoning” (chapter 6 of Doing Good Better) to help decide between two careers. For context, my other contender is entrepreneurship with earning-to-give.
How to Calculate E(AI Safety Career)?
The expected impact of one AI safety career, measured in GiveWell-equivalent dollars, is:
Each piece has sub-components. The calculator lets you tweak all of them. I’ll walk through the important ones.
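To make the multiplication chain concrete, here is a minimal sketch in Python. The variable names are mine and the defaults are pulled from the sections below; the real calculator has more structure (scenario discounts, timeline-dependent values), so treat this as an approximation rather than the exact formula:

```python
import math

# Hypothetical reconstruction of the EV chain; defaults mirror the post's inputs.
p_catastrophe   = 0.0075   # P(AI catastrophe), Samotsvety-anchored default
r_per_doubling  = 0.07     # relative risk reduction per doubling of effort
field_qa_years  = 15_000   # field-total quality-adjusted researcher-years
your_qa_years   = 1.7      # your quality-adjusted researcher-years
counterfactual  = 0.30     # share of your output that wouldn't happen anyway
people_at_stake = 90e9     # probability-weighted people (current + future)
cost_per_life   = 5_400    # effective GiveWell cost to save a life

# Your marginal doublings of cumulative field effort, then the implied
# relative reduction in catastrophe risk (tiny, since the field is large).
doublings = math.log2(1 + your_qa_years * counterfactual / field_qa_years)
relative_reduction = 1 - (1 - r_per_doubling) ** doublings

expected_lives = p_catastrophe * relative_reduction * people_at_stake
givewell_equivalent = expected_lives * cost_per_life
print(f"~${givewell_equivalent / 1e6:.0f}M GiveWell-equivalent")
```

With these defaults this lands at roughly $13M, the same ballpark as the calculator's headline figure, which suggests the reconstruction is close even if the internals differ.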
How Likely Is AI Catastrophe?
This is the parameter with the widest disagreement, and the one that matters most.
The Existential Risk Persuasion Tournament (Karger et al. 2023) brought together 89 superforecasters — people with strong track records on real-world prediction questions — and 80 domain experts, then had them deliberate and give final estimates.[2] Superforecasters landed at 0.38% chance of AI-caused extinction by 2100. Domain experts landed at 3%. (This was the largest disagreement in the entire tournament.)
Meanwhile, a survey of 2,778 AI researchers who’d published at NeurIPS or ICML gave a median of 5% and a mean of ~9%.[3]
The calculator defaults to 0.75% — roughly where Samotsvety, a high-performing forecasting group, would land based on their AGI timelines. That’s between the superforecasters and the domain experts. You can change it.
When Does Your Career Window End?
Here’s something important to note: your career window ends when AGI arrives, because after that, either AI solves everything (and your career is moot) or it doesn’t go well (we all die).
The calculator uses your chosen AGI timeline to set this automatically. Pick “Samotsvety” and your career ends around 2033. Pick “Metaculus” and it’s similar. Pick the superforecaster timeline and you get until ~2040.
For a freshman graduating in May 2029 with a Samotsvety anchor, that’s about 3.6 working years.
How Much Does Safety Research Actually Help?
This is the parameter I’m least confident about. Nobody has measured it empirically. The only framework I found: assume some percentage of relative risk reduction per doubling of cumulative research effort.[4] One EA Forum commenter suggested 5-10% per doubling. The calculator defaults to 7%.
Whether the entire field of ~1,100 researchers[5] reduces AI catastrophe risk by 0.03 or 3 percentage points depends almost entirely on this one assumption. If you believe the research being done today is useless (against future AI systems), the number is near zero. If you believe a few key breakthroughs may change the trajectory, then it’s higher.
I can’t resolve this. Neither can anyone else right now (I think). So for now, I’ll have to make it up. The calculator lets you try different values.
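To see how the per-doubling assumption behaves, a quick illustration (7% per doubling is the default above; the arithmetic here is mine, not the calculator’s):

```python
# Each doubling of cumulative effort removes 7% of the *remaining* risk,
# so n doublings compound multiplicatively rather than adding up.
r = 0.07
for n in [1, 2, 4, 8]:
    print(n, f"{1 - (1 - r) ** n:.1%}")
```

Eight doublings remove about 44% of the risk, not 56%, because each doubling bites into a smaller remainder.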
How Productive Are You?
A fresh graduate doesn’t produce as much research as a senior researcher. Academic data gives us a baseline: at Norwegian universities, the bottom 50% of researchers produce 15% of total output.[6] That’s a per-person average of 0.30x the field mean. Juniors cluster in this bottom half. The boundary between the top and bottom half sits around 0.65-0.70x.
The calculator uses these empirically derived multipliers (0.30x for your first two years, 0.70x for years 3-4, 0.80x for year 5+) and weights them by your working years to get quality-adjusted researcher-years.
With 3.6 years and the default multipliers, you produce about 1.7 quality-adjusted researcher-years out of a field total of roughly 15,000.
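A sketch of that weighting (the year spans and multipliers are the post’s defaults; the loop structure is my assumption):

```python
# Walk through the career, applying each seniority multiplier to the
# years spent at that level, capped by total working years.
working_years = 3.6
levels = [(2.0, 0.30), (2.0, 0.70), (float("inf"), 0.80)]  # (span, multiplier)

qa_years, remaining = 0.0, working_years
for span, mult in levels:
    used = min(remaining, span)
    qa_years += used * mult
    remaining -= used
print(round(qa_years, 2))
```

With 3.6 working years this gives about 1.72 quality-adjusted researcher-years, matching the ~1.7 figure in the text; the year-5+ multiplier never kicks in under short timelines.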
What About the Counterfactual?
If you don’t take this role, does someone equally good fill it? If yes, your contribution was zero regardless of how important the work is.
The answer is somewhere between “definitely yes” and “definitely no” (note to self: no sh*t, Sherlock).
Evidence for the answer being less than “definitely yes”: MATS (a major AI safety training program) reports that fellow applications are growing 1.8x per year, but actual deployed research talent is only growing 1.25x per year.[7]
The calculator defaults to what I believe is a modest value: 30% (a judgment call, not a derivation from anything). I couldn’t find any empirical measurements of this. In fact, I’m not sure how this can ever be measured.
How Many People Are at Stake?
Not just the current 8 billion. If you prevent AI catastrophe, future generations exist too. But each future generation should be discounted by the probability it goes extinct from non-AI causes (think: nuclear war, pandemics, climate change).
The only peer-reviewed estimate of natural extinction risk is Snyder-Beattie et al. (2019, Nature Scientific Reports): less than 1-in-14,000 per year.[8] But that excludes anthropogenic (human-caused) risk. Domain experts put total non-AI risk at roughly 3% per century.[2]
Using a geometric series with 2% per-generation risk (deliberately pessimistic), 3.3 billion births per generation (UN data[9]), and 50% chance that AI catastrophe is truly permanent[10], the calculator arrives at about 90 billion (probability-weighted) people at stake.
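My reconstruction of that figure (assuming the 50% permanence discount applies only to future generations; the calculator’s internals may differ slightly):

```python
# Geometric series: each future generation's births are discounted by the
# cumulative probability of a non-AI extinction happening first.
births_per_gen   = 3.3e9   # UN-derived births per generation
p_gen_extinction = 0.02    # per-generation non-AI extinction risk
p_permanent      = 0.50    # chance an AI catastrophe is truly permanent
current_people   = 8e9

# sum over g >= 0 of births * (1 - p)^g  =  births / p
future_people = births_per_gen / p_gen_extinction          # 165 billion
at_stake = current_people + p_permanent * future_people    # ~90.5 billion
print(f"~{at_stake / 1e9:.0f} billion people at stake")
```

The 8 billion currently alive are counted at full weight because a catastrophe kills them whether or not it turns out to be permanent.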
What Is a GiveWell Donation Actually Worth?
GiveWell’s top charities save a life for roughly $3,000-$5,500.[11] The calculator defaults to $4,000. But under short AI timelines, that $4,000 doesn’t really buy a full life.
There are three scenarios. The calculator groups them explicitly:
Scenario A: AGI goes well (28.5% chance with Samotsvety defaults). AI solves poverty, malaria, clean water (all of global poverty, basically) within a few years. The child you saved for $4,000 would have been saved anyway. Your donation was redundant. Maybe 10% of the value is retained (again, making numbers up), accounting for a deployment lag.
Scenario B: AI catastrophe (0.2% chance). Everyone dies or civilization collapses. The child you saved at age 2 dies at age ~4 in the catastrophe. You bought 2 years of life, not 58. That’s 2/58 ≈ 3.4% of the value. This number is derived from your AGI timeline (change the timeline and it updates).
Scenario C: No AGI yet (71.3% chance). Business as usual. GiveWell retains full value.
Blended: a GiveWell donation is worth about 74% of its face value. The effective cost per life is roughly $5,400 instead of $4,000.
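The blend is just a probability-weighted average of the three scenario values (scenario probabilities and retention fractions as stated above; the arithmetic is mine):

```python
# (probability, fraction of donation value retained) per scenario
scenarios = [
    (0.285, 0.10),    # A: AGI goes well; ~10% of value retained
    (0.002, 2 / 58),  # B: AI catastrophe; 2 of ~58 life-years bought
    (0.713, 1.00),    # C: no AGI yet; full value
]
blend = sum(p * v for p, v in scenarios)
effective_cost = 4_000 / blend
print(f"{blend:.0%} of face value, ${effective_cost:,.0f} per life")
```

Scenario C dominates the blend: even a large haircut in Scenario A moves the result only a few points, because Scenario A carries under 30% of the probability mass.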
Note that this discount only applies to GiveWell donations. Direct AI safety work isn’t discounted by short timelines because if you reduce the probability of catastrophe, that value doesn’t depend on when AGI arrives.
The Results
With all the defaults (Samotsvety’s P(doom), Samotsvety AGI timeline, modest assumptions everywhere else), one marginal freshman pursuing AI safety research produces:
- $13.6 million in total career impact (GiveWell-equivalent)
To match that through GiveWell donations, you’d need to donate:
- $16.9 million as a lump sum at graduation (GiveWell discount is smallest here because AGI hasn’t arrived yet)
- $18.4 million as a lump sum at mid-career (discount is larger because AGI is more likely by then)
- $5.1 million per year on average
That last number is the bottom line. Can you donate an average of $5.1 million per year (inflation-adjusted, of course) to GiveWell from graduation until AGI? If not, AI safety research is the better career option in terms of expected value. (At least, that’s how I was thinking about it, because my other option for doing good was entrepreneurship + earning-to-give at scale)
What If You’re More Skeptical?
Change the P(AI catastrophe) to the superforecaster level (0.38%) and everything scales down proportionally. The per-year threshold drops to roughly $2.5M per year. That’s still far more than 99%+ of American adults could donate per year.
Change the tractability parameter (risk reduction per doubling) to the very-skeptical 3% and it drops to $1M per year.
Try it yourself.[1] Every parameter is a dropdown with sourced options. The three scenarios in the GiveWell section adapt automatically when you change the AGI timeline. The sources sheet lists 22 cited works with URLs.
What Would It Take to Reject AI Safety?
You’d need to believe many** of these simultaneously:
- AI catastrophe risk is below 0.38% (lower than superforecasters)
- Safety research reduces risk by less than 3% (i.e. the entire field is nearly useless)
- Your counterfactual contribution is way below 30% (i.e. you’re highly replaceable)
- Non-AI extinction risk exceeds 2% per generation (i.e. civilization is fragile enough that solving alignment barely matters)
- AI catastrophe is recoverable most of the time
- You can donate ~$5M per year to GiveWell
Some are defensible; however, holding multiple of these at once is too confident a position (in my opinion).
**I’m too lazy to find the configuration with the smallest number of simultaneous beliefs.
I might be wrong about a lot of things here. I’m a freshman, not an expert. I used Claude extensively for the calculations and literature sourcing (the judgment calls and final reasoning are largely mine, though). If you have better numbers for any parameter, especially tractability (AI risk reduction per doubling of AI safety researchers) and per‑generation non-AI-caused extinction risk, please share them! Those two drive the estimate strongly, and neither has any empirical basis (at least, that I can find).
- ^
Calculator: https://docs.google.com/spreadsheets/d/1ucXLecZ1OA42I9pJh-xAUzr4ICUg2ccfvMbwj8D0bh4/edit?usp=sharing. All parameters are tweakable dropdowns. Sources sheet has 22 citations.
- ^
Karger et al. (2023), “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.” 89 superforecasters and 80 domain experts. Superforecasters: 0.38% AI extinction by 2100. Domain experts: 3%.
- ^
Grace et al. (2024), “Thousands of AI Authors on the Future of AI.” 2,778 researchers surveyed. Median: 5%. Mean: ~9%. 38-51% placed at least 10% on extinction-level outcomes.
- ^
The “relative risk reduction per doubling” framework comes from comments on Jordan Taylor’s EA Forum post (2022), “Expected impact of a career in AI safety under different opinions”
- ^
EA Forum field growth analysis (September 2025). ~600 technical + ~500 non-technical AI safety FTEs across 113 organizations. Growing at ~21% per year on the technical side.
- ^
Kyvik (1989), “Productivity differences, fields of learning, and Lotka’s law.” Norwegian universities. “About 20% of the tenured faculty produce 50% of the total output, and the most prolific half of the researchers account for almost 85% of the output.” Bottom 50% average: 15%/50% = 0.30x field mean.
- ^
LessWrong (2025), “AI safety undervalues founders.”
- ^
Snyder-Beattie, Ord, and Bonsall (2019), “An upper bound for the background rate of human extinction.” Nature Scientific Reports. Natural extinction risk: “almost guaranteed to be less than one in 14,000 per year, and likely to be less than one in 87,000.”
- ^
UN Department of Economic and Social Affairs (2024), World Population Prospects. ~132 million births per year currently.
- ^
RAND (2025), “Could AI Really Kill Off Humans?” Found that true human extinction is mechanistically very hard (e.g. AI-initiated nuclear war would probably not kill every human).
- ^
GiveWell (2024), “How Much Does It Cost to Save a Life?” Range: $3,000-$5,500. Note: they “generally expect the cost to save a life to increase over time.”
