Oliver Sourbut

Technical staff (Autonomous Systems) @ UK AI Safety Institute (AISI)
393 karma · Working (6-15 years) · Pursuing a doctoral degree (e.g. PhD) · London, UK
www.oliversourbut.net

Bio


  • Autonomous Systems @ UK AI Safety Institute (AISI)
  • DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
  • Former senior data scientist and software engineer + SERI MATS

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Simsion - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foodie themes and the frantic real-time coordination)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.

Comments (69)

On point 1 (space colonization), I think it's hard and slow! So the same issue as with bio risks might apply: AGI doesn't get you this robustness quickly for free. See other comment on this post.

I like your point 2 about chancy vs merely uncertain. I guess a related point is that when the 'runs' of the risks are in some way correlated, having survived once is evidence that survivability is higher. (Up to and including the fully correlated 'merely uncertain' extreme?)
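
As a toy illustration of that updating effect (made-up numbers, not from the post): suppose each 'run' survives with probability either 0.9 or 0.5, with a 50/50 prior, and the same value applies to every run. Then

\[
P(p = 0.9 \mid \text{survived once}) = \frac{0.5 \times 0.9}{0.5 \times 0.9 + 0.5 \times 0.5} \approx 0.64,
\qquad
P(\text{survive next run}) \approx 0.64 \times 0.9 + 0.36 \times 0.5 \approx 0.76 > 0.70.
\]

In the fully correlated limit (per-run survival probability either 0 or 1), one observed survival pushes the posterior survival probability for future runs all the way to 1 - the 'merely uncertain' extreme.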

For clarity, you're using 'important' here in something like an importance × tractability × neglectedness factoring? So yes, more important (but there might be reasons to think it's less tractable or neglected)?

I've been meaning to write something about 'revisiting the alignment strategy'. Section 5 here ('Won't AGI make post-AGI catastrophes essentially irrelevant?') makes the point very clearly:

On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.

But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems.

but without making much of a case for it. I'd be interested in Will's and reviewers' sense of the space and literature here.

Yep, for me 'big civ setbacks are really bad' was definitely already baked in, from the POV of their setting bad context for pre-AGI transition(s) (as well as their direct badness). But while I'd already agreed with Will that post-AGI isn't an 'end of history' (in the sense that much remains uncertain re safety), I hadn't thought through the implication that setbacks could force a rerun of the most perilous transition(s), which does add some extra concern.

A small aside: some put forth interplanetary civilisation as a partial defence against both total destruction and 'setback'. But reaching the milestone of a really robustly interplanetary civ might itself take quite a long time after AGI - especially if (like me) you think digital uploading is nontrivial.

(This abstractly echoes the suggestion in this piece that bio defence might take a long time, which I agree with.)

Some gestures which didn't make the cut as they're too woolly or not quite the right shape:

  • adversarial exponentials might force exponential expense per gain
    • e.g. combatting replicators
    • e.g. brute forcing passwords
  • many empirical 'learning curve' effects appear to consume exponential observations per increment
    • Wright's Law (which is the more general cousin of Moore's Law) requires exponentially many production iterations per incremental efficiency gain
    • Deep learning scaling laws appear to consume exponential inputs per incremental gain
    • AlphaCode and AlphaZero appear to make uniform gains per runtime compute doubling
    • OpenAI's o-series 'reasoning models' appear to improve accuracy on many benchmarks with logarithmic returns to more 'test time' compute
    • (in all of these examples, there's some choice of what scale to represent 'output' on, which affects whether the gains look uniform or not, so the thesis rests on whether the choices made are 'natural' in some way; a toy version of the underlying power-law shape is sketched just below)
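
To make the learning-curve shape a bit more precise, here's the standard power-law form, with Wright's Law as the example (a generic parameterisation, not numbers from any of the results above; the same algebra applies to loss-vs-compute scaling laws):

\[
C(N) = C_1\, N^{-b}
\quad\Longrightarrow\quad
N(C) = \left(\frac{C_1}{C}\right)^{1/b},
\]

so reducing unit cost $C$ by a further factor $r$ multiplies the required cumulative production $N$ by $r^{1/b}$ each time: the input grows exponentially in the number of fixed-percentage gains, while the output improves only 'linearly' on a log scale.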

This is lovely, thank you!

My main concern would be that it takes the same coarse-grained stance as much other writing in the area, conflating all kinds of algorithmic progress into a single scalar 'quality of the algorithms'.

You do moderately well here, noting that the most direct interpretation of your model regards speed or runtime compute efficiency, yielding 'copies that can be run' as the immediate downstream consequence (and discussing in a footnote the relationship to 'intelligence'[1] and the distinction between 'inference' and training compute).

I worry that many readers don't track those (important!) distinctions and tend to conflate these concepts. For what it's worth, by distinguishing them I have come to the (tentative) conclusion that a speed/compute-efficiency explosion is plausible (though not guaranteed), but that an 'intelligence' explosion in software alone is less likely, except as a downstream effect of running faster (which might be nontrivial if pouring more effective compute into training and runtime yields meaningful gains).


  1. Of course, 'intelligence' is also very many-dimensional! I think the most important factor in discussions like these regarding takeoff is 'sample efficiency', since it's quite generalisable and feeds into most downstream applications of more generic 'intelligence' resources. This is relevant to R&D because sample efficiency affects how quickly you can accrue research taste, which controls the stable level of your exploration quality. Domain knowledge and taste are obviously less generalisable, and harder to acquire in silico alone. ↩︎

Glad to hear it! Any particular thoughts or suggestions? (Consider applying, or telling colleagues and friends you think would be a good fit!)

On this note, the Future of Life Foundation (headed by Anthony Aguirre, mentioned in this post) is today launching a fellowship on AI for Human Reasoning.

Why? Whether you expect gradual or sudden AI takeoff, and whether you're afraid of gradual or acute catastrophes, it really matters how well-informed, clear-headed, and free from coordination failures we are as we navigate into and through AI transitions. Just the occasion for human reasoning uplift!

12 weeks, $25-50k stipend, mentorship, and potential pathways to future funding and impact. Applications close June 9th.
