This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked. 

Epistemic status: This post is based on vibes from interacting with people in AI safety over several years. I’m painting a large group of people with a broad brush, so obviously the things I say here won’t map perfectly onto everyone in the field.

Summary

Much of AI safety in EA is shaped by a very particular worldview. This worldview then shapes conversations, communities, funding, incentives, emotions, and more. There also tends to be a positive feedback loop in which this worldview gets reinforced. A quick browse around the EA Forum has led me to realize that some people have apparently never even heard of arguments from outside this narrow worldview. I find this concerning.

What’s this worldview about?

What even is a worldview? It is the set of beliefs about fundamental aspects of Reality that ground and influence all one's perceiving, thinking, knowing, and doing (source). For example, a social justice worldview primarily sees the world as dynamics between the oppressed and the oppressor. In any worldview, there is a central organizing theme on what the world is “about” — everything else is just commentary.

EAs in AI safety also often have a particular worldview. This worldview is characterized by the belief that it is very likely that in the near-term (perhaps 2027, or in the next five years, or the next decade, or whatever), we will achieve this thing called artificial general intelligence (AGI) which may continue to scale to artificial superintelligence (ASI) soon after. It’s what those in the ‘normalist view’ refer to as the ‘superintelligence worldview’. It’s about “feeling the AGI” or being “AGI-pilled”.[1]

In this worldview, an AI takeover is so salient that it’s no longer about whether it will happen, but how it will happen. Is it FOOM? Or gradual? Pick your favourite ‘threat model’.

This also sometimes has an unfortunate implication: those who hold this worldview tend to think of others who don’t share it as people who:

  • Have not yet engaged seriously with the arguments
  • Have really bad epistemics and constantly engage in motivated reasoning
  • Are actually evil
  • Are actually way too freaked out and are just trying to ‘cope’

AI-centrism

A big part of this worldview is about having an AI-centric view. Just as someone with a social justice worldview sees the world mostly through the lens of the oppressor versus the oppressed, the EA AI safety worldview sees the world mostly through the lens of AGI versus humans.

The implicit first step is to place the AGI at the centre of everything. Then, reasonable claims like the orthogonality thesis and the instrumental convergence thesis are invoked. Once this is established, the rest of the argument for existential risk follows naturally, leading to the conclusion that we’re all going to die from a power-seeking AI.

The problem is not that these theses are wrong — in fact, they're perfectly logical in theory. Rather, they are just assumed to also be true in practice. For instance, the instrumental convergence thesis, which relies on agents behaving like strong optimizers, is implicitly assumed to hold even though AIs today behave nothing like strong optimizers.

Risks, capabilities, and model evals

Everyone is worried about AI risks. But what’s unique about those who have this worldview is that they are worried about AGI (which could eventually become ASI), which, if misaligned, could literally destroy the world.

How do we know if we’re getting close to AGI? Model capabilities. How do we measure model capabilities? Model evals. How do we know if they will actually take over the world? Model propensities. How do we know what propensities these models have? Again, model evals.

So in this worldview, results from model evals count as strong evidence for risks. Model evals tell us a lot about model capabilities and propensities, which in turn tell us a lot about our chances of surviving an AGI existential catastrophe. Numbers go up = scary.

While they recognize that model evals are not perfect, they rarely challenge the underlying assumption that model evals translate well into capabilities which then translate well into risks. To them, it’s unthinkable to not be worried by charts that show models performing better and better on all sorts of benchmarks. They do not seriously consider that the argument they find meaningful or convincing only makes sense from within the worldview, not outside of it.

On thinking concretely

Another part of this worldview seems to be about reasoning from high-level abstract concepts without being concrete.

Or rather, when those in this worldview talk about being concrete, what they mean is concrete stories on how we achieve AGI, not concrete pathways of how AGI would lead to actual physical harm. To them, anything that happens downstream of the AI is largely irrelevant, because if we all die anyway, it doesn’t matter how exactly we die.

Just as evidence that is meaningful to those inside this worldview tends to be rather meaningless to those outside of it, what counts as concrete to those within the worldview often isn’t concrete at all to those outside of it.

On forecasting

Given all of the above, those who adopt this worldview tend to be very interested in forecasting, but in a very specific way. It often involves forecasting when AGI will arrive (based on extrapolation of certain data points) and how it would unfold.

There is a tendency for conversations around the theory of change of an AI safety research agenda to start with a question like “what’s your threat model?”, where the answer would generally be some variation of “AIs will be very capable and misaligned and will take over the world”. Those within the worldview might disagree with the specifics of the answer, but those outside the worldview might reject the question itself.[2]

How did this happen?

So how did so many of those in AI safety within EA end up with this peculiar worldview?

I think it largely boils down to selection effects. The AI safety field naturally selects for people who adopt this worldview because it makes the problem of AI risk feel more visceral to them. Those who feel a strong sense of urgency are naturally motivated to engage with and contribute to the field.

Then the positive feedback loop starts. These people become more senior and start shaping the field. Research agendas are set around this worldview. Grants are given to those who do research on topics that fit this worldview. More people join, and newcomers start adopting this worldview.

And the process repeats.

So what?

Even if some people in EA have this worldview, does it matter?

I think it does. I find it quite concerning that we constantly spread the meme that AGI is coming soon with arguments that only make sense to those within this particular worldview. It also concerns me that we even seem to be prioritizing the transition to AGI so much that it is becoming the mission of EA.

So there are probably a few things we could do.

First, let’s stop thinking about AGI as this thing that is extremely qualitatively different from whatever we have right now. It’s not going to be a binary and straightforward “I know it when I see it” thing. If you went back in time and showed today’s ChatGPT to someone ten years ago, there’s a good chance they’d be convinced that ChatGPT is already AGI — they probably wouldn’t think “ChatGPT is definitely not AGI but something else will surely be AGI”. We already have AIs that are dumb in some ways, human-level in some ways, and superhuman in many ways. Yet our lives remain largely the way they have been, changing over time as they always do.

Second, let’s at least recognize that the world is a complex place with 8 billion humans and all sorts of systems that underlie human civilization. The world is not just a bunch of humans sitting around, where a wild AGI would suddenly appear and take over the world.

Third, let’s consider that the problem is not that we are failing to update our priors with new evidence. Rather, we are updating our priors with evidence that provides very little information, built on an assumption-laden worldview as our foundation. Consider that model evals tell us very little about capabilities and propensities, and capabilities and propensities tell us very little about actual risks.

Lastly, it goes without saying, but for many EAs in AI safety, it’s probably worth having a richer life outside of EA. There’s a big world out there, and interacting with the rest of the world might help us realize that we are sometimes stuck in a very particular, idiosyncratic worldview.

  1. ^

     To be clear, I’m not saying that an AGI can never exist in the way that people think about them. After all, humans are algorithms wrapped in organic matter that constitutes the very definition of (non-artificial) general intelligence. What I’m saying is that people are often making an assumption that a very scary “AGI” is very likely to come into existence very soon, and it will transform the world in unimaginable ways, including leading us towards an existential catastrophe.

  2. ^

     As another example, in the debate between Daniel Kokotajlo and Sayash Kapoor, Daniel asked Sayash what his story is (i.e. how he sees the future playing out). Eventually, outside of the debate, Sayash (and Arvind Narayanan) wrote in their blog that "This kind of scenario forecasting is only a meaningful activity within their worldview. We are concrete about the things we think we can be concrete about."


Comments

I find this post incisive overall. I love this paragraph in particular:

What even is a worldview? It is the set of beliefs about fundamental aspects of Reality that ground and influence all one's perceiving, thinking, knowing, and doing (source). For example, a social justice worldview primarily sees the world as dynamics between the oppressed and the oppressor. In any worldview, there is a central organizing theme on what the world is “about” — everything else is just commentary.

I really like the way you said this. I think "a central organizing theme on what the world is ‘about’" is a clearer articulation than previous formulations I’ve heard. I find this insightful.

I don’t know if this is a good definition of worldviews in general. For example, maybe someone’s worldview is that the world is full of diverse and heterogeneous problems and conflicts, each of which requires its own special attention and thinking. That seems like it could be a worldview, but there wouldn’t be one thing the whole world is about. I think maybe there should be a term specifically for worldviews that are characterized by reliance on a central organizing theme.

A related concept is the sociologist James Hughes’ treatment of millennialism. He has a great talk about that called "Avoiding Millennialist Cognitive Biases". (If you prefer to read, he has a paper called "Millennial Tendencies in Response to Apocalyptic Threats" that was published in the essay anthology Global Catastrophic Risks.) Millennialist beliefs can be totalizing — they can make the world "about" some coming utopia or apocalypse, or a utopia that will follow an apocalypse. But millennialist beliefs don’t encompass all totalizing worldviews. For example, someone could have a Christian worldview that is totalizing but not believe in the rapture or the end times or anything like that. Or someone could have a social justice worldview that is fairly totalizing but doesn’t involve a utopia or an apocalypse.

The superintelligence worldview is definitely millennialist. The advent of superintelligence is presented as bringing either apocalypse or utopia.

Identifying a belief or a worldview as millennialist isn’t a refutation of it. It is merely a classification. But that classification should draw concern because of the psychological biases associated with millennialism. It warrants caution and scrutiny. It’s similar to any situation where you identify bias — detecting bias isn’t a refutation, it’s just a reason for skepticism.

Maybe this is true in the EA branch of AI safety. In the wider community, e.g. as represented by those attending IASEAI in February, I believe this is not a correct assessment. Since I began working on AI safety, I have heard many cautious and uncertainty-aware statements to the effect that the things you claim people believe will almost certainly happen are instead seen as merely likely enough to worry deeply about and work on preventing. I also don't see that community having an AI-centric worldview – they seem to worry about many other cause areas as well, such as inequality, war, pandemics, and climate.

Agreed - I should've made it clearer in the title that I was referring specifically to the AI safety people in EA, i.e. this excludes EAs who are not in AI safety, and also excludes non-EAs who are in AI safety.

Thanks for this post. As someone who has only recently started exploring the field of AI safety, much of this resonates with my initial impressions. I would be interested to hear the counterpoints from those who have Disagree-voted on this post. 

Do you think the "very particular worldview" you describe is found equally among those working on technical AI safety and AI governance/policy? My impression is that policy inherently requires thinking through concrete pathways of how AGI would lead to actual harm as well as greater engagement with people outside of AI safety. 

I have also noticed a split between the "superintelligence will kill us all" worldview (which you seem to be describing) and "regardless of whether superintelligence kills us all, AGI/TAI will be very disruptive and we need to manage those risks" (which seemed to be more along the lines of the Will MacAskill post you linked to - especially as he talks about directing people to causes other than technical safety or safety governance). Both of these worldviews seem prominent in EA. I haven't gotten the impression that the superintelligence worldview is stronger, but perhaps I just haven't gotten deep enough into AI safety circles yet. 

I would be interested to hear the counterpoints from those who have Disagree-voted on this post. 

Likewise!

Do you think the "very particular worldview" you describe is found equally among those working on technical AI safety and AI governance/policy? My impression is that policy inherently requires thinking through concrete pathways of how AGI would lead to actual harm as well as greater engagement with people outside of AI safety. 

I think it's quite prevalent regardless. While some people's roles indeed require them to analyze concrete pathways more than others, the foundation of their analysis is often implicitly built upon this worldview in the first place. The result is that their concrete pathways tend to be centred around some kind of misaligned AGI, just in much more detail. Conversely, someone with a very different worldview who does such an analysis might end up with concrete pathways centred around severe discrimination against marginalized groups.

I have also noticed a split between the "superintelligence will kill us all" worldview (which you seem to be describing) and "regardless of whether superintelligence kills us all, AGI/TAI will be very disruptive and we need to manage those risks" (which seemed to be more along the lines of the Will MacAskill post you linked to - especially as he talks about directing people to causes other than technical safety or safety governance).

There are indeed many different "sub-worldviews", and I was kind of lumping them all under one big umbrella. To me, the most defining characteristic of this worldview is AI-centrism, and treating the impending AGI as an extremely big deal — not just like the big deals we have seen before, but something unprecedented. Those within this overarching worldview differ on the details, e.g. will it kill everyone, or will it just lead to gradual disempowerment? Are LLMs getting us to AGI, or is it some yet-to-be-discovered architecture? Should we focus on getting to AGI safely, or start thinking more about the post-AGI world? I think many people move between these "sub-worldviews" as they see evidence that updates their priors, but far fewer people move out of this overarching worldview entirely.
