
richard_ngo

7953 karma · Joined

Bio

Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.blogspot.com

Sequences (2)

Replacing Fear
EA Archives Reading List

Comments (355)

Hmm, I think you're overstating the disanalogy here. In the case of the individual life, you say "the reasons to trust 'that some process designed their body and mind to function well' are relatively strong". But we also have strong reasons to trust that some process designed our cooperative instincts to allow groups of humans to cooperate effectively.

I also think that many individuals need to decide how to make their lives go well in pretty confusing circumstances. Imagine deciding whether to immigrate to America in the 1700s, or how to live in the shadow of the Cold War, or whether to genetically engineer your children. There's a lot of stuff that's unprecedented, and which you only get one shot at.

Re the experience of AI safety so far: I certainly do think that a bunch of actions taken by the AI safety movement have backfired. I also think that a bunch have succeeded. If you think of the AI safety movement as a young organism going through its adolescence, it's gaining a huge amount of experience from all of these interactions with the world—and the net value of the things that it's done so far may well be dominated by what updates it makes based on that experience.

I guess that's where we disagree. It seems like you think that we should update towards being clueless. Whereas I think we can extract a bunch of generalizable lessons that make us less clueless than we used to be—and one of those lessons is that many of these mistakes could have been prevented by using the right kinds of non-model-based decision-making.

EDIT: another way of trying to get at the crux: do you think that, if we had a theory of sociopolitics that was about as good as 20th-century economics, then we wouldn't be clueless about how to do sociopolitical interventions (like founding AI safety movements) effectively?

FWIW the reasons you're giving here are closely related to the reasons why I'm sceptical that modern AI-focused EA is in fact as good.

In hindsight I shouldn't have used the phrase "what made EA good", since by this point I'm skeptical about both the AI safety version and the "original spirit" of EA. I guess that makes me one of the people you're describing who joined a long time ago (in my case, over a decade) and have now disengaged.

I do think that int/a is less likely than EA to be significantly harmful, and I'm excited about that. Whether or not it has a decent chance of actually doing something meaningful will depend a lot on the vision of the founders (and Euan in particular). Right now I'm not seeing what will prevent it from dissolving into the background of general hippie-adjacent things (kinda like a lot of the Game B and metacrisis stuff seems to have done). But we'll see.

I have some caution around pushing to generate precise object-level claims that "define int/a", in that you have to believe these claims to be part of it.

Yeah, I phrased it badly when I said that the movement should be pinning down claims. I'm not suggesting that you use these claims to define membership. Indeed, even the framing of your original post feels too "we are a group defined by believing the same things" for my taste (as compared with, say, "we're some collaborators with similar intellectual/emotional/ethical stances").

But I'm excited about you (and the others you mention in this post) writing about the things you personally think the EA worldview gets wrong, ideally not just engaging with how the movement turned out in practice but the broken philosophical assumptions that led to practical mistakes.

As one example, EAs constantly use "value-aligned" as a metric of who to ally with. But it seems pretty plausible to me that SBF was extremely value-aligned with most of the stated philosophical principles of EA. The problem was that he wasn't value-aligned with the background ethics of society that EA mostly takes for granted. Understanding this deeply enough would, I think, lead you to reconceptualize the whole concept of "value-aligned" towards things more reminiscent of int/a (in a way that would then have implications for e.g. what moral theories to believe, what alignment targets to aim AIs at, etc).

Yeah, I expected as much. Though as per my comment above, I'm much more concerned about representation of thought leaders. A better proxy for intellectual diversity is something like "are the few people from each of these clusters who are the biggest critics of the consensus view invited?" E.g. for the Pause AI cluster that'd probably be Holly; for the MIRI cluster that'd probably be Yudkowsky and Habryka; for the academic ML cluster that'd probably be Dan Hendrycks; for the sociopolitical safety cluster that'd probably be Ben Hoffman and Michael Vassar.

I don't know exactly who was invited but I expect that the Summit gets a medium score on this metric: not great, not terrible.

IMO you should think of global health/factory farming etc as one paradigm of EA—which did focus on cost effectiveness—and AI safety as a different paradigm in which the concept of cost-effectiveness is simply not very useful, for a few reasons (see also this related comment):

  • Talent bottlenecks are a far bigger obstacle than financial bottlenecks, and you can't buy talent. Often you can't even spend money to persuade talent—e.g. the AI safety community ended up convincing several of the most influential AI researchers of AGI risk, and even they mostly don't seem to be able to think clearly about the issue (e.g. it's hard to imagine a coherent strategy behind SSI).
  • Insofar as there are financial bottlenecks, it's mostly because the biggest funder is ideologically and politically constrained, and because the trust networks in AI safety aren't robust enough to distribute most of the money available. This will be only more true as Anthropic equity becomes liquid.
  • There's an extreme principal-agent problem where not only is it difficult for funders to tell who will do good research in advance, but it's even difficult for them to tell what was good research in hindsight.
  • As you mention, a lot of the action is in figuring out how not to be net-negative.
  • It's very hard to cash out what AI safety is even trying to achieve in terms of metrics of cost-effectiveness. Once you start talking about the transhuman future, then almost every metric you could come up with is better optimized via weird futuristic stuff than simply "keeping humans alive".

Re "it's pretty easy to show that it's orders of magnitude more cost-effective than GiveDirectly": the kinds of reasoning that one might use to show that are pretty similar to e.g. the kinds that a communist could use to show that a proletarian revolution is orders of magnitude more cost-effective than GiveDirectly. In other words, it's mostly reasoning within the confines of a worldview, and then slapping on a "cost-effectiveness" framing at the end, rather than having the cost-effectiveness part of the reasoning be load-bearing in any meaningful way.

FWIW I think it's historically pretty incorrect that the grounding in cost-effectiveness is what made EA good. E.g. insofar as you think that AI safety is valuable, reasoning about cost-effectiveness actually cut against EA's ability to pivot towards that. Instead, the thing that helped most was something like intellectual openness.

Let's limit ourselves for a moment to someone who wants to make their own life go well. They have various instinctive responses that were ingrained in them by evolution—e.g. a disgust response to certain foods, or a sense of anger at certain behavior. Should they follow those instincts even when they don't know why evolution instilled them? Personally I think they often should, and that this is an example of rational non-model-based decision-making. (Note that this doesn't rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)

Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.

How does this relate to cluelessness? Mostly I don't really know what the term means; it's not something salient in my ontology. I don't feel clueless about how to have a good life, and I don't feel clueless about how to make the long-term future go better, and these two things do in fact seem analogous. In both cases the idea of avoiding pointless internal conflict (and more generally, becoming wiser and more virtuous in other ways) seems pretty solidly good. (Also, the evolution thing isn't central—you have similar dynamics with e.g. intuitions you've learned at a subconscious level, or behaviors that you've learned via reinforcement.)

Another way of thinking about it is that, when you're a subagent within a larger agent, you can believe that "playing your role" within the larger agent is good even when the larger agent is too complex for you to model well (i.e. you have Knightian uncertainty about it).

And yet another way of thinking about it is "forward-chaining often works, even when you don't have a back-chained reason why you think it should work".

Cool project :) There's definitely something very important in the rough direction you're pointing. Some thoughts on how to gain more clarity on it:

  • I suspect that it'd be worth your time to think a bunch about the relationship between altruism and ethics. In some sense, I think of ethics (and particularly virtue ethics) as already a kind of "integral altruism"—i.e. ethics as a set of principles and heuristics by which we can remain in integrity with ourselves and others, thereby allowing our compassion to actually make the world better.
  • I think that the hippie/metamodern/etc communities are very good at some aspects of ethics, but quite bad at others. In general they tend to err on the side of agreeableness, rather than e.g. being honest about unpleasant truths. It feels valuable to take this broad worldview and then try to add a bunch of moral courage that it's currently missing (analogous to how you can think of EA as adding moral courage to econ-brained thinking).
    • However, I feel pretty confused about how to actually help people aim their moral courage towards being ethical, since IMO neither EA nor most inner work helps much with this. One litmus test that I use to evaluate whether inner work is actually making people braver is whether they're more willing to break political taboos afterwards (e.g. for people in the UK, by making a fuss about the Pakistani rape gangs); however, this seldom comes up positive. Another litmus test is whether they're more willing to face the possibility of physical violence when appropriate (e.g. when a crazy person is being a bit menacing in public, do they still just look away?). These are just illustrative examples but hopefully they point at what I think is missing by default.
  • The stuff on cluelessness feels like it's conceding a little too much to the EA/bayesian frame. It's implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it's sensible and even "rational" to make non-model-based decisions (e.g. via heuristics, intuitions, etc). Some other terms that might be better: bounded rationality, group agency, Knightian uncertainty. I sometimes use "distributed agency" or "coalitional agency", but I think they won't make sense to most of your readers.
  • The problem with stuff like systems thinking & complexity science is that it's not really aiming to make the same kind of scientific progress as sciences like physics or evolutionary biology have made. More generally, it seems easy for movements like integral altruism to fall into the trap of not pinning down core ideas and claims. But insofar as integral altruism is true, it suggests that something important about the expected utility maximization paradigm is false, which someone should pin down. In other words, imagine that someone from the 22nd century comes back and tells you that something like integral altruism was actually scientifically/mathematically correct. What's the version of integral altruism that actually leads to you figuring that out?

I filled in your form, and am excited to see where you take this!

As an event focused on x-risk, yes, I think this is fair.

This seems like a misinterpretation of Jan's point. There are multiple intellectual clusters which at least claim to care about x-risk which aren't well-described as the "EA/Constellation/Trajan House" cluster. The main ones which come to mind are:

  • The MIRI cluster
  • The Pause AI cluster
  • The academic ML cluster
  • The multi-agent/sociopolitical safety cluster (which isn't very well-defined right now but I'd put both Jan and myself in this, broadly speaking)
  • The Anthropic cluster (which e.g. is more positive on racing than the EA/Constellation cluster, though I'm not claiming that there's a coherent intellectual worldview behind this)

I would only describe a few people in each cluster as actual thought leaders or key thinkers. So compared with Jan, my concern is less about who gets invited, and more that surveying any gathering as large as the Summit averages together responses from people spanning too wide a range of levels of leadership for the results to be accurately described as the views of "AI safety leaders".

Seems very odd that this comment has 6 "helpful" votes. It's clearly very unhelpful for actually resolving the disagreement, the only way it's helpful is in providing an authority figure's reassurance.
