
Rohin Shah

4473 karma

Bio

Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.

I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.

In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.

http://rohinshah.com

Comments (472)

I'm totally on board with "if the broader world thought more like EAs that would be good", which seems like the thrust of your comment. My claim was limited to the directional advice I would give EAs.

I don’t know how much the FTX collapse is responsible for our current culture. They did cause unbelievable damage, acting extremely unethically and unilaterally and recklessly in destructive ways. But they did have this world-scale ambition, and urgency, and proclivity to actually make things happen in the world, that I think central EA orgs and the broader EA community sorely lack in light of the problems we’re hoping to solve. 

But this is exactly why I don't want to encourage heroic responsibility (despite the fact that I often take on that mindset myself). Empirically, its track record seems quite bad, and I'd feel that way even if you ignore FTX.

Like, my sense is that something along the lines of heroic responsibility causes people to:

  • Predictably bite off more than they can chew, and have massively reduced impact as a result
    • If 100 people each solved 1% of a problem, you'd be in a good place. Instead, 100 EAs with heroic responsibility each try to solve 100% of the problem, and each solve 0.01% of it, and then you still have 99% left. (And in practice I expect many also move backwards.)
  • Leave a genuinely impactful role because they can't see how it will solve everything (and then go on to something not as good)
  • Cut corners due to the increased urgency and responsibility, which leads to worse outcomes, because those corners actually were important
  • Underestimate the value of conventional wisdom
    • E.g. undervaluing the importance of management, ops, process, and maintenance, because it's hard to state a clear, legible theory of change for them that is as potentially-high-upside as something like research
  • Trick themselves into thinking a bet is worth taking ("if this has even a 1% chance of working, it would be worth it" but actually the chance is more like 0.0001%)

To be clear, in some sense these are all failures of epistemics, in that if you have sufficiently good epistemics then you wouldn't make any of these mistakes even if you were taking on heroic responsibility. But in practice humans are enough of an epistemic mess that I think it's better to just not adopt heroic responsibility and instead err more in the direction of "the normal way to do things".

In fact, all of the top 7 most sought-after skills were related to management or communications.

"Leadership / strategy" and "government and policy expertise" are emphatically not management or communications. There's quite a lot of effort on building a talent pipeline for "government and policy expertise". There isn't one for "leadership / strategy" but I think that's mostly because no one knows how to do it well (broadly speaking, not just limited to EA).

If you want to view things through the lens of status (imo often a mistake), I think "leadership / strategy" is probably the highest status role in the safety community, and "government and policy expertise" is pretty high as well. I do agree that management / communications are not as high status as the chart would suggest they should be, though I suspect this is mostly due to tech folks consistently underestimating the value of these fields.

Applicant A started out wanting to be a researcher. They did MATS before becoming an AI Safety researcher. After gaining enough research experience, they were promoted to research manager.

Applicant B always wanted to be a manager. They got an MBA from a competitive business school and worked their way into becoming a people manager at a tech company. Midway through their career they discovered AI Safety and decided to make a career transition.

If I were hiring for a manager and somehow had to choose between only these two applicants with only this information, I would choose applicant A. (Though of course the actual answer is to find better applicants and/or get more information about them.)

I can always train applicant A to be an adequate people manager (and have done so in the past). I can't train applicant B to have enough technical understanding to make good prioritization decisions.

(Relatedly, at tech companies, the people managers often have technical degrees, not MBAs.)

in many employers’ eyes they would not look as value aligned as someone who did MATS, something which is part of a researcher’s career path anyway.

I've done a lot of hiring, and I suppose I do look for "value alignment" in the sense of "are you going to have the team's mission as a priority", but in practice I have a hard time imagining how any candidate who actually was mission aligned could somehow fail to demonstrate it. My bar is not high and I care way more about other factors. (And in fact I've hired multiple people who looked less "EA-value aligned" than either applicant A or B; I can think of four off the top of my head.)

It's possible that other EA hiring cares more about this, but I'd weakly guess that this is a mostly-incorrect community-perpetuated belief.

(There is another effect which does advantage e.g. MATS -- we understand what MATS is, and what excellence at it looks like. Of the four people I thought of above, I think we plausibly would have passed over 2-3 of them in a nearby world where the person reviewing their resume didn't realize what made them stand out.)

I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs.

I somewhat disagree but I agree this is plausible. (That was more of a side point, maybe I shouldn't have included it.)

most people really really don't want to die in the next ten years

Is your claim that they really really don't want to die in the next ten years, but they are fine dying in the next hundred years? (Else I don't see how you're dismissing the anti-aging vs sports team example.)

So, for x-risk to be high, many people (e.g. lab employees, politicians, advisors) have to catastrophically fail at pursuing their own self-interest.

Sure, I mostly agree with this (though I'd note that it can be a failure of group rationality, without being a failure of individual rationality for most individuals). I think people frequently do catastrophically fail to pursue their own self-interest when that requires foresight.

Most people really don’t want to die, or to be disempowered in their lifetimes. So, for existential risk to be high, there has to be some truly major failure of rationality going on. 

... What is surprising about the world having a major failure of rationality? That's the default state of affairs for anything requiring a modicum of foresight. A fairly core premise of early EA was that there is a truly major failure of rationality going on in the project of trying to improve the world.

Are you surprised that ordinary people spend more money and time on, say, their local sports team, than on anti-aging research? For most of human history, aging had a ~100% chance of killing someone (unless something else killed them first).

If you think the following claim is true - 'non-AI projects are never undercut but always outweighed'

Of course I don't think this. AI definitely undercuts some non-AI projects. But "non-AI projects are almost always outweighed in importance" seems very plausible to me, and I don't see why anything in the piece is a strong reason to disbelieve that claim, since this piece is only responding to the undercutting argument. And if that claim is true, then the undercutting point doesn't matter.

We are disputing a general heuristic that privileges the AI cause area and writes off all the others.

I think the most important argument towards this conclusion is "AI is a big deal, so we should prioritize work that makes it go better". But it seems you have placed this argument out of scope:

[The claim we are interested in is] that the coming AI revolution undercuts the justification for doing work in other cause areas, rendering work in those areas useless, or nearly so (for now, and perhaps forever).

[...]

AI causes might be more cost-effective than projects in other areas, even if AI doesn’t undercut those projects’ efficacy. Assessing the overall effectiveness of these broad cause areas is too big a project to take on here.

I agree that lots of other work looks about as valuable as it did before, and isn't significantly undercut by AI. This seems basically irrelevant to the general heuristic you are disputing, whose main argument is "AI is a big deal, so it is way more important".

I agree with some of the points on point 1, though other than FTX, I don't think the downside risk of any of those examples is very large

Fwiw I find it pretty plausible that lots of political action and movement building for the sake of movement building has indeed had a large negative impact, such that I feel uncertain about whether I should shut it all down if I had the option to do so (if I set aside concerns like unilateralism). I also feel similarly about particular examples of AI safety research but definitely not for the field as a whole.

Agree that criticisms of AI companies can be good; I don't really consider them EA projects, but it wasn't clear that was what I was referring to in my post

Fair enough for the first two, but I was thinking of the FrontierMath thing as mostly a critique of Epoch, not of OpenAI, tbc, and that's the sense in which it mattered -- Epoch made changes, afaik OpenAI did not. Epoch is at least an EA-adjacent project.

Sign seems pretty negative to me.

I agree that, if I had to guess, the sign seems negative for both of the things you say it is negative for, but I am uncertain about it, particularly because some people stand behind a version of the critique (e.g. Habryka for the Nonlinear one, Alexander Berger for the Wytham Abbey one, though certainly in the latter case it's a very different critique than what the original post said).

I think I stand by the claim that there aren't many criticisms that clearly mattered, but this was a positive update for me.

Fwiw, I think there are probably several other criticisms that I alone could find given some more time, let alone impactful criticisms that I never even read. I didn't even start looking for the genre of "critique of an individual part of a GiveWell cost-effectiveness analysis, which GiveWell then fixes"; I think there's been at least one, and maybe several, such public criticisms in the past.

I also remember there being a StrongMinds critique and a Happier Lives Institute critique that very plausibly caused changes? But I don't know the details and didn't follow it.

I'm not especially pro-criticism but this seems way overstated.

Almost all EA projects have low downside risk in absolute terms

I might agree with this on a technicality, in that depending on your bar or standard, I could imagine agreeing that almost all EA projects (at least for more speculative causes) have negligible impact in absolute terms.

But presumably you mean that almost all EA projects are such that their plausible good outcomes are way bigger in magnitude than their plausible bad outcomes, or something like that. This seems false, e.g.

  • FTX
  • Any kind of political action can backfire if a different political party gains power
  • AI safety research could be used as a form of safety washing
  • AI evaluations could primarily end up as a mechanism to speed up timelines (not saying that's necessarily bad, but certainly under some models it's very bad)
  • Movement building can kill the movement by making it too diffuse and regressing to the mean, and by creating opponents to the movement
  • Vegan advocacy could polarize people, such that factory farming lasts longer than it would by default (e.g. if cheap and tasty substitutes would otherwise have caused people to switch over)

There are almost no examples of criticism clearly mattering

I'd be happy to endorse something like "public criticism rarely causes an organization to choose to do something different in a major org-defining way" (but note that's primarily because people in a good position to change an organization through criticism will just do so privately, not because criticism is totally ineffective).

Of course, it's true that they could ignore serious criticism if they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.

As someone sympathetic to many of Habryka's positions, while also disagreeing with many of them, my immediate reaction to this was "well, that seems like a bad thing", cf.

shallow criticism often gets valorized

I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mechanism is more like "people are unable to ignore criticism irrespective of its quality" (the popularity of the criticism matters, but sadly that is only weakly correlated with quality).
