Yarrow Bouchard 🔸

1345 karma · Canada · strangecosmos.substack.com

Bio

Pronouns: she/her or they/them. 

Parody of Stewart Brand’s Whole Earth button.

I got interested in effective altruism back before it was called effective altruism, back before Giving What We Can had a website. Later on, I got involved in my university EA group and helped run it for a few years. Now I’m trying to figure out where effective altruism can fit into my life these days and what it means to me.

Sequences (2)

Criticism of specific accounts of imminent AGI
Skepticism about near-term AGI

Comments (643)


"If you took this seriously, in 2011 you'd have had no basis to trust GiveWell (quite new to charity evaluation, not strongly connected to the field, no credentials) over Charity Navigator (10 years of existence, considered mainstream experts, CEO with 30 years of experience in charity sector)."

Well, no. Because I did hold that view very seriously (as I still do) in the late 2000s and early 2010s, and I came to trust GiveWell.

Charity Navigator doesn't even claim to evaluate cost-effectiveness; they don't do cost-effectiveness estimates.

Even prior to GiveWell, there were similar ideas kicking around. A clunky early term was 'philanthrocapitalism' (which is a mouthful and also ambiguous). It meant that charities should seek an ROI in terms of impact like businesses do in terms of profit.

Back in the day, I read the development economist William Easterly's blog Aid Watch (a project of NYU's Development Research Institute) and he called it something like the smart aid movement, or the smart giving movement. 

The old blog is still there in the Wayback Machine, but the Wayback Machine doesn't allow for keyword search, so it's hard to track down specific posts.

I had forgotten until I just went spelunking in the archive that William Easterly and Peter Singer had a debate in 2009 about global poverty, foreign aid, and charity effectiveness. The blog post summary says that even though it was a debate and they disagreed on things, they agreed on recommendations to donate to some specific charities.

My point here is that charity effectiveness had been a public conversation involving aid experts like Easterly going back a long time. You never would have taken away from this public conversation that you should pay attention to something like Charity Navigator rather than something like GiveWell.

In the late 2000s and early 2010s, what international development experts would have told you to look at Charity Navigator?

"This feels like a Motte ('skeptical of any claim that an individual or a group is competent at assessing research in any and all extant fields of study') and Bailey (almost complete deference with deference only decreasing with formal education or credentials). GiveWell obviously never claimed to be experts in much beyond GHW charity evaluation."

I might have done a poor job getting across what I'm trying to say. Let me try again.

What I mean is that, in order for a person or a group of people to avoid deferring to experts in a field, they would have to be competent at assessing research in that field. And maybe they are for one or a few fields, but not all fields. So, at some point, they have to defer to experts on some things — on many things, actually. 

What I said about this wasn't intended as a commentary on GiveWell — sorry for the confusion. I think GiveWell's approach was sensible. They realized that competently assessing the relevant research on global poverty/global health would be a full-time job, and they would need to learn a lot, and get a lot of input from experts — and still probably make some big mistakes. I think that's an admirable approach, and the right way to do it. 

I think this is quite different from spending a few weeks researching covid and trying to second-guess expert communities, rather than just trying to find out what the consensus views among expert communities are. If some people in EA had decided in, say, 2018 to start focusing full-time on epidemiology and public health, and then started weighing in on covid-19 in 2020 — while actively seeking input from experts — that would have been closer to the GiveWell approach. 

This sounds like outcome bias to me, i.e., believing in retrospect that a decision was the right one because it happened to turn out well. For example, if you decide to drive home drunk and don't crash your car, outcome bias could lead you to believe that driving drunk was the right decision.

There may also be some hindsight bias going on, where in retrospect it's easier to claim that something was absolutely, obviously the right call, when, in fact, at the time, based on the available evidence, the optimally rational response might have been to feel a significant amount of uncertainty. 

I don't know if you're right that someone taking their advice from Gregory Lewis in this interview would have put themselves more at risk. Lewis said it was highly uncertain (as of mid-April 2020) whether medical masks were a good idea for the general population, but to the extent he had an opinion on it at the time, he was more in favour than against. He said it was highly uncertain what the effect of cloth masks would ultimately turn out to be, and highlighted the ambiguity of the research at the time. There was a randomized controlled trial that found cloth masks did worse than the control, but many people in the control group were most likely wearing medical masks. So, it's unclear.

The point he was making with the cloth masks example, as I took it, was simply that although he didn't know how the research was ultimately going to turn out, people in EA were missing stuff that experts knew about and that was stated in the research literature. So, rather than engaging with what the research literature said and deciding based on that information, people in EA were drawing conclusions from less complete information.

I don't know what Lewis' actual practical recommendations were at the time, or if he gave any publicly. It would be perfectly consistent to say, for example, that you should wear medical masks as a precaution if you have to be around people and to say that the evidence isn't clear yet (as of April 2020) that medical masks are helpful in the general population. As Lewis noted in the interview, the problem isn't with the masks themselves, it's that medical professionals know how to use them properly and the general population doesn't. So, how does that cash out into advice?

You could decide: ah, why bother? I don't even know if masks do anything or not. Or you could think: oh, I guess I should really make sure I'm wearing my mask right. What am I supposed to do...? 

Similarly, with cloth masks, when it became better-known how much worse cloth masks were than medical masks, everyone stopped wearing cloth masks and started wearing KN95 or N95 masks, or similar. If someone's takeaway from that April 2020 interview with Lewis was that the efficacy of cloth masks was unclear and there's a possibility they might even turn out to be net harmful, but that medical masks work really well if you use them properly, again, they could decide at least two different things. They could think: oh, why bother wearing any mask if my cloth mask might even do more harm than good? Or they could think: wow, medical masks are so much better than cloth masks, I really should be wearing those instead of cloth masks.

Conversely, people in EA promoting cloth masks might have done more harm than good, depending, first, on whether anyone listened to the advice at all and, second, on whether the people who did listen chose cloth masks over no mask or wore cloth masks instead of medical masks.

Personally, my hunch is that if very early on in the pandemic (like March or April 2020), there had been less promotion of cloth masks, and if more people had been told the evidence looked much better for medical masks than for cloth masks (slight positive overall story for medical masks, ambiguous evidence for cloth masks, with a possibility of them even making things worse), then people would have really wanted to switch from cloth masks to medical masks — because this is what happened later on when the evidence came in and it was much clearer that medical masks were far superior. 

The big question mark hanging over all of this is that I don't know (or don't remember) at what point the supply chain was able to provide enough medical masks for everybody.

I looked into it and found a New York Times article from April 3, 2020 that discusses a company in Chicago that had a large supply of KN95 masks, although this is still in the context of providing masks to hospitals:

One Chicago-based company, iPromo, says it has been in the KN95 importing business for a month. It had previously developed relationships with Chinese suppliers for its main business, churning out custom logo-adorned promotional knickknacks like mugs, water bottles, USB flash drives and small containers of hand sanitizer.

The company’s website advertises KN95 masks at $2.96 apiece for hospitals, with delivery in five to seven days, although its minimum order is 1,000 masks.

More masks are available because coronavirus transmission in China has been reduced. “They have so much stock,” said Leo Friedman, the company’s chief executive, during an interview Thursday. “They ramped up and now it’s a perfect storm of inventory.”

I found another source that says, "Millions of KN95 masks were imported between April 3 and May 7 and many are still in circulation." But this is also still in the context of hospitals, not the general public. That's in the United States.

I found a Vice article from July 13, 2020 that implies by that point it was easy for anyone to buy KN95 masks. Similarly, starting on or around June 30, 2020, there were vending machines selling KN95 masks in New York City subway stations. Although apparently KN95 masks were for sale in vending machines in New York as early as May 29.

In any case:

  1. It seems quite plausible that pushing people toward medical masks rather than cloth masks sooner would have been beneficial.
  2. Even if people who don't understand the science and don't know how to properly read the research occasionally stumble into the correct conclusion through sheer luck, that doesn't imply this is the best strategy over a long timespan, over a large sample size of examples.

"but the actual challenges were usually closer to a reflexive dismissal"

I don't know the specific, actual criticisms of GiveWell you're referring to, so I can't comment on them — how fair or reasonable they were.

My point is more abstract: just that, in general, it is fair to challenge non-experts who are trying to do serious work in an area outside their expertise. It is a challenge that anyone in the position of the GiveWell founders should gladly and willingly accept, or else they're not up to the job.

Reputation, trust, and credibility in an area where you are a neophyte are not owed to you automatically. They're something you earn by providing evidence that you are trustworthy, credible, and deserve a good reputation.

"We can often just look at object-level work, study research & responses to the research, and make up our mind. Credentials are often useful to navigate this, but not always necessary."

This is hazy and general, so I don't know what you specifically mean by it. But there are all kinds of reasons that non-experts are, in general, not competent to assess the research on a topic. For example, they might be unacquainted with the nuances of statistics, experimental designs, and theories of underlying mechanisms involved in studies on a certain topic. Errors or caveats that an expert would catch might be missed by an amateur. And so on. 

I am extremely skeptical of any claim that an individual or a group is competent at assessing research in any and all extant fields of study, since this would seem to imply that that individual or group possesses preternatural abilities that just aren't realistic given what we know about human limitations. I think the sort of Tony Stark or Sherlock Holmes general-purpose geniuses of fiction are only fictional. But even if they existed, we would know who they are, and they would have a litany of objectively impressive accomplishments.

"Are you sure that no-one with any credibility thinks UFOs may be extraterrestrial spacecraft?"

Yes.

To clarify, are you saying that, in retrospect, the process through which people in EA did research on epidemiology, public health, and related topics looks any better to you now than it looked to you back in April 2020 when you did this interview?

I think I understand your point that it would probably be nearly impossible to score the conclusions in a way that people in EA would agree is convincing or fair — there's tons of ambiguity and uncertainty, hence tons of wiggle room. (I hope I'm understanding that right.)

But in the April 2020 interview, you said that many of these conclusions were akin to calling a coin flip. Crudely, many interventions that experts were still debating could be seen as roughly having a 50-50 chance of being good or bad (or maybe it's anywhere from 70-30 to 30-70, doesn't really matter), so any conclusion that an intervention is good or bad has a roughly 50-50 chance of being right. You said a stopped clock is right twice a day, and it may turn out that Donald Trump got some things right about the pandemic, but if so, it will be through dumb luck rather than good science. 

So, I'm curious: leaving aside the complicated and messy question of scoring the conclusions, do you now think the EA community's approach to the science — particularly, the extent to which they wanted to do it themselves, as non-experts, rather than just trying to find the expert consensus on any given topic, or even seeing if any expert would talk to them about it (e.g. in 2020, you suggested some names of experts to have on the 80,000 Hours Podcast) — was any less bad than you saw it in 2020?

Fair enough, but it seems more like a cool, fun coding project in the realm of science communication, rather than a prediction or some sort of original scientific research or analysis that generated new insights. 

The infectious disease doctor interviewed for the Smithsonian Magazine article about microCovid said that microCovid is a user-friendly, clearly explained version of tools that already existed within the medical profession. So, that’s great, that’s useful, but it’s not a prediction or an original insight. It’s just good science communication and good coding. 

The article also mentions two other similar risk calculators designed for use by the public. One of the calculators mentioned, Mathematica’s 19 and Me calculator, was released on or around May 11, 2020, more than 3 months before microCovid. I was able to find a few other risk calculators that were released no later than mid-May 2020. So, microCovid wasn’t even a wholly original idea, although it may have been differentiated from those previous efforts in some important ways.
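For readers who haven’t used one of these tools, here is a minimal sketch of the general shape of an activity-based risk calculator like the ones discussed above. Every parameter name and number in it is invented purely for illustration; it is not microCovid’s actual model or its actual values.

```python
# Hypothetical sketch of an activity-based covid risk calculator, in the
# general style of tools like microCovid. All names and numbers below are
# made up for illustration; they are NOT microCovid's actual model or values.

BASELINE_RISK_PER_HOUR = 100e-6  # made-up risk for one unmasked hour indoors
                                 # near one infectious person

MASK_FACTOR = {"none": 1.0, "cloth": 0.5, "surgical": 0.3, "n95": 0.1}
SETTING_FACTOR = {"indoors": 1.0, "outdoors": 0.05}

def activity_risk(hours: float, people: int, prevalence: float,
                  mask: str, setting: str) -> float:
    """Rough probability of infection from one activity.

    prevalence: estimated fraction of the local population currently infectious.
    """
    per_person = (BASELINE_RISK_PER_HOUR * hours
                  * MASK_FACTOR[mask] * SETTING_FACTOR[setting])
    # Expected risk scales with how many nearby people are likely infectious.
    return per_person * people * prevalence

# Example: two hours in a grocery store with ~20 people nearby, 1% local
# prevalence, wearing an N95.
risk = activity_risk(hours=2, people=20, prevalence=0.01,
                     mask="n95", setting="indoors")
print(f"~{risk * 1e6:.0f} microCOVIDs (1 microCOVID = a one-in-a-million chance)")
```

The value such tools add is exactly what the infectious disease doctor said: packaging standard multiply-the-risk-factors arithmetic into a clear interface, not generating new science.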

When people say that LessWrong called covid early or was right about covid, what they mean is that LessWrong made correct predictions or had correct opinions about the pandemic (not by luck or chance, but by superior rationality) that other people didn’t make or didn’t have. And they say this in the context of providing reasons why the LessWrong community’s views or predictions on other topics should be trusted or taken seriously. 

microCovid, as nice a thing as it may be, does not support either of those ideas.

I think when you look at the LessWrong community’s track record on covid-19, there is just no evidence to support this flattering story that the community tells about itself. 

Okay. I actually watched the TikTok. That shoulda been step 1 — I committed the cardinal sin of commenting without watching. (My previous comment was more responding to the screenshotted comments, based on my past experience with leftist discourse on TikTok and Twitter.)

The TikTok is 100% correct. The creator’s points and arguments are absolutely correct. Every factual claim she makes is correct. The video is extremely reasonable, fair-minded, and even-handed. The creator is eloquent, perceptive, and clearly very intelligent. She comes across as earnest, sincere, kind, open-minded, and well-meaning. I really liked her brief discussion of Strangers Drowning. Just from this brief video, I already feel some fondness toward her. Based on this first impression, I like her.

If I still had a TikTok account, I would give the video a like.

Her exegesis of Peter Singer’s parable of the drowning child is really, really good — quick, breezy, and straight to the point, in a way that should be the envy of any explainer. The only part that was a question mark for me was her use of the term "extreme utilitarians". It’s not exactly inaccurate, though, and it does get the point across, so, now that I’m thinking about it, I guess it’s actually fine. Come to think of it, if I were trying to explain this idea casually to a friend or an acquaintance or a general audience, I might use a similar phrase like "hardcore utilitarians" or something. 

It isn’t a technical term, but she is referring to the extreme personal sacrifices some people will make for their moral views, or to people who take moral views to more of an extreme than the typical person will (probably even more than the typical utilitarian or the typical moral philosopher).

Her suspicion of the emotional motivations of people in EA who have pivoted from what tends to be more boring, humble, and sometimes gruelling work in global poverty to high-paying, sexy, glamorous, luxurious, fun, exciting work in AI safety is incredibly perceptive and just a really great point. I have said (and others have said) similar things in the past, and even so, the way she said it was so clear and perceptive that I feel I now better understand the point I was trying to make because she said it (and thought it) better. So, kudos to her on that.

I would say your instinct should not be to treat this as a PR or marketing or media problem, or to want to leap into the fray to provide a "counternarrative". I would say this is actually just perceptive, substantive, eloquently expressed criticism or skepticism. I think the appropriate response is to take it as a substantive argument or point.

There are many things people in EA could do if they wanted to do more to establish the credibility of AI safety for a wider audience or for mainstream society. Doing vastly more academic publishing on the topic is one idea. People are right not to take seriously ideas only written on blogs, forums, Twitter, or in books that don’t go through any more rigour or academic review than the previous three mediums. Science and academia provide a blueprint for how to establish mainstream credibility of obscure technical ideas.

I’m sure there are other good ideas out there too. For example, why not get more curious about why AI safety critics, skeptics, and dissenters disagree? Why not figure out their arguments, engage deeply, and respond to them? This could be in informal mediums and not through academic publishing. I think it would be a meaningful step toward persuasion. It’s kind of embarrassing for AI safety that it’s fairly easy for critics and skeptics to lob up plausible-sounding objections to the AI safety thesis/worldview and there isn’t really a convincing (to me, and to many others) response. Why not do the intellectual work, first, and focus on the PR/marketing later?

Something that would go a long way for me, personally, toward establishing at least a bit more good faith and credibility would be if AI safety advocates were willing to burn bad arguments that don’t make sense. For instance, if an AI safety advocate were willing to concede the fundamental, glaring flaws in AI 2027 or Situational Awareness, I would personally be willing to listen to them more carefully and take them more seriously. On the other hand, if someone can’t acknowledge that this is an atrocious, ridiculous graph, then I sort of feel like I can safely ignore what they say, since overall they haven’t demonstrated to me a level of seriousness, credibility, or reasonableness that I would feel is needed if it’s going to be worthwhile for me to engage with their ideas.

Right now, whatever the best arguments in AI safety are, it feels like they’re all lumped in with the worst arguments, and it’s hard for me not to judge it all based on the worst arguments. I imagine this will be a recurring problem if AI safety tries to gain more mainstream, widespread acceptance. If like 10% of people in EA were constantly talking about how great homeopathy is, how it’s curing all their ailments, and how foolish the medical and scientific establishment is for saying it’s just a placebo, would you be as willing to take EA arguments about pandemic risk seriously? Or would you just figure that this community doesn’t know what it’s talking about? That’s the situation for me with AI safety, and I’m sure others feel the same way, or would if they encountered AI safety ideas from an initial position of reasonable skepticism.

Those are just my first 2-3 ideas. Other people could probably brainstorm others. Overall, I think the intellectual work is lacking. More marketing/PR work would either fail or deserve to fail (even if it succeeded), in my view, because the intellectual foundation isn’t there yet.

Is this a response to what Gregory Lewis said? I don’t think I understand. 

Maybe this is subtle/complicated… Are the examples you’re citing the actual consensus views of experts? Or are they examples of governments and institutions like the World Health Organization (WHO) misunderstanding what the expert consensus is and/or misrepresenting the expert consensus to the public?

This excerpt from the Nature article you cited makes it sound like the latter:

The change brings the WHO’s messaging in line with what a chorus of aerosol and public-health experts have been trying to get it to say since the earliest days of the outbreak. Many decry the agency’s slowness in stating — unambiguously — that SARS-CoV-2 is airborne. Interviews conducted by Nature with dozens of specialists on disease transmission suggest that the WHO’s reluctance to accept and communicate evidence for airborne transmission was based on a series of problematic assumptions about how respiratory viruses spread.

Did random members of the EA community — as bright and eager as they might be — with no prior education, training, or experience with relevant fields like public health, epidemiology, virology, and medicine outsmart the majority of relevant experts on this question (airborne vs. not) or any others, not through sheer luck or chance, but by actually doing better research? This is a big claim, and if it is to be believed, it needs strong evidentiary support. 

In this Nature article, there is an allegation that the WHO wasn’t sufficiently epistemically modest or deferential to the appropriate class of experts:

Other criticisms are that the WHO relies on a narrow band of experts, many of whom haven’t studied airborne transmission, and that it eschews a precautionary approach that could have protected countless people in the early stages of the pandemic.

And there is an allegation around communications, rather than around the science itself:

Having shifted its position incrementally over the past two years, the WHO also failed to adequately communicate its changing position, they say. As a result, it didn’t emphasize early enough and clearly enough the importance of ventilation and indoor masking, key measures that can prevent airborne spread of the virus.

So, let’s say there’s another pandemic. Which is the better strategy? 

Strategy A: Read forum posts and blog posts by people in the EA community doing original research and opining on epidemiology, virology, and public health who have never so much as cracked open a relevant textbook.

Strategy B: Survey sources of expert opinion, including publications like Nature, open letters written on behalf of expert communities, statements by academic and scientific organizations and so on, to determine if a particular institution like the WHO is accurately communicating the majority view of experts, or if they’re a weird outlier adhering to a minority view, or just communicating the science badly. 

I would say the Nature article is support for Strategy B and not at all support for Strategy A. 

You could even interpret it as evidence against Strategy A. If you believe the criticism in Nature is right, even experts in an adjacent field or subfield, who have prestigious credentials like advising the WHO, can get things catastrophically wrong by being insufficiently epistemically modest and not deferring enough to the experts who know the most about a subject, and who have done the most research on it. If this is true, then that should make you even more skeptical about how reliable the research and recommendations will be from a non-expert blogger with no relevant education who only started learning about viruses and pandemics for the first time a few weeks ago.

What is described in the Nature piece sounds like incredibly subtle mistakes (if we can even, at this point, call them mistakes for sure). Lewis’ critique of the EA community is that it was making incredibly obvious, elementary mistakes. So, why think the EA community can outperform experts on avoiding subtle mistakes if it can’t even avoid the obvious mistakes?

One of the recent examples of hubris I saw on the EA Forum was someone asserting that they (or the EA community at large) could resolve, within the next few months/years, the fundamental uncertainty around the philosophical assumptions that go into cost-effectiveness estimates comparing shrimp welfare and human welfare. Out of the topics I know well, the philosophy of consciousness might be #1, or at least near the top. I don’t know how to convey the level of hubris that comment betrays. 

It would be akin to saying that a few people in EA, with no training in physics, could figure out, in a few months or years, the correct theory of quantum gravity and reconcile general relativity and quantum mechanics. Or that a few people in EA, with no training in biology or medicine, would be able to cure cancer within a few months or years. Or that, with no background in finance, business, or economics, they’d be able to launch an investment fund that consistently, sustainably achieves more alpha than the world’s top-performing investment funds every year. Or that, with no background in engineering or science, they’d be able to beat NASA and SpaceX in sending humans to Mars.

In other words, this is a level of hubris that’s unfathomable, and just not a serious or credible way to look at the world or your own abilities, in the absence of strong, clear evidence that you possess even average abilities relative to the relevant expert class. 

I don’t know the first thing about epidemiology, virology, public health, or medicine. So, I can’t independently evaluate how appropriate or correct it is/was for Gregory Lewis to be so aggravated by the EA community’s initial response to covid-19 that he considered distancing himself from the movement. I can believe that Lewis might be correct because a) he has the credentials, b) the way he’s describing it is how it essentially always turns out when non-experts think they can outsmart experts in a scientific or medical field without first becoming experts themselves, and c) in areas where I do know enough to independently evaluate the plausibility of assertions made by people in the EA community on the object level, I feel as infuriated and incredulous as Lewis described feeling in that 80,000 Hours interview. 

I see these sort of wildly overconfident claims about being able to breezily outsmart experts on difficult scientific, philosophical, or technical problems as moderately, but not dramatically, more credible than the people talking about UFOs or ESP or whatever. (Which apparently is not that rare.)

I see a general rhetorical or discursive strategy employed across many people with fringe views, be they around pseudoscience, fake medicine, or conspiracy theories. First, identify some scandal or blunder or internecine conflict within some scientific expert community. Second, say, “Aha! They’re not so smart, after all!” Third, use this as support for whatever half-cocked pet theory you came up with. This is obviously a logically invalid argument, as in, obviously the conclusion does not logically follow from the premises. The standard for scientific experts should not be perfection; the standard for amateurs, dilettantes, and non-expert iconoclasts should be showing they can objectively do better than the average expert — not on a single coin flip, but on an objective, unbiased measure of overall performance.
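To make the last point concrete, here is a small simulation (with made-up numbers, purely illustrative) of how a proper scoring rule like the Brier score separates a calibrated forecaster from a confident coin-flipper over a large sample of questions, even though the coin-flipper looks right on individual calls about half the time:

```python
# Simulation of the "overall performance, not a single coin flip" point.
# All numbers are made up for illustration.
import random

random.seed(0)
N = 1000  # number of yes/no questions

def brier(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Suppose 70% of questions resolve "yes".
outcomes = [1 if random.random() < 0.7 else 0 for _ in range(N)]

informed = [0.7] * N  # a forecaster who knows the true base rate
coin_flipper = [random.choice([0.0, 1.0]) for _ in range(N)]  # confident random calls

print(f"informed forecaster: {brier(informed, outcomes):.3f}")    # ~0.21
print(f"coin-flipper:        {brier(coin_flipper, outcomes):.3f}")  # ~0.50
# On any single question the coin-flipper may look right, but over a large
# sample the aggregate score reliably exposes the difference.
```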

There is a long history in the LessWrong community of opposition to institutional science, with the typical amount of intellectual failure that usually comes with opposition to institutional science. There is a long history of hyperconfidently scorning expert consensus and being dead wrong. Obviously, there is significant overlap between the LessWrong community and the EA community, and significant influence by the former on the latter. What I fear is that this anti-scientific attitude and undisciplined iconoclasm has become a mainstream, everyday part of the EA community, in a way that was not true, or at least not nearly as true, in my experience, in the early-to-mid-2010s.

The obvious rejoinder is: if you really can objectively outperform experts in any field you care to try your hand at for a few weeks, go make billions of dollars right now, or do any other sort of objectively impressive thing that would provide evidence for the idea that you have the abilities you think you do. Surely, within a few months or years of effort, you would have something to show for it. LessWrong has been around for a long time, EA has been around for a long time. There’s been plenty of time. What’s the excuse for why people haven’t done this yet?

And, based on base rates, what would you say is more likely: people being misunderstood iconoclastic self-taught geniuses who are on the cusp of greatness or people just being overly confident based on a lack of experience and a lack of understanding of the problem space? 

I don’t think any of these are personality traits. These are ideas or strategies that people can discuss and decide whether they’re wise or unwise. You could, conceivably, have a discussion about one or more of these, become convinced that the way you’ve been doing things is unwise, and then change your behaviour subsequently. I wouldn’t call that "changing your personality". I don't see why these would be stable traits, as opposed to things that people can change by thinking about it and deciding to act differently. 

I think there might be serious problems with the ideas or strategies that you described, if those were the ideas or strategies at play in EA. But my feeling is that you gave a somewhat watered-down, euphemistic retelling of the ideas and strategies, compared to what I tend to see people in EA actually act on, or what they tend to say they believe.

For instance, on covid-19, it seems like some people in EA still think (as evidenced by the comments on this post) that they actually repeatedly outsmarted the expert/scientific communities on the relevant public health questions — not just by chance or luck, in a "broken clock is right twice a day" or "calling a coin flip" way, but by general superior rationality/epistemology — rather than following a much more epistemically modest, cautious rationale of we "might be able to contribute something meaningful, and doing so is both low-effort and very good if it works, so worth the shot". 

I don't buy that thinking this way is a stable personality trait that is beyond your power to change as opposed to something that you can be talked out of. 

It seems weird to call any of these things personality traits. Is being an act consequentialist as opposed to a rule consequentialist a personality trait? Obviously not, right? It seems equally obvious to me that what we're talking about here are not personality traits. It seems equally weird to me to call them personality traits as it would be to call subscribing to rule consequentialism a personality trait.
