Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know of.) Do you think this is actually a reasonable state of affairs (i.e., it's right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It's been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to, and while I've gotten a few comments from philosophers saying they found the list helpful or that they'll think about working on some of the problems, I'm not aware of any concrete follow-ups.
If it is some kind of cultural or structural problem, it might be even higher leverage to work on solving that, instead of the object-level philosophical problems. I'd try to do this myself, but as an outsider to academic philosophy who is also very far from any organizations that might potentially hire philosophers to work on AI alignment, it's hard for me to even observe what the problem might be.
Interesting re: belief in hell being a key factor; I wasn't thinking about that.
It seems like the whole AI x-risk community has latched onto "align AI with human values/intent" as the solution, with few people thinking even a few steps ahead to "what if we succeeded"? I have a post related to this if you're interested.
possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned
I think there will be distributed information processing, but each distributed node/agent will be a copy of the central AGI (or will otherwise be aligned to it or share its values), because this is what's most economically efficient, minimizes waste from misaligned incentives, and so on. So there won't be the kind of value pluralism that we see today.
I assume we won't be able to know with high confidence in advance what economic model will be most efficient post-ASI.
There are probably a lot of other surprises that we can't foresee today. I'm mostly claiming that post-AGI economics and governance probably won't look very similar to today's.
Why do you think this work has less value than solving philosophical problems in AI safety?
From the perspective of comparative advantage and counterfactual impact, this work does not seem to require philosophical training. It seems to be straightforward empirical research that many people could do, not just the very few professionally trained, AI-risk-concerned philosophers that humanity has.
To put it another way, I'm not sure that Toby was wrong to work on this, but if he was, it's because, had he not done it, someone else with more comparative advantage for the problem (due to lacking training or talent for philosophy) would have done it shortly afterwards.
While I appreciate this work being done, it seems a very bad sign for our world/timeline that the very few people with both philosophy training and an interest in AI x-safety are using their time/talent to do forecasting (or other) work instead of solving philosophical problems in AI x-safety, with Daniel Kokotajlo being another prominent example.
This implies one of two things: Either they are miscalculating the best way to spend their time, which indicates bad reasoning or intuitions even among humanity's top philosophers (i.e., those who have at least realized the importance of AI x-risk and are trying to do something about it). Or they actually are the best people (in a comparative advantage sense) available to work on these other problems, in which case the world must be on fire, and they're having to delay working on extremely urgent problems that they were trained for, to put out even bigger fires.
(Cross-posted to LW and EAF.)
The ethical schools of thought I'm most aligned with—longtermism, sentientism, effective altruism, and utilitarianism—are far more prominent in the West (though still very niche).
I want to point out that the ethical schools of thought that you're (probably) most anti-aligned with (e.g., that certain behaviors and even thoughts are deserving of eternal divine punishment) are also far more prominent in the West, proportionately even more so than the ones you're aligned with.
Also, the Western model of governance may not last into the post-AGI era regardless of where the transition starts. Aside from the concentration risk mentioned in the linked post, driven by post-AGI economics, I think different sub-cultures in the West breaking off into AI-powered autarkies or space colonies with vast computing power, governed by their own rules, is also a very scary possibility.
I'm pretty torn and may actually slightly prefer a CCP-dominated AI future (despite my family's history with the CCP). But more importantly, I think both possibilities are incredibly risky if the AI transition occurs in the near future.
Whereas it seems like maybe you think it's convex, such that smaller pauses or slowdowns do very little?
I think my point in the opening comment does not logically depend on whether the risk-vs-time (in pause/slowdown) curve is convex or concave[1], but it may be a major difference in how we're thinking about the situation, so thanks for surfacing this. In particular, I see 3 large sources of convexity:
Like: putting in the schlep to RL AI and create scaffolds so that we can have AI making progress on these problems months earlier than we would have done otherwise
I think this kind of approach can backfire badly (especially given human overconfidence), because we currently don't know how to judge progress on these problems except by using human judgment, and it may be easier for AIs to game human judgment than to make real progress. (Researchers trying to use LLMs as RL judges apparently run into the analogous problem constantly.)
having governance set up such that the most important decision-makers are actually concerned about these issues and listening to the AI-results that are being produced
What if the leaders can't or shouldn't trust the AI results?
I'm trying to coordinate with, or avoid interfering with, people who are trying to implement an AI pause or create conditions conducive to a future pause. As mentioned in the grandparent comment, one way people like us could interfere with such efforts is by feeding into a human tendency to be overconfident about one's own ideas/solutions/approaches.
A couple more thoughts on this.
I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level. To be more specific:
I'm worried that creating (or redirecting) a movement to solve these problems, without noting at an early stage that they may not be solvable in the relevant time frame (absent a long AI pause), will feed into a human tendency to be overconfident about one's own ideas and solutions, and will create a group of people whose identities, livelihoods, and social status are tied up with having (what they think are) good solutions or approaches to these problems, ultimately making it harder to build consensus in the future about the desirability of pausing AI development.
Perhaps the most important question is whether you support a restriction on space colonization (banning it completely, or limiting it to a few nearby planets) during the Long Reflection. Unrestricted colonization seems good from a pure pro-natalist perspective, but bad from an optionalist perspective, as it makes it much more likely that if anti-natalism (or an adjacent position, such as that there should be strict care or controls over which lives can be brought into existence) is right, some of the colonies will fail to reach the correct conclusion and go on to colonize the universe in an unrestricted way, thus making humanity as a whole unable to implement the correct option.
If you do support such a restriction, then I think we agree on "the highest order bits" or the most important policy implication of optionalism, but probably still disagree on what is the best population size during the Long Reflection, which may be unresolvable due to our differing intuitions. I think I probably have more sympathy for anti-natalist intuitions than you do (in particular that most current lives may have negative value and people are mistaken about this), and worry more that creating negative-value lives and/or bringing lives into existence without adequate care could constitute a kind of irreversible or irreparable moral error. Unfortunately I do not see a good way to resolve such disagreements at our current stage of philosophical progress.
Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full time or even close to full time AFAIK, and the total effort is clearly far from adequate to the task at hand.
Any other names you can cite?
Thanks, this makes sense to me. My follow-up is: how concerning do you think this situation is?
One perspective I have is that at this point, several years into a potential AI takeoff, with AI companies now worth trillions in aggregate, alignment teams at these companies still have virtually no professional philosophical oversight (or outside consultants that they rely on), and are kind of winging it based on their own philosophical beliefs/knowledge. It seems rather like trying to build a particle collider or fusion reactor with no physicists on the staff, only engineers.
(Or worse: unlike engineers, who are expected to have systematic physics training, I doubt that a systematic education in fields like ethics and metaethics is a hard requirement for working as an alignment researcher. And even worse, unlike the situation in physics, we don't even have settled ethics/metaethics/metaphilosophy/etc. that alignment researchers could just learn and apply.)
Maybe the AI companies are reluctant to get professional philosophers involved, because in the fields that do have "professional philosophical oversight", e.g., bioethics, things haven't worked out that well. (E.g. human challenge trials being banned during COVID.) But to me, this would be a signal to yell loudly that our civilization is far from ready to attempt or undergo an AI transition, rather than a license to wing it based on one's own philosophical beliefs/knowledge.
As an outsider, the situation seems crazy alarming to me, and I'm confused that nobody else is talking about it, including philosophers like you, who are in the same overall space and looking at roughly the same things. I wonder if you have a perspective that makes the situation not quite as alarming as it appears to me.