Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know of.) Do you think this is actually a reasonable state of affairs (i.e., it's right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It's been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to, and while I've gotten a few comments from philosophers saying they found the list helpful or that they'll think about working on some of the problems, I'm not aware of any concrete follow-ups.
If it is some kind of cultural or structural problem, it might be even higher leverage to work on solving that, instead of the object-level philosophical problems. I'd try to do this myself, but as an outsider to academic philosophy who is also very far from any organizations that might potentially hire philosophers to work on AI alignment, it's hard for me to even observe what the problem might be.
Interesting re: belief in hell being a key factor; I wasn't thinking about that.
It seems like the whole AI x-risk community has latched onto "align AI with human values/intent" as the solution, with few people thinking even a few steps ahead to "what if we succeeded"? I have a post related to this if you're interested.
possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned
I think there will be distributed information processing, but each distributed node/agent will be a copy of the central AGI (or will otherwise be aligned to it or share its values), because this is what's most economically efficient, minimizes waste from misaligned incentives, and so on. So there won't be the kind of value pluralism that we see today.
I assume we won't be able to know with high confidence in advance what economic model will be most efficient post-ASI.
There are probably a lot of other surprises that we can't foresee today. I'm mostly claiming that post-AGI economics and governance probably won't look very similar to today's.
Why do you think this work has less value than solving philosophical problems in AI safety?
From the perspective of comparative advantage and counterfactual impact, this work does not seem to require philosophical training. It seems to be straightforward empirical research that many people could do, not just the very few professionally trained, AI-risk-concerned philosophers that humanity has.
To put it another way, I'm not sure that Toby was wrong to work on this, but if he was, it's because, had he not done it, someone else with more comparative advantage for the problem (due to lacking training or talent for philosophy) would have done it shortly afterwards.
While I appreciate this work being done, it seems a very bad sign for our world/timeline that the very few people with both philosophy training and an interest in AI x-safety are using their time/talent to do forecasting (or other) work instead of solving philosophical problems in AI x-safety, with Daniel Kokotajlo being another prominent example.
This implies one of two things: Either they are miscalculating the best way to spend their time, which indicates bad reasoning or intuitions even among humanity's top philosophers (i.e., those who have at least realized the importance of AI x-risk and are trying to do something about it). Or they actually are the best people (in a comparative advantage sense) available to work on these other problems, in which case the world must be on fire, and they're having to delay working on extremely urgent problems that they were trained for, to put out even bigger fires.
(Cross-posted to LW and EAF.)
The ethical schools of thought I'm most aligned with—longtermism, sentientism, effective altruism, and utilitarianism—are far more prominent in the West (though still very niche).
I want to point out that the ethical schools of thought that you're (probably) most anti-aligned with (e.g., that certain behaviors and even thoughts are deserving of eternal divine punishment) are also far more prominent in the West, proportionately even more so than the ones you're aligned with.
Also, the Western model of governance may not last into the post-AGI era regardless of where the transition starts. Aside from the concentration risk mentioned in the linked post, driven by post-AGI economics, I think different sub-cultures in the West breaking off into AI-powered autarkies or space colonies with vast computing power, governed by their own rules, is also a very scary possibility.
I'm pretty torn and may actually slightly prefer a CCP-dominated AI future (despite my family's history with the CCP). But more importantly, I think both possibilities are incredibly risky if the AI transition occurs in the near future.
Whereas it seems like maybe you think it's convex, such that smaller pauses or slowdowns do very little?
I think my point in the opening comment does not logically depend on whether the risk-vs-time (in pause/slowdown) curve is convex or concave[1], but it may be a major difference in how we're thinking about the situation, so thanks for surfacing this. In particular, I see 3 large sources of convexity:
Like: putting in the schlep to RL AI and create scaffolds so that we can have AI making progress on these problems months earlier than we would have done otherwise
I think this kind of approach can backfire badly (especially given human overconfidence), because we currently don't know how to judge progress on these problems except by using human judgment, and it may be easier for AIs to game human judgment than to make real progress. (Researchers trying to use LLMs as RL judges apparently run into the analogous problem constantly.)
having governance set up such that the most important decision-makers are actually concerned about these issues and listening to the AI-results that are being produced
What if the leaders can't or shouldn't trust the AI results?
I'm trying to coordinate with, or avoid interfering with, people who are trying to implement an AI pause or create conditions conducive to a future pause. As mentioned in the grandparent comment, one way people like us could interfere with such efforts is by feeding into a human tendency to be overconfident about one's own ideas/solutions/approaches.
A couple more thoughts on this.
I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level. To be more specific:
I'm worried that creating (or redirecting) a movement to solve these problems, without noting at an early stage that they may not be solvable in the relevant time frame (absent a long AI pause), will feed into a human tendency to be overconfident about one's own ideas and solutions, and will create a group of people whose identities, livelihoods, and social status are tied up with having (what they think are) good solutions or approaches to these problems, ultimately making it harder to build consensus in the future about the desirability of pausing AI development.
Perhaps the most important question is whether you support a restriction on space colonization (banning it completely, or limiting it to a few nearby planets) during the Long Reflection. Unrestricted colonization seems good from a pure pro-natalist perspective, but bad from an optionalist perspective, as it makes it much more likely that if anti-natalism (or an adjacent position, such as that there should be strict care or controls over which lives can be brought into existence) is right, some of the colonies will fail to reach the correct conclusion and go on to colonize the universe in an unrestricted way, thus making humanity as a whole unable to implement the correct option.
If you do support such a restriction, then I think we agree on "the highest order bits" or the most important policy implication of optionalism, but probably still disagree on what is the best population size during the Long Reflection, which may be unresolvable due to our differing intuitions. I think I probably have more sympathy for anti-natalist intuitions than you do (in particular that most current lives may have negative value and people are mistaken about this), and worry more that creating negative-value lives and/or bringing lives into existence without adequate care could constitute a kind of irreversible or irreparable moral error. Unfortunately I do not see a good way to resolve such disagreements at our current stage of philosophical progress.
Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full time or even close to full time AFAIK, and the total effort is clearly far from adequate to the task at hand.
Any other names you can cite?
Thanks, this makes sense to me. My follow-up is: how concerning do you think this situation is?
One perspective I have is that at this point, several years into a potential AI takeoff, with AI companies now worth trillions in aggregate, alignment teams at these companies still have virtually no professional philosophical oversight (or outside consultants that they rely on), and are kind of winging it based on their own philosophical beliefs/knowledge. It seems rather like trying to build a particle collider or fusion reactor with no physicists on the staff, only engineers.
(Or worse: unlike engineers, who are expected to have systematic physics training, I doubt that a systematic education in fields like ethics and metaethics is a hard requirement for working as an alignment researcher. And even worse, unlike the situation in physics, we don't even have settled ethics/metaethics/metaphilosophy/etc. that alignment researchers could just learn and apply.)
Maybe the AI companies are reluctant to get professional philosophers involved, because in the fields that do have "professional philosophical oversight", e.g., bioethics, things haven't worked out that well. (E.g. human challenge trials being banned during COVID.) But to me, this would be a signal to yell loudly that our civilization is far from ready to attempt or undergo an AI transition, rather than a license to wing it based on one's own philosophical beliefs/knowledge.
As an outsider, the situation seems crazy alarming to me, and I'm confused that nobody else is talking about it, including philosophers like you, who are in the same overall space and looking at roughly the same things. I wonder if you have a perspective that makes the situation not quite as alarming as it appears to me.