Bio

There are critical gaps in the accessibility and affordability of mental health services worldwide: In some countries you wait years for therapy; in others, health insurance covers at most one session per month; in still others, therapy for particular conditions such as cluster B personality disorders is virtually nonexistent.

We want to leverage LLMs to fill these gaps and complement regular therapy. Our product is in development. We've based it on Gemini and want it to interface with widely used messaging apps, so users can interact with it like they would with a friend or coach.

I’ve previously founded or worked for several charities and spent a few years earning to give to support work on invertebrate welfare and s-risks from AI.

You can get up to speed on my thinking at Impartial Priorities.

Sequences
2

Impact Markets
Researchers Answering Questions

Comments
588

My current practical ethics

The question often comes up of how we should make decisions under epistemic uncertainty and normative diversity of opinion. Since I need to make such decisions every day, I had to develop a personal system, however inchoate, to assist me.

A concrete (or granite) pyramid

My personal system can be thought of as a pyramid.

  1. At the top sits some sort of measurement of success. It's highly abstract and impractical. Let's call it the axiology. This is really a collection of all axiologies I relate to, including the amount of frustrated preferences and suffering across our world history. This also deals with hairy questions such as how to weigh Everett branches morally and infinite ethics.
  2. Below that sits a kind of mission statement. Let's call it the ethical theory. It's just as abstract, but it is opinionated about the direction in which to push our world history. For example, mine favors a reduction in suffering, but for others this floor needn't be consequentialist in flavor.
  3. Both of these abstract floors of the pyramid are held up by a mess of principles and heuristics at the ground-floor level that guide the actual implementation.

The ground floor

The ground floor of principles and heuristics is really the most interesting part for anyone who has to act in the world, so I won't further explain the top two floors. 

The principles and heuristics should be expected to be messy. That is, I think, because they are by necessity the result of an intersubjective process of negotiation and moral trade (positive-sum compromise) with all the other agents and their preferences. (This should probably include acausal moral trades like Evidential Cooperation in Large Worlds.)

It should also be expected to be messy because these principles and heuristics have to satisfy all sorts of awkward criteria:

  1. They have to inspire cooperation or at least not generate overwhelming opposition.
  2. They have to be easily communicable so people at least don't misunderstand what you're trying to achieve and call the police on you. Ideally so people will understand your goal well enough that they want to join you.
  3. They have to be rapidly actionable, sometimes for split-second decisions.
  4. They have to be viable under imperfect information.
  5. They have to be psychologically sustainable for a lifetime.
  6. They have to avoid violating laws.
  7. And many more.

Three types of freedom

But that still leaves us a lot of freedom (for better or worse):

  1. There are countless things that we can do that are highly impactful and hardly violate anyone's preferences or expectations.
  2. There are also plenty of things that don't violate any preferences or expectations once we get to explain them.
  3. Finally, there are many opportunities for positive-sum moral trade.

These suggest a particular stance toward other activists:

  1. If someone is trying to achieve the same thing you're trying to achieve, maybe you can collaborate.
  2. If someone is trying to achieve something other than what you're trying to achieve, but you think their goals are valuable, don't stand in their way. It may sometimes feel like doing nothing (to further or hinder their cause) is a form of “not standing in their way.” But if your peers are actually collaborating with them to some extent, doing nothing (or collaborating less) can cause others to also reduce their collaboration and can prevent key threshold effects from taking hold. So the true neutral position is to figure out how much you need to collaborate toward the valuable goal such that it would not have been achieved sooner without you. This is usually very cheap to do and has a chance of getting runaway threshold effects rolling.
  3. If someone is trying to achieve something that you consider neutral, the above may still apply to some extent because perhaps you can still be friends. And for reasons of Evidential Cooperation in Large Worlds. (Maybe you'll find that their (to you) neutral thing is easy to achieve here and that other agents like them will collaborate back elsewhere where your goal is easy to achieve.)
  4. Finally, if someone is trying to achieve something that you disapprove of… Well, that's not my metier, temperamentally, but this is where compromise can generate gains from moral trade.

Very few examples

In my experience, principles and heuristics are best identified by chatting with friends and generalizing from their various intuitions.

  1. Charitable donations are total anarchy. Mostly, you can just donate wherever the fluff you want, and (unless you're Open Phil) no one will throw stones through your windows in retaliation. You can just optimize directly for your goals – except that Evidential Cooperation in Large Worlds will still make strong recommendations here; what they are is still a bit underexplored.
  2. Even if you're not an animal welfare activist yourself, you're still well-advised to cooperate with behavior change to avert animal suffering to the extent expected by your peers. (And certainly to avoid inventing phony reasons to excuse your violation of these expectations. These might be even more detrimental to moral progress and the rationality waterline.)
  3. If you want to spend time with someone but they behave outrageously unempathetically toward you or someone else (e.g., say something like “Your suffering is nothing compared to the suffering of X” to their face), you should rather cut all ties with them even though, strictly speaking, this does not imply that no positive-sum trade is possible with them.
  4. Trying to systematically put people in powerful positions can arouse suspicion and actually make it harder to put people in powerful positions. Trying to systematically put people into the sorts of positions they find fulfilling might put as many people in powerful positions and make their lives easier too. (Or training highly conscientious people in how to dare to accept responsibility so it's not just those who don't care who self-select into powerful positions.)
  5. And hundreds more…

Various non-consequentialist ethical theories can come in handy here to generate further useful principles and heuristics. That is probably because they are attempts at generalizing from the intuitions of certain authors, which puts them almost on par (to the extent to which these authors are relatable to you) with generalizations from the intuitions of your friends.

(If you find my writing style hard to read, you can ask Claude to rephrase the message into a style that works for you.)

[Same as what I replied to your DM:] Yeah, exactly! Systems like that were on my mind a lot around 2021–23 when we launched Impact Markets, which is now GiveWiki. You could do the payouts if humanity has survived for another year, has survived AGI, has survived a year with AGI, etc. You could also chain retrofunders with different time horizons. We tried to get something like this off the ground for about 2 years, but couldn't find any retrofunders anymore after FTX collapsed. (Some of the regrantors were interested.)

https://impactmarkets.substack.com/p/chaining-retroactive-funders

If you can find a retrofunder for a system like that, you can make it happen!
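To make the chaining idea a bit more concrete, here's a toy sketch in Python (not the actual Impact Markets/GiveWiki mechanism; the milestone names, budgets, and the 50% premium per hop are made up for illustration): an impact certificate gets passed along a chain of retrofunders with increasingly long time horizons, and each one only buys it once its milestone has been reached.

```python
# Toy sketch of chained retrofunders with different time horizons.
# Purely illustrative; milestones, budgets, and premiums are made up.

from dataclasses import dataclass


@dataclass
class Retrofunder:
    name: str
    milestone: str  # condition under which this funder pays out
    budget: float   # maximum it is willing to pay if the milestone is met


def chain_payout(funders, milestones_met, certificate_price=1.0):
    """Pass an impact certificate along the chain of retrofunders.

    Each funder buys the certificate from the previous holder only if its
    milestone has been met, paying a premium so earlier (riskier) holders
    are compensated for carrying the risk longer.
    """
    holder = "original project"
    price = certificate_price
    for funder in funders:
        if not milestones_met.get(funder.milestone, False):
            break  # the chain stops; later funders never pay out
        price = min(price * 1.5, funder.budget)  # made-up 50% premium per hop
        print(f"{funder.name} buys the certificate from {holder} for {price:.2f}")
        holder = funder.name
    return holder, price


funders = [
    Retrofunder("1-year funder", "survived_another_year", budget=10),
    Retrofunder("AGI funder", "survived_agi", budget=100),
    Retrofunder("post-AGI funder", "survived_a_year_with_agi", budget=1000),
]

chain_payout(funders, {"survived_another_year": True, "survived_agi": True})
```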

Hiii! Thanks! I'm only speaking for myself here, and I'm mostly interested in #3, or specifically in building, testing, and rolling out an AI-based tool for this rather than an RCT.

2. Consider the challenges of distribution and funding.

Yeah, working directly with the likes of Google (Gemini) and others would be swag, but (correct me if I'm wrong) I see a very low chance of that working out. There is little commercial incentive in it for them; it doesn't help them gain market share from their competitors because our target clients can't pay much; there are reputational risks similar to self-driving cars; etc. I haven't asked anyone who works there, but I'm not sufficiently optimistic that it could work out to attempt it… Besides, if it does work out and lots of people start using Gemini for therapy, and Google then changes its mind and closes that department again, lots of users will be left using a new version of a product for a purpose for which it's no longer tested or optimized.

But I already built an alpha version of an app for mentalization-based treatment on top of Gemini. That’s super easy, and I'll permanently have control over the instructions and possibly the fine-tuning. If it should turn out to be too risky, I can shut it down, or more likely I can make adjustments to minimize any new risks.

Do you think I overestimate the difficulty of working with the model providers?
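For what it's worth, here's a minimal sketch of what such a setup can look like, assuming the google-generativeai Python SDK; the model name, system instruction, and safety wording are placeholders rather than my actual prompt, and the messaging-app integration is omitted.

```python
# Minimal sketch of a chat loop with a fixed therapeutic system instruction
# on top of Gemini, using the google-generativeai Python SDK.
# Model name and instruction text are placeholders, not the real product's prompt.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

SYSTEM_INSTRUCTION = (
    "You are a supportive coach trained in mentalization-based techniques. "
    "Encourage the user to reflect on their own and others' mental states. "
    "You are not a replacement for a licensed therapist; suggest professional "
    "help and crisis lines when the conversation calls for it."
)

model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",          # placeholder model choice
    system_instruction=SYSTEM_INSTRUCTION,
)

chat = model.start_chat(history=[])


def reply(user_message: str) -> str:
    """Send one user turn and return the model's reply text."""
    response = chat.send_message(user_message)
    return response.text


if __name__ == "__main__":
    print(reply("I had a fight with my partner and I can't stop ruminating."))
```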

4. Doing too many things

The topics can probably be trimmed down a bit, but I feel like #1–3 form a nice storyline where we first assess the risks, then assess the opportunities, and then exploit them? Personally, I'd rather 80/20 all of that by rolling out my solution only to fairly stable people first (I'm in some relevant support groups), collecting feedback, polling well-being measures from time to time, and reacting to any problems with safety (in the feedback) or lack of effectiveness (in the well-being measures) along the way, while I increasingly market it to wider audiences. The others might want to take this more slowly, and as a result they'll probably have better data, but once that data is in, I can still optimize my tool accordingly.

Do you think it would really be better to focus on one topic only or would you agree that merging and 80/20ing is the better approach?

They model that, and after, I think, 1661 iterations of the human-AI trade game, the humans have accumulated enough wealth that it would've been self-defeating for them to defect like that. I think it's still a Nash equilibrium, but one where the humans give up perfectly good gains from trade. (Plus, blockchain tech can make it hard to confiscate property.)
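I'm not reproducing their actual model here, but a toy version shows the shape of the argument: once the humans' accumulated gains from trade exceed what a one-off defection could grab, defecting has become self-defeating. (All numbers below are made up.)

```python
# Toy iterated human-AI trade game (not the authors' model; numbers are made up).
# Each round of cooperation yields a small gain from trade for the humans; a
# one-time defection would grab a fixed amount but end the game.

def defection_self_defeating_after(gain_per_round: float, one_off_grab: float) -> int:
    """Return the first round by which cumulative gains from continued trade
    exceed what a one-off defection would have grabbed."""
    t, wealth = 0, 0.0
    while wealth <= one_off_grab:
        t += 1
        wealth += gain_per_round
    return t


# With these toy parameters, defection stops paying off after round 101.
print(defection_self_defeating_after(gain_per_round=1.0, one_off_grab=100.0))
```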

by promoting revanchism and ethnic hatred

Hiii! Do you know someplace where I can read up on that? Ty!

Oops, already subscribed. xD I'll try to pay more attention! Ty!

Just learned about it! Do you have a newsletter or something so I can try to be quicker next year? I'll be in Berlin around that time in case someone wants to meet up before or after! 

What has been your personal take-away from this line of thinking? This “standard case” is far from my own thinking, though I agree with the conclusion. Is it also far from your own thinking?

My take:

  1. My axiology depends on unknown empirical results from further research into such things as Evidential Cooperation in Large Worlds, but at first approximation I'm skeptical that anything that runs counter to widely shared goals (or convergent drives, e.g., self-preservation) can be particularly good.
  2. I endorse antisubstratism and AI rights.
  3. I value reducing suffering very highly because I think that suffering is frequently very intense.
  4. Bio-humans will have an extremely hard time surviving and reproducing in space or on other planets in our solar system and will have an extremely hard time reaching other solar systems.
  5. The Long Reflection is necessary but impossible.
  6. The key feature that puts a being at risk of being able to experience suffering is reinforcement learning. Perhaps negative reinforcement can feel very mild or just like the absence of positive reinforcement, but perhaps it's intense.

So what I'm afraid will happen is that an artificial RL agent will seek out resources first elsewhere in our solar system and then elsewhere in the galaxy (something that would be difficult for bio-humans), will run into communication delays due to the lightspeed limit, and will hence split into countless copies, each potentially capable of suffering. Soon they'll be separated so far that even updates on what it means to be value-aligned would travel for a long time, so there'll be moral “drift” in countless directions.

What I would find reassuring is:

  1. Making sure AIs want to minimize the risk of suffering of other AIs, or at least near copies.
  2. Research into how we can measure and minimize the suffering in RL (at least to the point where it's mild and the pleasure dominates) and some way of applying that research broadly. Sadly it seems questionable to me that a low-suffering training regime is the most efficient training regime.
  3. Faster-than-light communication but not transportation so the agent can remain monolithic.
  4. Very energy-efficient AIs, because they'll face a tradeoff between staying monolithic to avoid value drift (but then only being able to harvest energy sources in our solar system, or even just from the sun) and the opposite. And the more they can do with little energy, the longer they'll stay in our solar system, which might buy us and them time.
  5. More time to get anyone interested in these things, research them, and apply them on a global scale before the lightspeed limits make it hard to disseminate the information.

Human extinction also seems bad on the basis that it contradicts the self-preservation drive that many/most humans have. Peaceful disenfranchisement may be less concerning depending on the details. But at the moment it seems random where we're headed in the coming years because hardly anyone in power is trying to steer these things in any sensible way. Again more time would be helpful.

Basic rights for AIs (and standing in court!) could also provide them with a legal recourse where they currently have to resort to threats, making the transition more likely to go smoothly, like you argue in another post. Currently we're nowhere close to having those. Again more time would be helpful.

I used the comment field in the form to note that a field in the form was marked as optional when it was actually mandatory. That comment got automatically published here, and out of context it made no sense whatsoever. I think it would've been clearer to not automatically transfer this form feedback here (some people might've even assumed that it's private feedback).
