Seeking Feedback: An Initiative on AI, Mental Health, and Alignment

Gina Hafez; Zoé Roy-Stang; Dawn Drescher; Scott Blain; Mark Rootenberg; Angie Hsu; Anand Jeevanandham

Executive Summary

We are the AI Mental Health Initiative (AIMHI), a working group of therapists, AI professionals, and psychology researchers exploring the intersection of AI and mental health. With Large Language Models (LLMs) like ChatGPT already functioning as what may be the current “largest provider of mental health support in the United States,” we see an urgent need to address the ethical and practical challenges this presents. Our group is focused on four key projects: investigating the ethics of using AI for therapeutic effects, identifying harms that lack commercial incentives for solutions, testing the effectiveness of AI mental health support for underserved populations, and facilitating greater communication between the fields of psychology and AI Safety. We are seeking feedback from the community on our direction and priorities.

Who We Are and What We’re Focused On

We are a working group of about eight people under EA Mental Health, composed of licensed therapists, psychology researchers, AI professionals, and other interested contributors. Our focus is on the ethical and appropriate use of Artificial Intelligence for mental health support.

The landscape is changing rapidly. A recent study suggests that ChatGPT may be the current “largest provider of mental health support in the United States” The high accessibility and affordability of LLMs have drawn many people to use them for an empathetic ear. While AI exhibits weaknesses, and models like ChatGPT maintain they are only a supportive accountability partner, that hasn’t stopped people from using them for therapeutic purposes, as shown in a recent Sentio survey.

This reality, combined with the existence of vast underserved populations globally, has prompted us to explore this domain. We’ve identified four primary projects for our work.

Our Rationale: A Pragmatic Approach

Our approach is a pragmatic one. We recognize the potential risks inherent in using AI for therapeutic purposes. However, we also see its widespread adoption as an irreversible trend. Rather than resisting this change, we believe it is more productive to engage with it proactively. Our goal is to help shape its trajectory, pushing for research and safety measures that will allow people to use these emerging tools more safely and effectively.

Clarifying Our Terms

For the purpose of our work, we use the following definitions:

Client: Any individual seeking mental health support, whether from human practitioners or AI tools.
AI tools: An umbrella term for Large Language Models (LLMs), chatbots, and other applications that use AI to generate therapeutic responses.
Service provider: A general term encompassing both human psychotherapists and AI tools that deliver therapeutic content.
Therapy & therapeutic effect: We define therapy narrowly as the verbal or textual communication between a client and a service provider aimed at improving the client’s self-reported well-being. A therapeutic effect is a measured increase in that well-being. We focus on verbal content because it is the only common metric across in-person therapy, telehealth, and AI-based support. This allows us to measure efficacy by the outcome (well-being) rather than the specific methods used.
Well-being: We rely on self-reported measures of well-being. We acknowledge this is a complex construct and that AI developers will need to navigate potential trade-offs when optimizing for different forms – such as short-term hedonic well-being (pleasure vs. suffering) versus long-term eudaemonic well-being (related to value, meaning, and purpose).

Our Four Projects

1. Synthesizing the Ethics of Using Conversational AI for Therapeutic Effects

A core question for us is whether the widespread use of AI for mental health support is a net positive. If the alternative for many is no treatment at all, our cautious inclination is that AI can be a helpful tool. There is even some data indicating it can occasionally perform on par with trained crisis workers. However, the ethical questions give us pause. Before moving forward, we aim to thoroughly investigate the potential downsides.

Key areas of concern include:

Privacy: General-purpose apps do not guarantee the level of privacy expected from therapy (many expressly do record the chat logs).
Effectiveness: How effective is AI support compared to traditional methods or no treatment?
Dependence: Could users develop an unhealthy dependence on an AI confidant?
Scope of Practice: Can an AI recognize when to stop or refer a user to a human professional?
Overreliance: AI could optimize for a short-term benefit and miss out or avoid a better long-term benefit. For issues like social anxiety, could AI become a comfortable crutch that prevents a user from engaging with people, which may be a better long-term solution.
Unconfirmed Moral Agency: There is not clear evidence for consciousness, a conscience, etc. in current AI systems. This complicates issues of both moral and legal accountability. Moreover, lack of a deeply held value system might increase difficulties with treatment continuity and efficacy.
Poor Identification of Crisis Signals: Chatbots lacked the understanding to identify crises according to a study of customer rating reports of AI therapy apps.

Our planned deliverable for this goal is a research paper or a series of blog posts outlining these ethical considerations.

2. Identifying Harms Unlikely to Be Solved by Commercial Incentives

As we begin to explore this space, we’re developing hypotheses about potential misalignments between commercial incentives and user wellbeing that warrant further investigation. The observations below represent our initial thinking based on preliminary discussions, but we recognize that a more thorough analysis is needed before we can determine which of these areas, if any, will become focal points for our research outputs.

Many AI therapy apps appear to be incentivized to maximize engagement metrics rather than long-term user wellbeing—though we need to examine this assumption more carefully. This might create risks such as fostering dependency rather than promoting graduation from the service, but we acknowledge this requires empirical validation. Similarly, we’ve observed that companies often shield themselves from liability by stating their product is not for therapy, while knowing it is being used that way—a pattern we’re interested in documenting more systematically.

Potential areas of concern we’re considering exploring:

Repetitive Use Patterns: Conditions like OCD, hypochondriasis, or addiction might involve repetitive reassurance-seeking behaviors that AI could inadvertently reinforce. We’re interested in studying whether engagement-optimized AI fails to recognize these harmful patterns, though this remains a hypothesis to test.
Sycophancy: While newer models show improvements, we’ve noted anecdotally that some AI systems tend toward excessive agreement with users. We’re curious whether this could be harmful in specific clinical contexts like psychosis or grandiosity.
Alignment and Influence: We're beginning to explore questions around how sustained, intimate AI conversations might shape users' thoughts and worldviews. This area feels important but requires careful theoretical and empirical work to understand properly.
AI potentially increasing Self-Limiting Beliefs
Lack of Chronological or Embodied Awareness: Models may have difficulty knowing when to take breaks to allow insight to turn into action. Also, their lack of embodied awareness could make certain aspects of psychological growth difficult compared to work with a human therapist.

It’s worth noting that while companies like OpenAI explicitly state their tools aren’t for therapy, preliminary data suggests users continue to seek mental health support from these platforms primarily for reasons of accessibility and cost (whereas preferring AI to human therapy is only the 7th most cited reason selected by 21% of respondents). Whether and how to address this gap is something we’re still determining.

Our tentative next steps include conducting a more systematic review of these potential harms, consulting with experts in both AI safety and clinical psychology, and determining which (if any) of these areas merit deeper investigation. We may ultimately focus on creating practical guidance for safer AI mental health interactions, developing better alignment strategies, or identifying specific high-risk use cases—but these decisions will depend on what we learn in our exploratory phase.

We welcome feedback on which of these potential harms seem most pressing to investigate, or if there are other areas we haven't yet considered that deserve attention.

3. Testing AI Effectiveness for Underserved Populations

We want to empirically test the effectiveness of AI mental health support compared to “treatment as usual” (TAU), which for many underserved populations is no treatment at all. We recognize that AI cannot provide legally protected psychotherapy, but its potential benefits – accessibility of personalized or specialized treatment, affordability, great memory, and tireless “stamina” – make it a compelling intervention to study.

Some populations do have access to some treatment, but in some countries only with a delay of one or several years, or only to the most widely known types of treatment (like Cognitive Behavioral Therapy, CBT), or only at low frequencies, such as once per month. For example, patients with narcissistic personality disorder in the Philippines might not have access to mentalization-based treatment or to any relevant workbooks even though they can get monthly sessions with therapists who specialize in CBT. AI-based treatment could fill these gaps.

Our proposed research plan involves three stages:

Establish a Baseline: Conduct a literature review to estimate outcomes when common mental health conditions are left either entirely unaddressed or are treated with similarly accessible and affordable means, such as mental health workbooks or apps focused on CBT education.
Estimate AI Effectiveness: Review existing literature to determine if AI support is, at a minimum, “net positive” (helps more than it hurts). We can compare this to the effectiveness of traditional therapy (~60% improve, ~30% no effect, ~10% worsen).
Run a Pilot Randomized Control Trial (RCT): Identify an underserved population (initial research here) with sufficient technological access and run a pilot randomized controlled trial.

Our deliverable here is the study itself, but a minimum goal is to make a strong ethical case that inaction – leaving people with no additional support – may be the most detrimental path. When we say “no additional support” we mean nothing further beyond what someone is already using (which could range from no support at all to bibliotherapy or to mental-health apps for educational purposes).

4. Facilitating Communication Between Psychology and AI Safety

Finally, we believe the field of psychology has much to contribute to the field of AI Safety. We aim to bridge this gap. Our first step is to better understand the basics of alignment and what it would mean to tune a model to align with human wellbeing or long-term flourishing. We believe frameworks from psychiatry may be useful for understanding and addressing certain problematic model behaviors (e.g., hallucination or dishonesty), an approach already being explored by research groups like Anthropic's “model psychiatry” team. By exploring these connections, we hope to contribute valuable perspectives on what should be optimized when attempting to align AI with human values and on how psychology might inform AI research more broadly.

What We Need From You

We are at a crucial point in our project and would greatly benefit from your feedback.

What are your thoughts on our four proposed goals? Which seem most tractable and impactful?
Are we missing any major risks or opportunities in this space?
What arguments might persuade us that AI as a mental health tool will create more problems than it solves?
Are there key individuals or organizations in the EA, AI Safety, or mental health communities we should be speaking with?

Thank you for taking the time to read this. We look forward to your thoughts in the comments.

feijãoOct 84

Hi there!

I'd be interested in working on some of these papers with you. I'm currently training as a Clinical Psychotherapist at King's College London and am keen on exploring the intersection between AI and therapy.

Feel free to either message me here or email me at k2478540@kcl.ac.uk

Zoé Roy-StangOct 102

The info for joining our open AI Mental Health Initiative (AIMHI) working group meetings is on our website here.

John SalterOct 15

The potential for AI to reshape mental health globally is really underexplored. It's great to see EAs paying attention to it! Here are some hastily scribbled thoughts:

1. Confidently drawing conclusions about the subset of people who are currently using LLM therapy from studies of people who were paid to try it out.
I'm doubtful those two client groups would be sufficiently comparable. Given that variance in clients accounts for ~5x more variance in treatment outcome than the treatment itself, it's really important to keep the client group constant.

2. Consider the challenges of distribution and funding.
We already have many evidence-based, cost-effective therapies that aren’t widely implemented—not because they don’t work, but because it’s hard to secure funding and uptake at scale. One potential shortcut here is working directly with the model providers: they already have distribution, resourcing, and reputational incentives to avoid harm. Helping them mitigate specific risks may be a more tractable way to have immediate impact, while also building credibility for future projects - before vs afters are a really compelling way to demonstrate your impact that anyone could understand in seconds. The other proposed work would be far harder to communicate, and thus harder to get people excited about.

3. Take the unguided digital intervention literature with a grain of salt.
There’s been decades of promising RCTs that unfortunately didn’t translate into widespread real-world impact. A few possible reasons:

Low-cost trials make it cheaper to try again until you get a strong result (by chance)
Many of these studies are sponsored by companies/authors with strong incentives to report positive findings.
Participants are often paid to engage—something that doesn’t generalize well to real-world settings where adherence is a big challenge, especially for unguided interventions.

4. Doing too many things
Each of your four projects could easily be an entire organization’s focus. Consider testing each idea briefly, then doubling down on the one where you see the most traction. . If you'd rather work separately on different things, you might want to consider branding yourselves as separate projects - it's harder for any given project to be taken as seriously otherwise.

Dawn DrescherOct 12

Hiii! Thanks! I'm only speaking for myself here, and I'm mostly interested in #3, or specifically in building, testing, and rolling out an AI-based tool for this rather than an RCT.

2. Consider the challenges of distribution and funding.

Yeah, working directly with the likes of Google (Gemini) and others would be swag, but correct me if I'm wrong, I see a very low chance of that working out? There is little commercial incentive in it for them, it doesn't help them gain more market share from their competitors because our target clients can't pay much, reputational risks similar to self-driving cars, etc. I haven't asked anyone who works there, but I'm not sufficiently optimistic that it could work out to attempt it… Besides, if it does work out and lots of people start using Gemini for therapy, and then Google redecides and closes that department again, lots of users will use new version of a product for a purpose for which it's not tested or optimized anymore.

But I already built an alpha version of an app for mentalization-based treatment on top of Gemini. That’s super easy, and I'll permanently have control over the instructions and possibly the fine-tuning. If it should turn out to be too risky, I can shut it down, or more likely I can make adjustments to minimize any new risks.

Do you think I overestimate the difficulty of working with the model providers?

4. Doing too many things

The topics can probably be trimmed down a bit, but I feel like #1–3 form a nice story line where we first assess the risks, the assess the opportunities, and then exploit them? Personally, I'd rather 80/20 all of that by rolling out my solution only to fairly stable people first (I'm in some relevant support groups), collect feedback, poll well-being measures from time to time, and react to any problems with safety (in the feedback) or lacking effectiveness (well-being measures) along the way, while I increasingly market it to wider audiences. The others might want to take this more slowly, and as a result they'll probably have the better data, but when that data is in, I can still optimize my tool accordingly.

Do you think it would really be better to focus on one topic only or would you agree that merging and 80/20ing is the better approach?

Effective Altruism Forum
EA Forum