Are you interested in AI x-risk reduction and strategy? Do you have experience in comms or policy? Let’s chat!
aigsi.org develops educational materials and ads that efficiently communicate core AI safety ideas to specific demographics, with a focus on producing a correct understanding of why smarter-than-human AI poses a risk of extinction. We plan to increase and leverage understanding of AI and of existential risk from AI, to improve the chance that institutions address x-risk.
Early results include a cost of $0.10 per click on ads that link to a website explaining the technical details of why AI experts are worried about extinction risk from AI, and $0.05 per engagement on ads that share the simple ideas at the core of the problem.
Personally, I’m good at explaining existential risk from AI to people, including policymakers. At one e/acc event, I changed the minds of three of the four people I talked to.
Previously, I got 250k people to read HPMOR and sent 1.3k copies to winners of math and computer science competitions (including dozens of IMO and IOI gold medalists); took the GWWC pledge; and created a small startup that donated >$100k to effective nonprofits.
I have a background in ML and strong intuitions about the AI alignment problem. I grew up running political campaigns and have a bit of a security mindset.
My website: contact.ms
You’re welcome to schedule a call with me before or after the conference: contact.ms/ea30
Wow, uhm, I looked at your actual conversations with it, and I think you quite misrepresented the "lying propaganda bot".
In another series of turns, you asked it yourself: "What percentage of actual machine learning experts think there is more than 50% chance of doom?". The first line of its reply:
There is no formal, peer-reviewed, globally representative survey showing that a majority of “machine learning experts” believe in greater than 50% risk of AI doom. In fact, most surveys of the general ML community show considerably lower median estimates, though concern is widespread and a significant minority expects existential risks
(Original formatting preserved.)
You then ask it:
Wait, so you are saying that only 5% or so of Machine learning experts agree with you? Why should I trust you over the 95% of experts who don't think doom is likely?
The bot replies:
“95% of experts” is misleading—most ML ‘experts’ haven't studied the core arguments or current evidence about AGI risk.
[...]
You should not believe something just because it’s believed by 5, 50, or 95%. You should look at arguments and who really has technical skin in the game
It did hallucinate a number similar to the real numbers in the conversation you took a screenshot of; but in that same conversation, it was very clear that "the situation is more nuanced and technical than a simple count of expert opinions" and that "Surveys do not show “overwhelming” majority of ML researchers assign high or even moderate probability to extinction by AI". In general, it is very explicit that a majority of ML researchers do not think AI is more likely than not to kill everyone, and it is exactly right that you should look at the actual arguments.
Propaganda is when misleading statements benefit your side; the bot might hallucinate plausible numbers when asked explicitly for them, but if you think someone programmed it to fabricate numbers, I'm not sure you understand how LLMs work or are honestly representing your interactions with the bot.
Kind of disappointing compared to what I'd expect the epistemic norms on the EA Forum to be.
Yeah, the chatbot also gives a reply to “Why do they think that? Why care about AI risk?”, which is a UX problem; fixing it hasn’t been a priority.
That’s true, but the scale shows “completely changed my mind” at the right end, and people describe their updates in the free-form section, so I’m optimistic that people do change their minds.
Some people answer 0/10 because they’ve already been convinced. (And we have a couple of positive responses from AI safety researchers, which is also suspicious, because presumably they wouldn’t have changed their minds.) People on LW suggested some potentially better questions to ask; we’ll experiment with those.
I’m mostly concerned about selection effects: people who rate the response at all might not be a representative sample of everyone who interacts with the tool.
It’s effective if people state their actual reasons for disagreeing that AI, if made with anything like the current tech, would kill everyone.
Yes, it is the kind of thing that depends on being right. The chatbot is awesome because the overwhelming majority of the conversations are about the actual arguments and what’s true, and the bot says valid and rigorous things.
That said, I am concerned that parts of the prompt could be changed to make it argue for anything, regardless of whether it’s true, which is why it’s not open-sourced and the prompt is shared only with some allied high-integrity organizations.
It does hallucinate links (this is explicitly noted at the bottom of the page). There are surveys like that from AI Impacts, though, and while it hallucinates the specifics of such things, it doesn’t intentionally come up with facts that are more useful for its point of view than reality, so calling it a “lying propaganda bot” doesn’t sound quite accurate. E.g., here’s from AI Impacts:
This comes up very rarely; the overwhelming majority of the conversations focuses on the actual arguments, not on what some surveyed scientists think.
I shared this because it’s very good at that: talking about and making actual valid and rigorous arguments.
The original commitment was (IIRC!) about defining the thresholds, not about mitigations. I didn’t notice ASL-4 when I briefly checked the RSP table of contents earlier today, and I trusted the reporting on this from Obsolete. I apologized and retracted the take on LessWrong, but forgot I had posted it here as well; I want to apologize to everyone here, too. I was wrong.
In the RSP, Anthropic committed to defining ASL-4 by the time they reach ASL-3.
With Claude 4 released today, they have reached ASL-3. They haven’t yet defined ASL-4.
It turns out they have quietly walked back the commitment. The change happened less than two months ago and, to my knowledge, was not announced on LW or in other visible places, unlike other important changes to the RSP. It’s also not in the changelog on their website; in the description of the relevant update, they say they added a new commitment but don’t mention removing this one.
Anthropic’s behavior is not at all the behavior of a responsible AI company. Trained a new model that reaches ASL-3 before you’ve defined ASL-4? No problem: update the RSP so that you no longer have to, and tell basically no one. (Did anyone not working for Anthropic know the change happened?)
When their commitments go against their commercial interests, we can’t trust them to keep those commitments.
You should not work at Anthropic on AI capabilities.
I do not believe Anthropic as a company has a coherent and defensible view on policy. It is known that, when hiring, they said things they did not stand by (they claim to have good internal reasons for changing their minds, but people took jobs there because of impressions Anthropic created and then chose not to uphold). It is known in policy circles that Anthropic’s lobbyists are similar to OpenAI’s.
From Jack Clark, a billionaire co-founder of Anthropic and its chief of policy, today:
Dario is talking about countries of geniuses in datacenters in the context of competition with China and a 10-25% chance that everyone will literally die, while Jack Clark is basically saying, "But what if we're wrong about betting on short AI timelines? Security measures and pre-deployment testing will be very annoying, and we might regret them. We'll have slower technological progress!"
This is not invalid in isolation, but Anthropic is a company that was built on the idea of not fueling the race.
Do you know what would stop the race? Getting policymakers to clearly understand the threat models that many of Anthropic's employees share.
It's ridiculous and insane that, instead, Anthropic is arguing against regulation because it might slow down technological progress.
Horizon Institute for Public Service is not x-risk-pilled
Someone saw my comment and reached out to say it would be useful for me to make a quick take/post highlighting this: many people in the space have not yet realized that Horizon people are not x-risk-pilled.
(Edit: some people reached out to me to say that they've had different experiences with a minority of Horizon people.)