AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

23 · 16d
PSA: If you're doing evals work, every now and then you should look back at OpenPhil's page on capabilities evals and check against their desiderata and questions in sections 2.1-2.2, 3.1-3.4, and 4.1-4.3, as a way to critically appraise the work you're doing.
38 · 1mo · 1
FYI: METR is actively fundraising!

METR is a non-profit research organization. We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted payment from frontier AI labs for running evaluations.[1]

Part of METR's role is to independently assess the arguments that frontier AI labs put forward about the safety of their models. These arguments are becoming increasingly complex and dependent on nuances of how models are trained and how mitigations were developed. For this reason, it's important that METR has its finger on the pulse of frontier AI safety research. This means hiring and paying staff who might otherwise work at frontier AI labs, requiring us to compete with labs directly for talent.

The central constraint on our publishing more and better research, and on scaling up our work aimed at monitoring the AI industry for catastrophic risk, is growing our team with excellent new researchers and engineers. And our recruiting is, to some degree, constrained by our fundraising, especially given the skyrocketing compensation that AI companies are offering.

To donate to METR, click here: https://metr.org/donate

If you'd like to discuss giving with us first, or receive more information about our work for the purpose of informing a donation, reach out to giving@metr.org

[1] However, we are definitely not immune from conflicting incentives. Some examples:
- We are open to taking donations from individual lab employees (subject to some constraints, e.g. excluding senior decision-makers and constituting <50% of our funding).
- Labs provide us with free model access for conducting our evaluations, and several labs also provide us with ongoing free access for research even when we're not conducting a specific evaluation.
17 · 13d · 1
If the people arguing that there is an AI bubble turn out to be correct and the bubble pops, to what extent would that change people's minds about near-term AGI?

I strongly suspect there is an AI bubble, because the financial expectations around AI seem to be based on AI significantly enhancing productivity, and the evidence seems to show it doesn't do that yet. This could change, and I think that's what a lot of people in the business world are thinking and hoping. But my view is that a) LLMs have fundamental weaknesses that make this unlikely and b) scaling is running out of steam.

Scaling running out of steam actually means three things:
1) Each new 10x increase in compute is less practically or qualitatively valuable than previous 10x increases in compute.
2) Each new 10x increase in compute is getting harder to pull off because the amount of money involved is getting unwieldy.
3) There is an absolute ceiling to the amount of data LLMs can train on, which they are probably approaching.

So, AI investment depends on financial expectations that in turn depend on LLMs enhancing productivity, which isn't happening and probably won't happen, due to fundamental problems with LLMs and due to scaling becoming less valuable and less feasible. This implies an AI bubble, which implies the bubble will eventually pop.

So, if the bubble pops, will that lead people who currently have a much higher estimation than I do of LLMs' current capabilities and near-term prospects to lower that estimation? If AI investment turns out to be a bubble, and it pops, would you change your mind about near-term AGI? Would you think it's much less likely? Would you think AGI is probably much farther away?
17 · 21d
Ajeya Cotra writes: Like Ajeya, I haven't thought about this a ton. But I do feel quite confident in recommending that generalist EAs, especially the "get shit done" kind, at least strongly consider working on biosecurity if they're looking for their next thing.
10 · 12d · 8
I have a question I would like some thoughts on.

As a utilitarian I personally believe alignment to be the most important cause area, though weirdly enough, even though I believe x-risk reduction to be positive in expectation, I believe the future is most likely to be net negative. I personally believe, without a high level of certainty, that the current utilities on earth are net negative due to wild animal suffering. If we therefore give the current world a utility value of -1, I would describe my beliefs about future scenarios like this:

1. ~5% likelihood: ~10^1000 (very good future, e.g. hedonium shockwave)
2. ~90% likelihood: -1 (the future is likely good for humans, but this is still negligible compared to wild animal suffering, which will remain)
3. ~5% likelihood: ~-10^100 (s-risk-like scenarios)

My reasoning for thinking "scenario 2" is more likely than "scenario 1" is based on what seem to be the values of the general public currently. Most people seem to care about nature conservation, but no one seems interested in mass-producing (artificial) happiness. While the earth is only expected to remain habitable for about two billion years (and humans, assuming we avoid any x-risks, are likely to remain for much longer), I think, when it comes to it, we'll find a way to keep the earth habitable, and thus wild animal suffering will persist.

Based on these 3 scenarios, you don't have to be a great mathematician to realize that the future is most likely to be net negative, yet positive in expectation. While I still, based on this, find alignment to be the most important cause area, I find it quite demotivating that I spend so much of my time (donations) to preserve a future that I find unlikely to be positive. But the thing is, most longtermists (and EAs in general) don't seem to have similar beliefs to me. Even someone like Brian Tomasik has said that if you're a classical utilitarian, the future is likely to be good.

So now I'm asking, what am I getting wrong?
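To make the "net negative in the most likely case, yet positive in expectation" claim concrete, here is a minimal worked version of the expectation, assuming the three scenarios above are exhaustive and taking the poster's stated probabilities and order-of-magnitude utilities at face value:

```latex
% Expected utility over the three scenarios listed above
% (probabilities and utilities are the poster's own rough estimates, not established figures)
\mathbb{E}[U] \;=\; 0.05 \cdot 10^{1000} \;+\; 0.90 \cdot (-1) \;+\; 0.05 \cdot \bigl(-10^{100}\bigr)
\;\approx\; 5 \times 10^{998} \;>\; 0
```

The positive tail dominates because 10^1000 exceeds 10^100 by a factor of 10^900, so the expectation comes out positive even though the 90%-probability outcome is negative.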
48 · 3mo · 2
AI governance could be much more relevant in the EU if the EU were willing to regulate ASML. Tell ASML they can only service compliant semiconductor foundries, where a "compliant semiconductor foundry" is defined as a foundry which only allows its chips to be used by compliant AI companies. I think this is a really promising path for slower, more responsible AI development globally.

The EU is known for its cautious approach to regulation. Many EAs believe that a cautious, risk-averse approach to AI development is appropriate. Yet EU regulations are often viewed as less important, since major AI firms are mostly outside the EU. However, ASML is located in the EU and serves as a chokepoint for the entire AI industry. Regulating ASML addresses the standard complaint that "AI firms will simply relocate to the most permissive jurisdiction". Advocating this path could be a high-leverage way to make global AI development more responsible without the need for an international treaty.
68 · 5mo · 3
A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections. I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.

(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a quick take. I also posted it on LessWrong.)

What is the change and how does it affect security?

9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights". Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders"). But I'm not so sure it's a small change: we don't know what fraction of employees could get this access, and "systems that process model weights" isn't explained.

Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.

If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical! Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].

Anthropic's justification and why I disagree

Anthropic justified the change by
40 · 3mo
If you're considering a career in AI policy, now is an especially good time to start applying widely, as there's a lot of hiring going on right now. On my Substack I've documented over a dozen different opportunities that I think are very promising.