Interested in AI safety talent search and development.
Making and following through on specific concrete plans.
Yes, what you are scaling matters just as much as the fact that you are scaling. So now developers are scaling RL post-training and scaling pretraining with higher-quality synthetic data pipelines. If the point is just that training on average internet text provides diminishing returns in many real-world use cases, then that seems defensible; that certainly doesn't seem to be the main recipe any company is using to push the frontier right now. But people often seem to mistake this for something stronger, like "all training is now facing insurmountable barriers to continued real-world gains," or "scaling laws are slowing down across the board," or "it didn't produce significant gains on meaningful tasks, so scaling is done."

I mentioned SWE-Bench because it seems to show significant real-world utility improvements rather than a trivial decrease in prediction loss. I also don't think it's clear that there is such an absolute separation here: to model the data, you have to model the world in some sense. If you keep feeding multimodal LLM agents the right data in the right way, they keep improving on real-world tasks.
Shouldn't we be able to point to some objective benchmark if GPT-4.5 was really off trend? It got 10x the SWE-Bench score of GPT-4. That seems like solid evidence that additional pretraining continued to produce improvements of the same magnitude as previous scale-ups. If there were even more efficient ways to improve capabilities, like RL post-training on smaller o-series models, why would you expect OpenAI not to focus its efforts there instead? RL was producing gains and hadn't been scaled as much as self-supervised pretraining, so it was obvious where to invest marginal dollars. GPT-5 is better and faster than 4.5. That doesn't mean pretraining suddenly stopped working or fell off trend from scaling laws, though.
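To make "off trend" concrete, here's a minimal sketch (not anyone's actual evaluation pipeline, and all numbers are made-up placeholders) of one way to operationalize it: fit a power law to earlier models' compute-vs-loss points and check whether a new model's loss lands near the extrapolated curve.

```python
# Sketch: checking whether a new model is "on trend" with a fitted scaling law.
# All compute/loss values below are hypothetical placeholders, not real results.
import numpy as np

# Hypothetical (training compute in FLOPs, validation loss) pairs for earlier models.
compute = np.array([1e22, 3e22, 1e23, 3e23, 1e24])
loss = np.array([2.31, 2.18, 2.05, 1.95, 1.84])

# Scaling laws are typically power laws, L ~ a * C^(-b), i.e. linear in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

def predicted_loss(c):
    """Extrapolate the fitted power law to a new compute budget c."""
    return np.exp(intercept + slope * np.log(c))

# Hypothetical new model trained with ~10x more compute than the last point.
new_compute, new_loss = 1e25, 1.65
expected = predicted_loss(new_compute)
print(f"expected loss ~{expected:.3f}, observed {new_loss:.3f}")
# "Off trend" would mean the observed loss is clearly worse than the extrapolation
# predicts; a small residual means pretraining is still paying off on trend.
```

The same fit-and-extrapolate check works for a downstream benchmark instead of loss, which is the relevant comparison if the worry is about real-world utility rather than prediction loss.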
I think this is an interesting vision for reinvigorating things, and I do sometimes feel that "principles-first" has been conflated with just "classic EA causes."
To me, "PR speak" is not the same thing as clear, effective communication. I think the lack of a clear, coherent message is most of what bothers people, especially during and after a crisis. Without that, it's hard to talk to different people and meet them where they're at. It's not clear to me what the takeaways were or whether anyone learned anything.
I feel like "figuring out how to choose leaders and build institutions effectively" is really neglected, and it's kind of shocking that there doesn't seem to be much focus here. A lingering question for me has been "Why can't we be more effective in choosing who we trust?" and the usual objections just seem to boil down to "it's hard." But so are AI safety, biorisk, post-AGI prep, etc., so that doesn't seem super satisfying.
Ok, thanks for the details. Off the top of my head, I can think of multiple people interested in AI safety who probably fit these (though I think the descriptions could still be more concretely operationalized) and who fall into categories such as: founders/cofounders; several years of experience in operations, analytics, and management; several years of experience in consulting; and multiple years of experience in events and community building/management. Some want to stay in Europe and some have families, but overall I don't recall them being super constrained.
Would be curious to hear more. I'm interested in doing more independent projects in the near future, but I'm not sure how they'd be feasible.