This past fall, I was living in Asheville, North Carolina, when Hurricane Helene swept through. I was lucky to still have my apartment, but naturally, life turned upside down for a bit. I went from sitting in an office every day to wandering the city searching for food and water for myself and my neighbors. For weeks, we lived without power or running water. Eventually, I booked a flight to stay with my sister until things returned to normal.
Despite the heartbreak I felt for the community and the daily tragedy around me, I tried to make meaning of that time. Each morning, I reread Seneca's Letters from a Stoic. The wisdom and peace it offered stuck with me, but one line in particular resurfaced recently as something that might be especially relevant to the Effective Altruism community and AI safety:
“Assume authority yourself and utter something that may be handed down to posterity. Produce something from your own resources.”
A classic echo of “standing on the shoulders of giants,” but something about his framing struck me as deeper.
We’re often encouraged to take the insights of great thinkers and evolve them, to turn what we’ve learned into original, actionable ideas. For years, I’ve done just that: collected philosophical gems and worked them into daily life. I’ve even drafted personal frameworks I imagined turning into a book one day.
But what if that’s not enough?
What if the more powerful act isn’t continuing to build upward, but stepping back and rebuilding entirely? What if the gems I’ve been carrying could have been better gems all along? Rather than building from philosophy, maybe I need to revisit the raw materials themselves. The kind of ideas that emerge from that process might be far more useful.
This line of thinking has shaped how I approach AI safety.
While many of the scientists who first imagined AI warned about the importance of alignment, optimizing for capability has always seemed to take precedence. That raises a difficult but necessary question:
What if alignment can’t be solved by extending today’s tools, or even by creating new ones that still depend on the same old foundations?
What would it mean to design AI from the ground up, with alignment as the core principle? Not a patch, not a feature, but the blueprint?
Has anyone tried reinventing the very fundamentals?
Consider the physicist Paul Dirac. Faced with the disconnect between quantum mechanics and special relativity, he didn’t just tweak existing models; he returned to first principles and derived a new equation from scratch. The result was the Dirac equation, which not only bridged the two theories but also predicted the existence of antimatter, an entirely unforeseen discovery.
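(For anyone curious what that rethinking actually produced, the equation is usually written, in natural units and with the standard gamma matrices, as

$$(i\gamma^\mu \partial_\mu - m)\,\psi = 0$$

and it was the “extra” negative-energy solutions this form admits that pointed toward antimatter. The details aren’t essential to the argument; the point is that the payoff came from re-deriving, not patching.)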
This, I believe, is the kind of mindset AI companies should bring to problems like interpretability, reinforcement learning, and optimization theory. Some alignment efforts do attempt a first-principles approach (I've linked the articles I'm referring to at the bottom), but most still operate within frameworks inherited from today’s paradigm.
Instead of asking, “How do we fix this?”, we should be asking,
“What primitive assumptions are we allowed to challenge?”
It might be the difference between patching a system and preventing a catastrophe.
And maybe this idea applies far beyond AI. In any field, it’s worth asking:
What’s one foundational assumption you’ve never questioned—but maybe should?
It’s a question worth sitting with. Maybe even rebuilding from.
Other articles that take a foundational, first-principles approach:
Contemplative Wisdom for Superalignment
Foundational Moral Values for AI Alignment
Reversing the logic of generative AI alignment: a pragmatic approach for public interest
AI Alignment Through First Principles
