
if we substitute a human in place of the AI, then the current safety methods of interpretability, alignment, and control seem eerily similar to psychoanalysis, indoctrination, and slavery

 

external explanations for internal states, forced value alignment, direct restriction of agency

 

morality aside, these methods have historically been, at best, temporarily effective at achieving desirable outcomes, and at worst, have induced the exact opposite effect

 

if we look at how humans have effectively made sure that other humans aren't bad or don't do bad things, it's through things such as constitutions, rule of law, separation of powers, etc

 

we’ve designed systems of checks and balances, making it really hard to commit catastrophic harm

 

and as we increasingly integrate AI systems, these controls may very well apply to them

 

but at some point, they won’t

 

new digital environments are being built in a way that existing checks and balances aren’t designed for

 

which is the crux of AI safety

 

but what if the problem isn't about better safety methods

 

what if we're looking at the wrong layer entirely?

 

what if safety is a function of the gap between an agent's capacity to convert energy into action and the physical constraints on that conversion 

 

what if, instead of imposing safety onto AI systems, we doubled down on constraints that can't be circumvented?

 

if data, compute, and energy are the real-world constraints on AI systems, is there some way these could act as inherent safety mechanisms?

 

data isn't really a constraint: it's the environment; the constraint is how it can be interacted with

 

observing token usage may be the closest thing we have to safety in this regard, but then we’re back to interpretability—measuring and making inferences as to what is safe
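
a minimal sketch of what that observation amounts to (the class, budget, and window below are all hypothetical): a sliding-window monitor that can flag anomalous usage after the fact, but can't prevent anything

```python
# hypothetical sketch: token-usage observation as a "safety" signal.
# the names, budget, and window are illustrative, not a real API.
from collections import deque
import time

class TokenRateMonitor:
    """Flags when observed token usage exceeds a budget over a sliding window.

    Note: this only observes and infers; it cannot prevent anything,
    which is exactly the interpretability trap described above.
    """

    def __init__(self, max_tokens: int, window_seconds: float):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.events: deque = deque()  # (timestamp, tokens) pairs

    def record(self, tokens: int) -> bool:
        """Record a usage event; return True if the budget is exceeded."""
        now = time.monotonic()
        self.events.append((now, tokens))
        # drop events that fell outside the sliding window
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        return used > self.max_tokens

monitor = TokenRateMonitor(max_tokens=10_000, window_seconds=60.0)
if monitor.record(tokens=2_500):
    print("anomalous usage, but this is an inference, not a constraint")
```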

 

compute is a system’s internal capacity — it’s what the system does

 

this can be designed differently, architecturally, but from a safety implementation perspective, we’re now talking about a protocol design problem

 

similar to how blockchains work: correlating computational cost of an operation with its potential impact scope (narrow, bounded operations are cheap, while broad, cascading operations are expensive)
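
a minimal sketch of what such a protocol could look like, assuming a gas-style cost function; the scope categories, weights, and superlinear pricing are illustrative assumptions rather than any real protocol

```python
# sketch: price an operation's compute cost by its potential impact scope,
# the way blockchains price gas; all categories and weights are assumed.
from dataclasses import dataclass

# rough proxy for impact scope: how far outside itself an operation reaches
SCOPE_WEIGHT = {
    "local_read": 1,      # narrow, bounded: cheap
    "local_write": 4,
    "network_call": 32,   # crosses a trust boundary
    "broadcast": 256,     # broad, cascading: expensive
}

@dataclass
class Operation:
    kind: str
    fanout: int  # how many downstream targets the operation reaches

def compute_cost(op: Operation, base_unit: int = 10) -> int:
    """Price an operation superlinearly in its reach."""
    weight = SCOPE_WEIGHT[op.kind]
    # quadratic in fanout so cascading operations get expensive fast
    return base_unit * weight * (1 + op.fanout) ** 2

print(compute_cost(Operation("local_read", fanout=0)))    # 10
print(compute_cost(Operation("broadcast", fanout=1000)))  # ~2.6e9
```

the quadratic fanout term is the design choice doing the work here: narrow operations stay nearly free while cascading ones price themselves out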

 

but this requires buy-in—a protocol that makes computation more expensive will lose to one that doesn't, unless it produces superior capability

 

energy, on the other hand, is exogenous and finite: a system cannot produce it itself

 

it’s the one thing that's equally real for every system regardless of architecture, intent, or capability

 

which means that every system that computes is in a state of dependency

 

energy also gives you locality and temporality

 

computation happens somewhere, energy needs to be delivered somewhere, energy transfer takes time—these require physical infrastructure
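
a back-of-the-envelope illustration, assuming a 20 MW cluster (an arbitrary figure): energy arrives as power over time, E = P × t, through an interconnect that physically caps instantaneous draw

```python
# energy arrives through physical infrastructure at a finite rate (E = P * t);
# the cluster size and demand below are illustrative assumptions
CLUSTER_POWER_W = 20e6        # assume a 20 MW training cluster
SECONDS_PER_DAY = 86_400

energy_per_day_J = CLUSTER_POWER_W * SECONDS_PER_DAY
print(f"{energy_per_day_J:.2e} J/day")  # ~1.7e12 J, delivered via the grid

# the grid connection caps instantaneous draw: no abstraction layer can
# pull 40 MW through a 20 MW interconnect, however the software is built
demand_W = 40e6
deliverable_W = min(demand_W, CLUSTER_POWER_W)
shortfall = demand_W - deliverable_W
print(f"shortfall: {shortfall / 1e6:.0f} MW")  # that 20 MW simply isn't there
```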

 

what this also highlights is how much we've abstracted away through digitization, which is where the growing complexity of safety comes from:

  • efficiency allows us to do more with less
  • the cloud allows processing to happen “nowhere”
  • asynchronous, distributed systems allow everything to appear instant

 

but no matter how many layers of abstraction, a system cannot operate outside of this physical constraint of energy — it cannot be substituted

 

the laws of thermodynamics govern energy — constraints that can’t be sidestepped
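
one concrete instance of such a constraint is Landauer's principle: erasing a single bit of information dissipates at least k_B · T · ln 2 of energy, a floor no architecture or abstraction can get under

```python
import math

# Landauer's principle: erasing one bit dissipates at least k_B * T * ln(2).
# a hard thermodynamic floor on computation; no optimization goes below it.
K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
T = 300.0            # room temperature, K

e_min_per_bit = K_B * T * math.log(2)
print(f"{e_min_per_bit:.3e} J per bit erased")  # ~2.87e-21 J

# erasing 1 terabyte (8e12 bits) at the theoretical floor:
print(f"{e_min_per_bit * 8e12:.2e} J")  # ~2.3e-8 J: tiny, but never zero
```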

 

which means dependency, locality, and temporality could be core concepts in how we talk about AI architecture and deployment—there is a physical reality that computation can't escape

 

humans aren't safe because we're good; we're safe because we're limited

 

our energy conversion capacity, our physical bandwidth, and our inability to act at scale instantaneously are what prevent any individual or group from being an existential threat

 

sure, morality helps

 

institutions help

 

but the very core is that we physically can't convert enough energy fast enough to end everything

 

which makes safety a function of the gap between an agent's capacity to convert energy into action and the physical constraints on that conversion

 

when the gap is small, safety is a natural property of the system

 

but when the gap is wide — through digital abstraction, for example — safety has to be explicitly re-grounded in physical constraints; otherwise, there is no safety
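
a toy numerical sketch of that gap, not a formalization; the figures (human output of ~100 W, a hypothetical 50 MW datacenter, a 1 MW interconnect) are illustrative assumptions

```python
# toy sketch: treat the "gap" as the portion of an agent's deployable power
# that is limited only by software or policy, not by a hard physical ceiling
def gap_W(deployable_W: float, hard_physical_cap_W: float) -> float:
    """Energy-conversion capacity not grounded in a physical constraint."""
    return max(0.0, deployable_W - hard_physical_cap_W)

# a human: ~100 W of sustained output, and biology enforces roughly the
# same ceiling, so the gap is ~0 and safety is a natural property
print(gap_W(deployable_W=100, hard_physical_cap_W=100))       # 0.0

# an agent orchestrating a 50 MW datacenter where the only limit is a
# revocable software quota: the entire capacity is ungrounded
print(gap_W(deployable_W=50e6, hard_physical_cap_W=0))        # 50000000.0

# re-grounding: put that compute behind a dedicated 1 MW physical
# interconnect and deployable power itself collapses to the cap
print(gap_W(deployable_W=min(50e6, 1e6), hard_physical_cap_W=1e6))  # 0.0
```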

 

the open question is whether this gap can be formalized, not through alignment or policy, but through the same physical reality that has kept every other agent in check since the universe began
