niplav

987 karma · Joined · niplav.site

Bio

I follow Crocker's rules.

Comments (161)

I've been confused about the "defense-in-depth" cheese analogy. The analogy works in low dimensions: we can visualize how stacking multiple two-dimensional barriers with holes will block any path from a point out of a three-dimensional sphere.

(What follows is me trying to think through the mathematics, but I lack most of the knowledge to evaluate it properly. Johnson-Lindenstrauss may be involved in solving this? (it's not, GPT-5 informs me))

But plans in the real world are very high-dimensional, right? So we're imagining a point (let's say at $0$) in a high-dimensional space (let's say $\mathbb{R}^n$ for large $n$, as an example), and an $n$-sphere around that point. Our goal is that there is no straight path from $0$ to somewhere outside the sphere. Our possible actions are that we can block off sub-spaces within the sphere, or construct $n$-dimensional barriers with "holes" inside the sphere, to prevent any such straight paths. Do we know the scaling properties of how many such barriers we have to create, given such-and-such "moves" with some number of dimensions/porosity?

My purely guessed intuition is that, at least if you're given porous $(n-1)$-dimensional "sheets" you can place inside of the $n$-sphere, you need many of them with increasing dimensionality $n$. Never mind, I was confused about this.
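Out of curiosity I sketched a tiny Monte Carlo version of the question anyway (all modeling choices here are mine and probably too crude): assume each barrier ends up blocking the escape directions inside a random spherical cap of fixed angular radius, and count how many random barriers it takes until no sampled straight ray from the centre escapes, as the dimension $n$ grows.

```python
import numpy as np

def random_unit_vectors(k: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """k uniformly random directions on the unit sphere in R^n."""
    v = rng.standard_normal((k, n))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def barriers_needed(n: int, angle_deg: float = 45.0, n_rays: int = 2000,
                    max_barriers: int = 50_000, seed: int = 0) -> int:
    """Toy model: each barrier blocks every straight escape ray whose direction
    lies within a spherical cap of the given angular radius around a random
    centre. Count how many random barriers it takes until every sampled ray
    from the origin is blocked (capped at max_barriers)."""
    rng = np.random.default_rng(seed)
    rays = random_unit_vectors(n_rays, n, rng)      # sampled escape directions
    cos_theta = np.cos(np.deg2rad(angle_deg))
    blocked = np.zeros(n_rays, dtype=bool)
    for i in range(1, max_barriers + 1):
        cap_centre = random_unit_vectors(1, n, rng)[0]
        blocked |= rays @ cap_centre >= cos_theta   # ray falls inside this cap
        if blocked.all():
            return i
    return max_barriers

for n in (2, 3, 5, 8, 12):
    print(f"n = {n:2d}: ~{barriers_needed(n)} barriers until no sampled ray escapes")
```

Under this crude model the required number of barriers grows quickly with $n$, since a cap of fixed angular radius covers an exponentially shrinking fraction of all directions; whether that model captures anything about real "plans" is exactly what I'm unsure about.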

Whereas many people in EA seem to think the probability of AGI being created within the next 7 years is 50% or more, I think that probability is significantly less than 0.1%.

Are you willing to bet on this?

Yeah, I goofed by using Claude for math, not any of the OpenAI models, which are much better at math.

The relevant bit is this timestamp in an interview. The relevant part of the interview:

But now, getting to the job side of this, I do have a fair amount of concern about this. On one hand, I think comparative advantage is a very powerful tool. If I look at coding, programming, which is one area where AI is making the most progress, what we are finding is we are not far from the world—I think we'll be there in three to six months—where AI is writing 90 percent of the code. And then in twelve months, we may be in a world where AI is writing essentially all of the code. But the programmer still needs to specify what the conditions of what you're doing are, what the overall app you're trying to make is, what the overall design decision is. How do we collaborate with other code that's been written? How do we have some common sense on whether this is a secure design or an insecure design? So as long as there are these small pieces that a programmer, a human programmer, needs to do, the AI isn't good at, I think human productivity will actually be enhanced. But on the other hand, I think that eventually all those little islands will get picked off by AI systems. And then we will eventually reach the point where the AIs can do everything that humans can. And I think that will happen in every industry.

For what it's worth, at the time I thought he was talking about code at Anthropic, and another commenter agreed. The "we are finding" indicates to me that it's at Anthropic. Claude 4.5 Sonnet disagrees with me and says that it can be read as being about the entire world.

(I really hope you're right and the entire AI industry goes up in flames next year.)

I'm almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing infrastructure; technically I'm usually hedging more than that, but it's accurate in spirit.

I'm happy this is reaching exactly the right people :-D

As for proving invariances, that makes sense as a goal, and I like it. If I perform any follow-up I'll try to estimate how many more tokens that'll produce, since IIRC seL4 or CakeML had proofs that exceeded 10× the length of their source code.

A recent experience I've had is trying to use LLMs to generate Lean definitions and proofs for a novel mathematical structure I'm toying with: they do well with anything below 10 lines, but start to falter with more complicated proofs, and `sorry` their way out of anything I'd call non-trivial. My understanding is that a lot of software formal verification is gruntwork, but there also need to be interwoven moments of brilliance.
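To make the pattern concrete, here's roughly what that looks like (the structure below is a made-up stand-in, not the one I'm actually playing with):

```lean
-- Made-up stand-in structure, just to illustrate the pattern.
structure Magma (α : Type) where
  op : α → α → α

/-- The same operation with its arguments swapped. -/
def Magma.swap {α : Type} (m : Magma α) : Magma α :=
  ⟨fun a b => m.op b a⟩

-- Definitional one-liners like this come back fine:
theorem Magma.swap_op {α : Type} (m : Magma α) (a b : α) :
    m.swap.op a b = m.op b a :=
  rfl

-- Anything that needs even a small actual argument tends to come back like this:
theorem Magma.swap_eq_of_comm {α : Type} (m : Magma α)
    (h : ∀ a b, m.op a b = m.op b a) : m.swap = m := by
  sorry
```

(The second theorem should go through with a couple of `funext` applications and `h`, but that's already past the point where the models start guessing.)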

I'm always unsure what to think of claims that markets will incentivize the correct level of investment in software security. Like, early computer security (in the aughts) seems to me like it was actually pretty bad, and while it became better over time it did take at least a decade. From afar, markets look efficient; from close up, you can see efficiency establish itself. And then there's the question of how much of the cost is internalized, which I feel should be close to 100% for private companies? For open source projects, of course, that number goes close to zero.

It'd be cool to see a time series of the number of exploits found in open source software; thanks for the curl numbers. You picked a fairly old/established codebase with an especially dedicated developer, so I wonder what it's like with newer ones, and whether one discovers more exploits in the early, middle, or late stage of development. The adoption of better programming languages than C/C++ of course helps.

Thanks! That's the kind of answer I was looking for. I'll sleep on pre-ordering for a night, and then definitely look forward more to the online appendices. (I also should've specified it's the other Barnett ;-)

Note: I'm being a bit adversarial with these questions, probably because the book launch advertisements are annoying me a bit. Still, an answer to my questions could tip me over to pre-ordering/not pre-ordering the book.

Would you say that this book meaningfully moves the frontier of the public discussion on AI x-risk forward?

As in, if someone's read most to ~all of the publicly available MIRI material (including all of the arbital alignment domain, the 2021 dialogues, the LW sequences, and even some of the older papers), plus a bunch of writing from detractors (e.g. Pope, Belrose, Turner, Barnett, Thornley, 1a3orn), will they find updated defenses/elaborations on the evolution analogy, why automated alignment isn't possible, why to expect expected utility maximizers, why optimization will be "infectious", and some more on the things linked here?

Additionally, would any of the long-time MIRI-debaters (as mentioned above, also including Christiano and the OpenPhil/Constellation cluster of people) plausibly endorse the book not just as a good distillation, but as moving the frontier of the public discussion forward?

niplav
80% disagree

It's not OK to eat honey

My best guess is that eating honey is pretty bad, because I buy that bees have non-negligible moral weight, and the arguments for bee-lives being net-negative seem plausible too.

I'm far less wedded to bee lives being net-negative, so it could be that I'll be convinced that eating honey isn't just good, but extremely good—that eating honey is one of the best things modern humans do, because it allows for the existence of many flourishing bees.

Depending on the relationship between brain size and moral weight, different animals may be more or less ethical to farm.

A common assumption in effective altruism is that moral weight is marginally decreasing in number of neurons (i.e. small brains matter more per neuron). This implies that we'd want to avoid putting many small animals into factory farms, and prefer few big ones, especially if smaller animals have faster subjective experience.
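As a toy illustration of that assumption (the numbers below are made-up placeholders, not actual neuron counts or meat yields; only the sublinear scaling matters):

```python
def moral_weight(neurons: float, exponent: float = 0.5) -> float:
    """Toy assumption: moral weight grows sublinearly with neuron count."""
    return neurons ** exponent

# name: (neuron count, kg of meat per animal) -- placeholder values, not estimates
animals = {
    "small animal": (2e8, 1.5),
    "large animal": (3e9, 300.0),
}

for name, (neurons, kg) in animals.items():
    per_kg = moral_weight(neurons) / kg
    print(f"{name}: toy moral weight per kg of meat ≈ {per_kg:,.0f}")
```

With these toy numbers the moral weight per kilogram of meat is much lower for the larger animal; whether that holds in reality depends on the actual exponent and the actual neuron counts.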

A reductio ad absurdum of this view would be to (on the margin) advocate for the re-introduction of whaling, but this would be blocked by optics concerns and moral uncertainty (if we value something like the sapience and culture of animals).

If factory farming can't be easily replaced with clean meat in the foreseeable future, one might want to look for animals that are least unethical to farm, mostly judged by whether they fulfill the following conditions:

  • Small brain & low number of neurons
  • Easy to breed & fast reproduction cycle
  • Low behavioral complexity
  • Large body, high-calorie meat
  • Palatable to consumers
  • Stopped evolving early (if sentience evolved late in evolutionary history)

In conversation with various LLMs[1], three animals were suggested as performing well on those trade-offs. My best guess is that farming these animals still can't beat current factory farming on cost-effectiveness.

Ostriches

Advantages: Already farmed, very small brain for large body mass

Disadvantages: Fairly late in evolutionary history

Arapaima

Advantages: Very large relative to its small brain (up to 3 m in length), fast-growing, simple neurology, already farmed, can be raised herbivorously, belongs to a lineage of bony fishes that is ~200 million years old

Disadvantages: Tricky to breed

Tilapia

Advantages: Very easy to breed, familiarity to consumers, small neuron count

Disadvantages: Fairly small, not as ancient as the arapaima


  1. Primarily Claude 3.7 Sonnet ↩︎
