Thanks for explaining! To summarize, I think there are crucial disanalogies between the "make their own life go well" case and the "make the future of humanity go well" case:
Should they follow those instincts even when they don't know why evolution instilled them? … (Note that this doesn't rely on them understanding evolution—even 1000 years ago they could have trusted that some process designed their body and mind to function well, despite radical uncertainty about what that process was.)
In this case, the reasons to trust "that some process designed their body and mind to function well" are relatively strong, because of how we're defining "well": an individual's survival in a not-too-extreme environment. Even if they don't understand evolution, they can think about how their instincts plausibly would’ve been honed on feedback relevant to this objective. And/or they can look at how other individuals (or their past self) have tended to survive when they trusted these kinds of instincts.[1]
Now consider someone who wants to make the future of humanity go well. Similarly, they have certain cooperative instincts ingrained in them—e.g. the instinct towards honesty. All else equal, it seems pretty reasonable to think that following them will help humanity to cooperate better, and that this will allow humanity to avoid internal conflict.
Here, the reasons to trust that the instincts track the objective seem way weaker to me,[2] for all the reasons I discuss here: No feedback loops, radically unfamiliar circumstances due to the advent of ASI and the like, a track record of sign-flipping considerations. All else is really not equal.
You yourself have written about how the AIS movement arguably backfired hard, which I really appreciate! I expect that ex ante, people founding this movement told themselves: “All else equal, it seems pretty reasonable to think that trying to warn people of a source of x-risk, and encouraging research on how to prevent it, will help humanity avoid that x-risk.”
(I think analogous problems apply to your subagent and forward-chaining framings. They’re justified when the larger system provides feedback, or the forward steps have been validated in similar contexts — which we’re missing here.)
How does this relate to cluelessness? Mostly I don't really know what the term means.
The way I use the term, you’re clueless about how to compare A vs. B, relative to your values, if: it seems arbitrary (upon reflection) to say A is better than, worse than, or exactly as good as B, and instead it seems we should consider A’s and B’s goodness incomparable.
What if someone has always been totally solitary, doesn’t understand evolution or feedback loops, and hasn’t made many decisions based on similar instincts? Seems like such a person wouldn’t have reasons to trust their instincts! They’d just be getting lucky.
See here for my reply to: “Sure, ‘way weaker’, but they’re still slightly better than chance right?” Tl;dr: This doesn’t work because the problem isn’t just noise that weakens the signal; it’s that the direction of the signal is itself ambiguous.
The stuff on cluelessness feels like it's conceding a little too much to the EA/bayesian frame. It's implying that you should have a model of the entire future in order to make decisions. But what I think you actually want to claim is that it's sensible and even "rational" to make non-model-based decisions (e.g. via heuristics, intuitions, etc).
I'd be interested in hearing more on what exactly you mean by this. Insofar as someone wants to make decisions based on impartially altruistic values, I think cluelessness is their problem, even if they don't make decisions by explicitly optimizing w.r.t. a model of the entire future. If such a person appeals to some heuristics or intuitions as justification for their decisions, then (as argued here) they need to say why those heuristics or intuitions reliably track impact on the impartial good. And the case for that looks pretty dubious to me.
(If you're rejecting the "make decisions based on impartially altruistic values" step, fair enough, though I think we'd do well to be explicit about that.)
My best guess about which of 2 identical objects has a larger mass in expectation will be arbitrary if their mass only differs by 10^-6 kg, and I have no way of assessing this small difference. However, this does not mean the expected mass of the 2 objects is fundamentally incomparable.
I worry you're reifying "expectations" as something objective here. The relative actual masses of the objects are clearly comparable. But if you subjectively can't compare them, then they're indeed incomparable "in expectation" in the relevant sense.
However, the same goes for comparisons between the expected masses of seemingly identical objects with similar masses if I can only assess their mass using my hands, but this does not mean their masses are incomparable.
I don't exactly understand what argument you're making here.
My core argument in the post is: Take any intervention X. We want to weigh up its impact for all sentient beings across the cosmos, where this "weighing up" is an aggregation over all hypotheses. Now suppose we want to force ourselves to compare X with inaction, i.e., say either UEV(do X) > UEV(don't do X) or vice versa. We have such an extremely coarse-grained understanding (if any) of these hypotheses[1] that, when we do the weighing-up, whether we say UEV(do X) > UEV(don't do X) or vice versa seems to depend on an arbitrary choice.
Can you say how your argument relates to mine?
Relative to the amount of fine-grained detail necessary to evaluate the hypotheses, when what we value is "well-being of all sentient beings across the cosmos".
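To make the arbitrariness worry concrete, here's a minimal toy sketch. The hypotheses, utilities, and both priors are invented purely for illustration, not drawn from any real model of far-future impact; the point is only that two priors which both look admissible, given how coarse-grained our understanding is, can disagree on the sign of UEV(do X) vs. UEV(don't do X).

```python
# Toy sketch of the arbitrariness worry above. All numbers are invented.

# Coarse hypotheses about the long-run effect of doing X rather than not,
# each paired with a made-up difference in value, UEV(do X) - UEV(don't do X),
# conditional on that hypothesis being true.
impact_if_true = {
    "X mostly improves long-run cooperation": +1.0,
    "X mostly entrenches bad values":         -1.2,
    "X washes out entirely":                   0.0,
}

# Two priors over those hypotheses. Given how coarse-grained our
# understanding is, neither assignment looks more defensible than the other.
prior_a = {"X mostly improves long-run cooperation": 0.45,
           "X mostly entrenches bad values":         0.35,
           "X washes out entirely":                  0.20}
prior_b = {"X mostly improves long-run cooperation": 0.35,
           "X mostly entrenches bad values":         0.45,
           "X washes out entirely":                  0.20}

def expected_difference(prior):
    """Expected UEV(do X) - UEV(don't do X) under a single prior."""
    return sum(prior[h] * impact_if_true[h] for h in impact_if_true)

print(expected_difference(prior_a))  # about +0.03: "do X" comes out ahead
print(expected_difference(prior_b))  # about -0.19: "don't do X" comes out ahead
```

The particular numbers don't matter; what matters is that the ranking flips between two assignments I have no non-arbitrary way of choosing between.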
In normal situations, an agent can rationally come to a single probability distribution, but Greaves argues that, in a situation with complex cluelessness, an individual should instead have a set of probability functions that they are “rationally required to remain neutral between.” I’m not entirely sure what this means.
You might be interested in this post I wrote explaining imprecision; hopefully it answers "what this means".
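To give a rough operational sense of "a set of probability functions you remain neutral between", here's a toy sketch. The credences and payoffs are invented, and the decision rule shown (rank one option above another only if every function in the set agrees) is just one rule sometimes paired with imprecise credences, not something I'm attributing to Greaves.

```python
# Toy illustration of holding a *set* of probability functions rather than
# a single one. All numbers are invented.

# Each credence function assigns a probability to the proposition
# "doing A (rather than nothing) improves the far future".
credence_set = [0.30, 0.45, 0.60, 0.70]

value_if_true = 1.0    # payoff of doing A if the proposition is true
value_if_false = -1.0  # payoff of doing A if it is false

def expected_value_of_A(p):
    return p * value_if_true + (1 - p) * value_if_false

evs = [expected_value_of_A(p) for p in credence_set]  # roughly [-0.4, -0.1, 0.2, 0.4]

# Rank A above inaction only if *every* function in the set agrees it is
# better. Here the functions disagree on the sign, so neither option is
# ranked above the other.
if all(ev > 0 for ev in evs):
    print("do A")
elif all(ev < 0 for ev in evs):
    print("don't do A")
else:
    print("the set disagrees on the sign: A and inaction are left unranked")
```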
"[A]llowing [small] groups of humans to cooperate effectively" is very far from "making the far future better, impartially speaking". I’d be interested in your responses to the arguments here.
First, it’s not clear to me that these people weren’t clueless (i.e., that they really had more reason to choose whatever they chose than the alternatives), depending on how long a time horizon they were aiming to make go well.
Second, insofar as we think these people’s choices were justified, I don’t see why you think their instincts gave them such justification. Why would these instincts track unprecedented consequences so well?
I don’t think "may well" gets us very far. Can you say more about why this hypothesis is so much more likely than, say, “the dominant impacts are the damage that's already been done”, or “the dominant impacts will come from near-future decisions, made by actors who are still too ignorant about the extremely complex system they’re intervening in”?
No, because I think “founding AI safety movements that succeed at making the far future go better” is a pretty out-of-distribution kind of sociopolitical intervention.