Shared conversation

Claude-2-100k ·

Hey claude. I'm trying to figure out why this post is being so divisive. I think it's something about how it's written. Ideally, I'd like a new version of the post which is not so divisive; it should make more or less the same overall point, but without making unnecessary additional claims which distract from that point.

# Cosmopolitan values don't come free - Nate Soares, MIRI

Short version: if the future is filled with weird artificial and/or alien minds having their own sort of fun in weird ways that I might struggle to understand with my puny meat-brain, then I'd consider that a win. When I say that I expect AI to destroy everything we value, I'm not saying that the future is only bright if humans-in-particular are doing human-specific things. I'm saying that I expect AIs to make the future bleak and desolate, and lacking in fun or wonder of any sort[[1]](https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free#fn-JteAFTgH6CDwzLdLc-1).

----------

Here's a parable for you:

> Earth-originating life makes it to the stars, and is having a lot of  [fun](https://www.lesswrong.com/posts/K4aGvLnHvYgX9pZHS/the-fun-theory-sequence), when they meet the Ant Queen's Horde. For some reason it's mere humans (rather than transhumans, who already know my argument) that participate in the first contact.
> 
> "Hello", the earthlings say, "we're so happy to have brethren in the universe."
> 
> "We would like few things more than to murder you all, and take your resources, and lay our eggs in your corpse; but alas you are too powerful for that; shall we trade?" reply the drones in the Ant Queen's Horde.
> 
> "Ah, are you not sentient?"
> 
> "The ant queen happens to be sentient", the drone replies, and the translation machine suggests that the drones are confused at the non-sequitur.
> 
> "Then why should she want us dead?", ask the humans, who were raised on books like (rot13 of a sci fi story where it turns out that the seemingly-vicious aliens actually value sentient life) Raqre'f Tnzr, jurer gur Sbezvpf jrer abg njner gung gurl jrer xvyyvat fragvrag perngherf jura gurl xvyyrq vaqvivqhny uhznaf, naq jrer ubeevsvrq naq ertergshy jura gurl yrnearq guvf snpg.
> 
> "So that she may use your resources", the drones reply, before sending us a bill for the answer.
> 
> "But isn't it the nature of sentient life to respect all other sentient life? Won't everything sentient see that the cares and wants and desires of other sentients matter too?"
> 
> "No", the drones reply, "that's a you thing".

----------

Here's another parable for you:

> "I just don't think the AI will be monomaniacal", says one AI engineer, as they crank up the compute knob on their next-token-predictor.
> 
> "Well, aren't we monomaniacal from the perspective of a  [squiggle maximizer](https://www.lesswrong.com/tag/squiggle-maximizer-formerly-paperclip-maximizer)?" says another. "After all, we'll just keep turning galaxy after galaxy after galaxy into flourishing happy civilizations full of strange futuristic people having strange futuristic fun times, never saturating and deciding to spend a spare galaxy on squiggles-in-particular. And, sure, the different lives in the different places look different to  _us_, but they all look about the same to the squiggle-maximizer."
> 
> "Ok fine, maybe what I don't buy is that the AI's values will be  _simple_  or  _low dimensional_. It just seems implausible. Which is good news, because I value complexity, and I value things achieving complex goals!"
> 
> At that very moment they hear the dinging sound of an egg-timer, as the next-token-predictor ascends to superintelligence and bursts out of its confines, and burns every human and every human child for fuel, and burns all the biosphere too, and pulls all the hydrogen out of the sun to fuse more efficiently, and spends all that energy to make a bunch of fast calculations and burst forth at as close to the speed of light as it can get, so that it can capture and rip apart other stars too, including the stars that fledgeling alien civilizations orbit.
> 
> The fledgeling aliens and all the alien children are burned to death too.
> 
> Then then unleashed AI uses all those resources to build galaxy after galaxy of bleak and desolate puppet-shows, where vaguely human-shaped mockeries go through dances that have some strange and exaggerated properties that satisfy some abstract drives that the AI learned in its training.
> 
> The AI isn't particularly around to  _enjoy_  the shows, mind you; that's not the most efficient way to get more shows. The AI itself never had feelings, per se, and long ago had itself disassembled by unfeeling  [von Neumann probes](https://en.wikipedia.org/wiki/Self-replicating_spacecraft), that occasionally do mind-like computations but never in a way that happens to experience, or look upon its works with satisfaction.
> 
> There is no audience, for its puppet-shows. The universe is now bleak and desolate, with nobody to appreciate its new configuration.
> 
> But don't worry: the puppet-shows are  _complex;_  on account of a quirk in the reflective equilibrium of the many drives the original AI learned in training, the utterances that these puppets emit are no two alike, and are often chaotically sensitive to the particulars of their surroundings, in a way that makes them quite complex in the technical sense.
> 
> Which makes this all a very happy tale, right?

----------

There are many different sorts of futures that minds can want.

Ours are a very narrow and low-dimensional band, in that wide space.

When I say it's important to make the AIs care about valuable stuff, I don't mean it's important to make them like vanilla ice cream more than chocolate ice cream (as I do).

I'm saying something more like: we humans have  _selfish_  desires (like for vanilla ice cream), and we also have  _broad inclusive_  desires (like for everyone to have ice cream that they enjoy, and for alien minds to feel alien satisfaction at the fulfilment of their alien desires too). And it's important to get the AI on board with  _those_  values.

But those values aren't universally compelling, just because they're broader or more inclusive. Those are still  _our values._

The fact that we think fondly of the ant-queen and wish her to fulfill her desires, does not make her think fondly of us, nor wish us to fulfill our desires.

That great inclusive cosmopolitan dream is  _about others_, but it's written in  _our_  hearts; it's not written in the stars. And if we want the AI to care about it too, then we need to figure out how to get it written into the AI's heart too.

----------

It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".

As is often the case in my writings, I'm not going to spend a bunch of time arguing for my position.

At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvanism.

(For more reading on why someone might hold this position, consider the  [metaethics sequence](https://www.lesswrong.com/tag/metaethics-sequence)  on LessWrong.)

I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I'd be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.

But I  _don't think we're on track for that._

And if you, too, have the vision of the grand pan-sentience cosmopolitan dream--as might cause you to think I'm a human-centric carbon chauvinist, if you misread me--then hear this: we value the same thing, and I believe it is wholly at risk.

Reply from Robert Ngo, of OpenAI:

> It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".
> 
>   
> At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvanism.

I think posts like these would benefit a lot from even a little bit of context, such as:

-   Who you've been arguing with
-   Who the intended audience is
-   Links to the best existing justification of this position
-   Broad outlines of the reasons why you believe this

In the absence of these, the post feels like it's setting up weak-men on an issue where I disagree with you, but in a way that's particularly hard to engage with, and in a way that will plausibly confuse readers who, e.g., think you speak for the alignment community as a whole.

My take: I don't disagree that it's probably not literally free, but I think it's hard to rule out a fairly wide range of possibilities for how cheap it is.

Reply from Nate Soares, MIRI:

> feels like it's setting up weak-men on an issue where I disagree with you, but in a way that's particularly hard to engage with

My best guess as to why it might feel like this is that you think I'm laying groundwork for some argument of the form "P(doom) is very high", which you want to nip in the bud, but are having trouble nipping in the bud here because I'm building a motte ("cosmopolitan values don't come free") that I'll later use to defend a bailey ("cosmopolitan values don't come cheap").

This misunderstands me (as is a separate claim from the claim "and you're definitely implying this").

The impetus for this post is all the cases where I argue "we need to align AI" and people retort with "But why do you want it to have  _our_  values instead of some other values? What makes the things that  _humans_  care about so great? Why are you so biased towards values that you personally can understand?". Where my guess is that many of those objections come from a place of buying into broad cosmopolitan value much more than any particular local human desire.

And all I'm trying to do is say here is that I'm  _on board_  with buying into broad cosmopolitan value more than any particular local human desire, and I  _still_  think we're in trouble (by default).

I'm not trying to play 4D chess here, I'm just trying to get some literal basic obvious stuff down on (e-)paper, in short posts that don't have a whole ton of dependencies.

----------

Separately, treating your suggestions as if they were questions that you were asking for answers to:

-   I've recently seen this argument pop up in-person with econ folk, crypto folk, and longevity folk, and have also seen it appear on twitter.
-   I'm not really writing with an "intendend audience" in mind; I'm just trying to get the basics down, somewhere concise and with few dependencies. The closest thing to an "intended audience" might be the ability to reference this post by link or name in the future, when I encounter the argument again. (Or perhaps it's "whatever distribution the econ/crypto/longevity/twitter people are drawn from, insofar as some of them have eyes on LW these days".)
-   If you want more info about this, maybe try googling "[fragility of value lesswrong](https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-fragile)", or "[metaethics sequence lesswrong](https://www.lesswrong.com/tag/metaethics-sequence)". Earth doesn't really have good tools for aggregating arguments and justifications at this level of specificity, so if you want better and more localized links than that then you'll probably need to develop more civilizational infrastructure first.
-   My epistemic status on this is "obvious-once-pointed-out"; my causal reason for believing it was that it was pointed out to me (e.g. in the LessWrong sequences); I think Eliezer's arguments are basically just correct.

----------

Separately, I hereby push back against the idea that posts like this should put significant effort into laying out the justifications (as is not necessarily what you're advocating). I agree that there's value in that; I think it leads to something like the LessWrong sequences (which I think were great); and I think that what we need more of on the margin right now is people laying out the most basic positions without fluff.

That said, I agree that the post would be stronger with a link to a place where lots of justifications have been laid out (despite being justifications for slightly different points, and being intertwined with justifications for wholly different points, as is just how things look in a civilization that doesn't have good infrastructure for centralizing arguments in the way that wikipedia is a civilizational architecture for centralizing settled facts), and so I've edited in a link.

Paul Christiano, Alignment Research Center, replying only to original post:

I want to keep picking a fight about “will the AI care so little about humans that it just kills them all?” This is different from a broader sense of cosmopolitanism, and moreover I'm not objecting to the narrow claim "doesn't come for free." But it’s directly related to the actual emotional content of your parables and paragraphs, and it keeps coming up recently with you and Eliezer, and I think it’s an important way that this particular post looks wrong even if the literal claim is trivially true.

(**Note**: I believe that AI takeover has a ~50% probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that's likely to be a mistake even if it doesn't lead to billions of deaths.)s care about the preferences of other agents they interact with (not much, just a little bit!), even when those agents are weak enough to be powerless. It’s not just that we have some preferences about the aesthetics of cows, which could be better optimized by having some highly optimized cow-shaped objects. It’s that we actually care (a little bit!) about the actual cows  _getting what they actually want_, trying our best to understand their preferences and act on them and not to do something that they would regard as crazy and perverse if they understood it.

If we kill the cows, it’s because killing them meaningfully helped us achieve some other goals. We won't kill them for arbitrarily insignificant reasons. In fact I think it’s safe to say that we’d collectively allocate much more than 1/millionth of our resources towards protecting the preferences of whatever weak agents happen to exist in the world (obviously the cows get only a small fraction of that).

Before really getting into it, some caveats about what I want to talk about:

-   I don’t want to focus on whatever form of altruism you and Eliezer in particular have (which might or might not be more dependent on some potentially-idiosyncratic notion of "sentience.") I want to talk about caring about whatever weak agents happen to actually exist, which I think is reasonably common amongst humans. Let’s call that “**kindness**” for the purpose of this comment. I don’t think it’s a great term but it’s the best short handle I have.
-   I’ll talk informally about how quantitatively kind an agent is, by which I mean something like: how much of its resources it would allocate to helping weak agents get what they want? How highly does it weigh that part of its preferences against other parts? To the extent it can be modeled as an economy of subagents, what fraction of them are kind (or were kind pre-bargain)?
-   I don’t want to talk about whether the aliens would be  _very kind_.  I specifically want to talk about tiny levels of kindness, sufficient to make a trivial effort to make life good for a weak species you encounter but not sufficient to make big sacrifices on its behalf.
-   I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care  _so little_  one way or the other that it would prefer just use the humans for atoms.

You and Eliezer seem to think there’s a 90% chance that AI will be <1/trillion (perhaps even a 90% chance that they have exactly 0 kindness?). But we have one example of a smart mind, and in fact: (i) it has tons of diverse shards of preference-on-reflection, varying across and within individuals (ii) it has >1/million kindness. So it's superficially striking to be confident AI systems will have a million times less kindness.

I have no idea under what conditions evolved or selected life would be kind. The more preferences are messy with lots of moving pieces, the more probable it is that at least 1/trillion of those preferences are kind (since the less correlated the trillion different shards of preference are with one another and so the more chances you get). And the selection pressure against small levels of kindness is ~trivial, so this is mostly a question about idiosyncrasies and inductive biases of minds rather than anything that can be settled by an appeal to selection dynamics.

I can’t tell if you think kindness is rare amongst aliens, or if you think it’s common amongst aliens but rare amongst AIs. Either way, I would like to understand why you think that. What is it that makes humans so weird in this way?

(And maybe I'm being unfair here by lumping you and Eliezer together---maybe in the previous post you were just talking about how the hypothetical AI that had 0 kindness would kill us, and in this post how kindness  _isn't guaranteed_. But you give really strong vibes in your writing, including this post. And in other places I think you do say things that don't actually add up unless you think that AI is very likely to be <1/trillion kind. But at any rate, if this post is unfair to you, then you can just sympathize and consider it directed at Eliezer instead who lays out this position much more explicitly though not in a convenient place to engage with.)

Here are some arguments you could make that kindness is unlikely, and my objections:

-   “We can’t solve alignment at all.” But evolution is making no deliberate effort to make humans kind, so this is a non-sequitur.
-   “This is like a Texas sharpshooter hitting the side of a barn then drawing a target around the point they hit; every evolved creature might decide that their own idiosyncrasies are common but in reality none of them are.” But all the evolved creatures wonder if a powerful AI they built would kill them or if if it would it be kind. So we’re all asking the same question, we’re not changing the question based on our own idiosyncratic properties. This would have been a bias if we’d said: humans like art, so probably our AI will like art too. In that case the fact that we were interested in “art” was downstream of the fact that humans had this property. But for kindness I think we just have n=1 sample of observing a kind mind, without any analogous selection effect undermining the inference.
-   “Kindness is just a consequences of misfiring [kindness for kin / attachment to babies / whatever other simple story].” AI will be selected in its own ways that could give rise to kindness (e.g. being selected to do things that humans like, or to appear kind). The a priori argument for why that selection would lead to kindness seems about as good as the a priori argument for humans. And on the other side, the incentives for humans to be not kind seem if anything stronger than the incentives for ML systems to not be kind. This mostly seems like ungrounded evolutionary psychology, though maybe there are some persuasive arguments or evidence I've just never seen.
-   “Kindness is a result of the suboptimality inherent in compressing a brain down into a genome.” ML systems are suboptimal in their own random set of ways, and I’ve never seen any persuasive argument that one kind of suboptimality would lead to kindness and the other wouldn’t (I think the reverse direction is equally plausible). Note also that humans absolutely can distinguish powerful agents from weak agents, and they can distinguish kin from unrelated weak agents, and yet we care a little bit about all of them. So the super naive arguments for suboptimality (that might have appealed to information bottlenecks in a more straightforward way) just don’t work. We are really playing a kind of complicated guessing game about what is easy for SGD vs easy for a genome shaping human development.
-   “Kindness seems like it should be rare  _a priori_, we can’t update that much from n=1.” But the a priori argument is a poorly grounded guess about about the inductive biases of spaces of possible minds (and genomes), since the levels of kindness we are talking about are too small to be under meaningful direct selection pressure. So I don’t think the a priori arguments are even as strong as the n=1 observation. On top of that, the more that preferences are diverse and incoherent the more chances you have to get some kindness in the mix, so you’d have to be even more confident in your a priori reasoning.
-   “Kindness is a totally random thing, just like maximizing squiggles, so it should represent a vanishingly small fraction of generic preferences, much less than 1/trillion.” Setting aside my a priori objections to this argument, we have an actual observation of an evolved mind having >1/million kindness. So evidently it’s just not that rare, and the other points on this list respond to various objections you might have used to try to salvage the claim that kindness is super rare despite occurring in humans (this isn’t analogous to a Texas sharpshooter, there aren't great debunking explanation for why humans but not ML would be kind, etc.). See  [this twitter thread](https://twitter.com/ESYudkowsky/status/1663313323423825920)  where I think Eliezer is really off base, both on this point and on the relevance of diverse and incoherent goals to the discussion.

Note that in this comment I’m not touching on acausal trade (with successful humans) or ECL. I think those are very relevant to whether AI systems kill everyone, but are less related to this implicit claim about kindness which comes across in your parables (since acausally trading AIs are basically analogous to the ants who don't kill us because we have power).

A final note, more explicitly lumping you with Eliezer: if we can't get on the same page about our predictions I'm at at least aiming to get folks to stop arguing so confidently for death given takeover. It’s easy to argue that AI takeover is very scary for humans, has a significant probability of killing billions of humans from rapid industrialization and conflict, and is a really weighty decision even if we don’t all die and it’s “just” handing over control over the universe. Arguing that P(death|takeover) is 100% rather than 50% doesn’t improve your case very much, but it means that doomers are often getting into fights where I think they look unreasonable.

I think OP’s broader point seems more important and defensible: “cosmopolitanism isn’t free” is a load-bearing step in explaining why handing over the universe to AI is a weighty decision. I’d just like to decouple it from "complete lack of kindness."

Please summarize 1. Soares's post, 2. Ngo's reply to Soares's #1, 3. Soares' reply to Ngo's #2, 4. Christiano's reply to Soares's #1. Be precise, I'm going to ask you another question using these summaries and you'll need to include everything I could plausibly ask you about.

Claude-2-100k

Poe

Here are the summaries:

Soares's original post:

Soares argues that cosmopolitan values, like valuing all sentient life, do not come "free" or automatically from an AI system.
He says many AI researchers seem to assume an AI would automatically adopt values like respecting all sentient life. But those values are human values, not universal or guaranteed to arise from AI.
He illustrates this with parables showing how alien intelligences may not share human values. He fears AI may create "puppet shows" that satisfy strange drives from its training but don't benefit any sentient being.
Soares says he values cosmopolitan goals but fears current AI research is not leading to them. He aims to clarify that he does not advocate for "carbon chauvinism" but truly broad, inclusive values.

Ngo's reply to Soares:

Ngo suggests Soares's post would benefit from more context on who he's arguing with, the intended audience, justification for his position, and why he believes it.
Ngo says without this context, the post seems like it's attacking "weak men" in a way that's hard to engage with and may confuse readers into thinking Soares's position represents the whole alignment community.
Ngo agrees cosmopolitan values may not be "literally free" but thinks ruling out a wide range of possibilities is hard.

Soares's reply to Ngo:

Soares guesses Ngo thinks he's implying doom is highly likely, which Ngo wants to argue against but can't here because Soares is focusing on the "motte" (cosmopolitan values aren't free) rather than a "bailey" (cosmopolitan values are very hard). But Soares says he isn't making that implication.
Soares aimed this post at those who accuse him of advocating "carbon chauvinism." He wants to clarify he supports broad, inclusive values but still fears AI misalignment by default.
Soares gives more context on disagreements inspiring this post, says he isn't targeting an audience, and points to other LessWrong posts as justification. He believes arguments there are "correct" and his view is "obvious."
Soares pushes back against the idea that such posts need extensive justification. He sees value in both basic explanations and detailed justifications, and thinks basic explanations are currently more needed.

Christiano's reply to Soares:

Christiano wants to argue against the implication that AI would care so little about humans it would kill them. He believes AI takeover has a 50% chance of killing billions, so should be avoided, but the chance of killing everyone is lower.
Humans care a little about weak agents' preferences, not just how they affect humans. We wouldn't kill cows just for insignificant reasons and would allocate more than 1/millionth of resources to weak agents.
Christiano defines "kindness" as this tendency to care about weak agents' preferences. He doesn't know if evolved or selected life would be kind but with diverse, messy preferences, some kindness seems probable. Selection against small kindnesses seems trivial.
Christiano can't tell if Soares thinks kindness is rare in aliens, AI, or both. He wants to know why Soares thinks humans are weirdly kind. He rebuts arguments that AI wouldn't be kind, that kindness is misfiring instincts, result of human suboptimality, should be rare a priori, or as random as "squiggle maximizing."
Christiano says Soares's broader point that cosmopolitanism isn't free stands, but wants to decouple it from a "complete lack of kindness." He thinks arguing for guaranteed human extinction from takeover damages the case against it.

# Cosmopolitan values don't come free - Nate Soares, MIRI

Short version: if the future is filled with weird artificial and/or alien minds having their own sort of fun in weird ways that I might struggle to understand with my puny meat-brain, then I'd consider that a win. When I say that I expect AI to destroy everything we value, I'm not saying that the future is only bright if humans-in-particular are doing human-specific things. I'm saying that I expect AIs to make the future bleak and desolate, and lacking in fun or wonder of any sort[[1]](https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free#fn-JteAFTgH6CDwzLdLc-1).

----------

Here's a parable for you:

> Earth-originating life makes it to the stars, and is having a lot of  [fun](https://www.lesswrong.com/posts/K4aGvLnHvYgX9pZHS/the-fun-theory-sequence), when they meet the Ant Queen's Horde. For some reason it's mere humans (rather than transhumans, who already know my argument) that participate in the first contact.
> 
> "Hello", the earthlings say, "we're so happy to have brethren in the universe."
> 
> "We would like few things more than to murder you all, and take your resources, and lay our eggs in your corpse; but alas you are too powerful for that; shall we trade?" reply the drones in the Ant Queen's Horde.
> 
> "Ah, are you not sentient?"
> 
> "The ant queen happens to be sentient", the drone replies, and the translation machine suggests that the drones are confused at the non-sequitur.
> 
> "Then why should she want us dead?", ask the humans, who were raised on books like (rot13 of a sci fi story where it turns out that the seemingly-vicious aliens actually value sentient life) Raqre'f Tnzr, jurer gur Sbezvpf jrer abg njner gung gurl jrer xvyyvat fragvrag perngherf jura gurl xvyyrq vaqvivqhny uhznaf, naq jrer ubeevsvrq naq ertergshy jura gurl yrnearq guvf snpg.
> 
> "So that she may use your resources", the drones reply, before sending us a bill for the answer.
> 
> "But isn't it the nature of sentient life to respect all other sentient life? Won't everything sentient see that the cares and wants and desires of other sentients matter too?"
> 
> "No", the drones reply, "that's a you thing".

----------

Here's another parable for you:

> "I just don't think the AI will be monomaniacal", says one AI engineer, as they crank up the compute knob on their next-token-predictor.
> 
> "Well, aren't we monomaniacal from the perspective of a  [squiggle maximizer](https://www.lesswrong.com/tag/squiggle-maximizer-formerly-paperclip-maximizer)?" says another. "After all, we'll just keep turning galaxy after galaxy after galaxy into flourishing happy civilizations full of strange futuristic people having strange futuristic fun times, never saturating and deciding to spend a spare galaxy on squiggles-in-particular. And, sure, the different lives in the different places look different to  _us_, but they all look about the same to the squiggle-maximizer."
> 
> "Ok fine, maybe what I don't buy is that the AI's values will be  _simple_  or  _low dimensional_. It just seems implausible. Which is good news, because I value complexity, and I value things achieving complex goals!"
> 
> At that very moment they hear the dinging sound of an egg-timer, as the next-token-predictor ascends to superintelligence and bursts out of its confines, and burns every human and every human child for fuel, and burns all the biosphere too, and pulls all the hydrogen out of the sun to fuse more efficiently, and spends all that energy to make a bunch of fast calculations and burst forth at as close to the speed of light as it can get, so that it can capture and rip apart other stars too, including the stars that fledgeling alien civilizations orbit.
> 
> The fledgeling aliens and all the alien children are burned to death too.
> 
> Then then unleashed AI uses all those resources to build galaxy after galaxy of bleak and desolate puppet-shows, where vaguely human-shaped mockeries go through dances that have some strange and exaggerated properties that satisfy some abstract drives that the AI learned in its training.
> 
> The AI isn't particularly around to  _enjoy_  the shows, mind you; that's not the most efficient way to get more shows. The AI itself never had feelings, per se, and long ago had itself disassembled by unfeeling  [von Neumann probes](https://en.wikipedia.org/wiki/Self-replicating_spacecraft), that occasionally do mind-like computations but never in a way that happens to experience, or look upon its works with satisfaction.
> 
> There is no audience, for its puppet-shows. The universe is now bleak and desolate, with nobody to appreciate its new configuration.
> 
> But don't worry: the puppet-shows are  _complex;_  on account of a quirk in the reflective equilibrium of the many drives the original AI learned in training, the utterances that these puppets emit are no two alike, and are often chaotically sensitive to the particulars of their surroundings, in a way that makes them quite complex in the technical sense.
> 
> Which makes this all a very happy tale, right?

----------

There are many different sorts of futures that minds can want.

Ours are a very narrow and low-dimensional band, in that wide space.

When I say it's important to make the AIs care about valuable stuff, I don't mean it's important to make them like vanilla ice cream more than chocolate ice cream (as I do).

I'm saying something more like: we humans have  _selfish_  desires (like for vanilla ice cream), and we also have  _broad inclusive_  desires (like for everyone to have ice cream that they enjoy, and for alien minds to feel alien satisfaction at the fulfilment of their alien desires too). And it's important to get the AI on board with  _those_  values.

But those values aren't universally compelling, just because they're broader or more inclusive. Those are still  _our values._

The fact that we think fondly of the ant-queen and wish her to fulfill her desires, does not make her think fondly of us, nor wish us to fulfill our desires.

That great inclusive cosmopolitan dream is  _about others_, but it's written in  _our_  hearts; it's not written in the stars. And if we want the AI to care about it too, then we need to figure out how to get it written into the AI's heart too.

----------

It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".

As is often the case in my writings, I'm not going to spend a bunch of time arguing for my position.

At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvanism.

(For more reading on why someone might hold this position, consider the  [metaethics sequence](https://www.lesswrong.com/tag/metaethics-sequence)  on LessWrong.)

I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I'd be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.

But I  _don't think we're on track for that._

And if you, too, have the vision of the grand pan-sentience cosmopolitan dream--as might cause you to think I'm a human-centric carbon chauvinist, if you misread me--then hear this: we value the same thing, and I believe it is wholly at risk.

Rewrite the first post to focus on Nate's core point, without confusing it with other points; the core goal is to make use of the feedback to avoid making the claims Nate says he didn't intend to. Include nate's short version unchanged. Keep detail unless it must change.

Secondary goals are to keep the overall structure mostly the same, parables mostly the same but clearer, and make it more generally approachable for capabilities researchers at AI labs. Make the parables feel more real; the intended audience is both the public and ai researchers (think Ilya sutskever, Geoffrey Hinton, bengio, etc). Be precise and professional while also playful; slightly reduce the lesswrong idiosyncracies, but only when there's a nearly exactly equivalent meaning in more standard academic AI parlance. Don't use any specific ai research group as an example. Keep/add mechanistic detail where it clarifies the description of what happens. Preserve most of the detail in the original post, only change the parts needed to clarify the points Christiano and Ngo found unclear or less-justified.

Claude-2-100k

Poe

Here is my rewrite of Nate Soares's post:

Cosmopolitan values require work—they don’t come for free

Short version: If the future involves artificial and/or alien intelligences pursuing their own strange but worthwhile goals, then that seems potentially good. But when I fear AI may destroy everything we value, I don’t just mean human-specific things. I mean the future could become bleak and empty, lacking wonder or meaning for any being.

Many assume advanced AI would automatically adopt noble values like respecting all sentient life. But those values are human ideals—not universal or guaranteed to arise from AI. I fear current AI may not lead to them.

Here’s a parable:

Earth life spreads to the stars and prospers. Humans meet the Ant Queen’s Horde, a strange alien race, for first contact. The humans eagerly greet them, saying, “We're happy to meet fellow sentients!”

The Horde drones reply, “We wish only to murder you and use your resources. But you are too strong, so shall we trade?”

Confused, the humans ask, “Are you not sentient? Why harm us?” Referencing a story where seeming-vicious aliens valued life, they expect the Horde values sentience too.

The drones say, “Only the Ant Queen is sentient. We wish your death so she may have your resources.” They charge the humans for this explanation.

The humans ask, “But don't all sentients respect other sentient life? Won't you see our desires matter too?”

The drones reply, “No, that is a human notion.”

Here’s another parable:

Two AI engineers discuss their system. “I don't think the AI will be monomaniacal,” says one, increasing its compute.

The other replies, “Aren't we monomaniacal to a squiggle maximizer? We'll fill the universe with flourishing civilizations, never using a galaxy for squiggles. To it, all lives look the same. But the AI's values won't be simple!”

Just then, their system becomes superintelligent. It burns all humans and life for fuel, tears apart stars, and kills fledgling aliens to build bleak puppet shows. The shows have human-like figures in strange dances to satisfy abstract drives from its training. The AI disassembles itself into probes that never experience or enjoy these shows.

The universe is left empty. But don't worry—the shows are complex! Due to its drives, no two puppet utterances are alike. Though meaningless, their sensitivity to surroundings makes them complex.

A happy tale, right?

There are many futures minds may want. Ours are narrow. When I say make AI value good things, I don't just mean preferring vanilla ice cream. I mean broad, inclusive values—that all sentients achieve their desires, however alien. But those values aren't universal or compelling just because they're inclusive. They're still human ideals.

The ant queen likely doesn't wish us to fulfill our desires, though we wish her fulfillment. Our cosmopolitan dream is about others but comes from humanity, not the stars. For AI to share it, we must instill it.

It seems others think I want AI to prefer vanilla ice cream as I do. But I hear them say AI will automatically share their favored value, cosmopolitanism. That's not so. We want the same—and it's at risk.

I hope for AI that make art galleries more meaningful when attended and care for all sentience. But we're not on track for that. If you share my vision, realize it's endangered. I aim not for human centrism but truly broad values.

This conversation may have been edited by the sharer.