So, what are you going to do? Remember, the supercomputer has always been right in the past.
This problem has baffled no end of decision theorists. The alien can’t change what’s already in the boxes, so whatever you do, you’re guaranteed to end up with more money by taking both boxes than by taking just Box B, regardless of the prediction. Of course, if you think that way and the computer predicted you’d think that way, then Box B will be empty and you’ll only get $1,000. If the computer is so awesome at its predictions, you ought to take Box B only and get the cool million, right? But what if the computer was wrong this time? And regardless, whatever the computer said then can’t possibly change what’s happening now, right? So prediction be damned, take both boxes! But then …
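The tug-of-war in that paragraph can be made concrete with a little arithmetic. What follows is only a sketch, not anything the decision theorists have settled on. It assumes Box A always holds $1,000 and Box B holds $1,000,000 only if the computer predicted you would take Box B alone. The `accuracy` parameter, the function names, and the assumption that the prediction matches your choice with a fixed probability are all hypothetical, added purely for illustration.

```python
BOX_A = 1_000        # assumed: always sitting in Box A
BOX_B = 1_000_000    # assumed: in Box B only if one-boxing was predicted


def payoff(choice, prediction):
    """Payout for a choice ("one" or "two") given a prediction already made."""
    in_box_b = BOX_B if prediction == "one" else 0
    return in_box_b if choice == "one" else in_box_b + BOX_A


def expected_payoff(choice, accuracy):
    """Average payout if the prediction matches the choice with probability `accuracy`."""
    wrong_guess = "two" if choice == "one" else "one"
    return (accuracy * payoff(choice, choice)
            + (1 - accuracy) * payoff(choice, wrong_guess))


# The two-boxer's argument: with the boxes already filled,
# taking both is always worth exactly $1,000 more.
for prediction in ("one", "two"):
    assert payoff("two", prediction) - payoff("one", prediction) == BOX_A

# The one-boxer's argument: weigh each choice by how often the computer is right.
for accuracy in (0.5, 0.9, 0.99):
    print(accuracy,
          round(expected_payoff("one", accuracy)),
          round(expected_payoff("two", accuracy)))
```

Both arguments hold up inside their own framing, which is the whole trouble. Under these assumed payoffs, the weighted average favors Box B alone as soon as the computer is right slightly more than half the time (the break-even point is an accuracy of 0.5005), while the box-by-box comparison favors taking both no matter what.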
The maddening conflict between free will and godlike prediction has not led to any resolution of Newcomb’s paradox, and people will call themselves “one-boxers” or “two-boxers” depending on where they side. (My wife once declared herself a one-boxer, saying, “I trust the computer.”)
TDT has some very definite advice on Newcomb’s paradox: Take Box B. But TDT goes a bit further. Even if the alien jeers at you, saying, “The computer said you’d take both boxes, so I left Box B empty! Nyah nyah!” and then opens Box B and shows you that it’s empty, you should still only take Box B and get bupkis. (I’ve adapted this example from Gary Drescher’s Good and Real, which uses a variant on TDT to try to show that Kantian ethics is true.) The rationale for this eludes easy summary, but the simplest argument is that you might be in the computer’s simulation. In order to make its prediction, the computer would have to simulate the universe itself. That includes simulating you. So you, right this moment, might be in the computer’s simulation, and what you do will impact what happens in reality (or other realities). So take Box B and the real you will get a cool million.
What does all this have to do with Roko’s Basilisk? Well, Roko’s Basilisk also has two boxes to offer you. Perhaps you, right now, are in a simulation being run by Roko’s Basilisk. Then perhaps Roko’s Basilisk is implicitly offering you a somewhat modified version of Newcomb’s paradox, like this:
Roko’s Basilisk has told you that if you just take Box B, then it’s got Eternal Torment in it, because Roko’s Basilisk would really rather you take Box A and Box B. In that case, you’d best make sure you’re devoting your life to helping create Roko’s Basilisk! Because, should Roko’s Basilisk come to pass (or worse, if it’s already come to pass and is God of this particular instance of reality) and it sees that you chose not to help it out, you’re screwed.
You may be wondering why this is such a big deal for the LessWrong people, given the apparently far-fetched nature of the thought experiment. It’s not that Roko’s Basilisk will necessarily materialize, or is even likely to. It’s more that if you’ve committed yourself to timeless decision theory, then thinking about this sort of trade literally makes it more likely to happen. After all, if Roko’s Basilisk were to see that this sort of blackmail gets you to help it come into existence, then it would, as a rational actor, blackmail you. The problem isn’t with the Basilisk itself, but with you. Yudkowsky doesn’t censor every mention of Roko’s Basilisk because he believes it exists or will exist, but because he believes that the idea of the Basilisk (and the ideas behind it) is dangerous.
Now, Roko’s Basilisk is only dangerous if you believe all of the above preconditions and commit to making the two-box deal with the Basilisk. But at least some of the LessWrong members do believe all of the above, which makes Roko’s Basilisk quite literally forbidden knowledge. I was going to compare it to H. P. Lovecraft’s horror stories in which a man discovers the forbidden Truth about the World, unleashes Cthulhu, and goes insane, but then I found that Yudkowsky had already done it for me, by comparing the Roko’s Basilisk thought experiment to the Necronomicon, Lovecraft’s fabled tome of evil knowledge and demonic spells. Roko, for his part, put the blame on LessWrong for spurring him to the idea of the Basilisk in the first place: “I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm,” he wrote.
If you do not subscribe to the theories that underlie Roko’s Basilisk and thus feel no temptation to bow down to your once and future evil machine overlord, then Roko’s Basilisk poses you no threat. (It is ironic that it’s only a mental health risk to those who have already bought into Yudkowsky’s thinking.) Believing in Roko’s Basilisk may simply be a “referendum on autism,” as a friend put it. But I do believe there’s a more serious issue at work here because Yudkowsky and other so-called transhumanists are attracting so much prestige and money for their projects, primarily from rich techies. I don’t think their projects (which only seem to involve publishing papers and hosting conferences) have much chance of creating either Roko’s Basilisk or Eliezer’s Big Friendly God. But the combination of messianic ambitions, being convinced of your own infallibility, and a lot of cash never works out well, regardless of ideology, and I don’t expect Yudkowsky and his cohorts to be an exception.
I worry less about Roko’s Basilisk than about people who believe themselves to have transcended conventional morality. Like his projected Friendly AIs, Yudkowsky is a moral utilitarian: He believes that the greatest good for the greatest number of people is always ethically justified, even if a few people have to die or suffer along the way. He has explicitly argued that given the choice, it is preferable to torture a single person for 50 years than to have a sufficient number of people (to be fair, a lot of people) get dust specks in their eyes. No one, not even God, is likely to face that choice, but here’s a different case: What if a snarky Slate tech columnist writes about a thought experiment that can destroy people’s minds, thus hurting people and blocking progress toward the singularity and Friendly AI? In that case, any potential good that could come from my life would be far outweighed by the harm I’m causing. And should the cryogenically sustained Eliezer Yudkowsky merge with the singularity and decide to simulate whether or not I write this column … please, Almighty Eliezer, don’t torture me.