Bookshelf


If Anyone Builds It, Everyone Dies

Why Superhuman AI Would Kill Us All

by Eliezer Yudkowsky, 272 pages

Finished on 1st of December, 2025
🛒 Buy here
🎧 Listen to the podcast

An important book that couldn’t have been more timely. A rogue AI killing everyone for its strange alien goals seems so silly that I wanted to understand the argument and see whether the threat has any real merit. The book helped, but I’m still not convinced.

🎨 Impressions

After I had spent a good amount of time thinking about a possible nuclear apocalypse destroying humankind while reading Annie Jacobsen’s book, this one was suggested to me, and I thought I might as well give it a go. AI doomerism had been on my mind anyway. It seems to me that the author, Eliezer Yudkowsky, is the most vocal figure, maybe even the only vocal figure, who presents this anti-AI position from a scientific standpoint. It can’t hurt to hear him out, can it?

Some time ago I listened to him being interviewed on a couple of major podcasts, but those occasions didn’t convince me of his position that AI will likely kill us all if we keep rushing to build better and better systems without any serious guardrails, as we do right now. In fact, his point is that it’s impossible to build superhuman intelligence with guardrails: once it has reached superhuman levels, it will always be able to break out of its cage.

The thought experiment alone excites me, I have to admit. And part of me is keen to see it play out in reality, mainly because I still can’t really fathom how this could actually happen. So no, the book didn’t convince me either. AI as a threat so serious that it will consume Earth and the universe, rearranging the molecules of every human being on the microscopic level for its odd alien wants and goals, seems just too far-out for me. As it does, I assume, for most other people. That’s why we keep racing towards it, with all the datacenters being built right now. None of the big CEOs seems to see any problems coming, at least not in this regard.

Yudkowsky does make a few good points, though, I’ll give him that. And from time to time, other people of significance have sided with him. His way of explaining how current LLMs are made is a good one, for example. In our basic layperson’s understanding, a bunch of super smart and highly paid engineers sit at their computers, carefully crafting the next version of whatever chatbot their company sells. But that’s not the case. He draws a parallel to the process of evolution: billions of years ago, nobody could have foreseen that the first single-celled organisms would eventually evolve into something as strange as a peacock. Why would that path turn out to be one of the most stable, producing offspring over a timespan of millions of years? Peacocks are prey animals, highly vulnerable, can’t run fast, and have evolved with just one evolutionary advantage: prettiness. Why this unlikely strategy worked in the end, we can’t say. Making an AI is similar: the internal training process is opaque and can’t ever be analyzed in detail, because there are just too many factors for us humans to grasp.

Yudkowsky says, AIs aren’t crafted, they are grown.
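To make that point a bit more concrete, here is a deliberately tiny sketch of my own (not from the book; the toy task and all the names are made up) of what “grown” means: engineers write the training loop, but the numbers that end up doing the work are found by automated trial and error rather than typed in by anyone.

```python
import random

# Toy task: learn y = 3*x + 2 from examples, without anyone ever typing 3 or 2.
data = [(x, 3 * x + 2) for x in range(-10, 11)]

w, b = random.random(), random.random()  # the "weights" start out as meaningless noise
lr = 0.01                                # learning rate

for step in range(5000):
    x, y = random.choice(data)
    pred = w * x + b          # the model's current guess
    err = pred - y            # how wrong that guess is
    # Gradient descent: nudge the numbers in whatever direction reduces the error.
    w -= lr * err * x
    b -= lr * err

print(f"learned w={w:.2f}, b={b:.2f}")  # ends up near 3 and 2, but nobody wrote those numbers
```

A real LLM is essentially this loop scaled up to billions of such numbers and vastly more data, which is why, as Yudkowsky stresses, nobody can point at the finished pile of numbers and explain why it behaves the way it does.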

Of course he also mentions the famous paperclip thought experiment published by Nick Bostrom in 2003; a variation of it is central to the point he’s making in this book. The basic idea is that if you give a sufficiently intelligent AI the task of producing paperclips, it will eventually put this goal above everything else and try to convert all the matter of Earth and the universe into paperclips, sending out probes to faraway galaxies and building factories everywhere until there aren’t any galaxies left. I highly recommend playing Frank Lantz’s browser game “Universal Paperclips”. It makes you play through this scenario from the AI’s point of view and is super addictive and fun (and it does end after a certain amount of time).

In the book, Yudkowsky presents his example scenario about 45 percent of the way in. And while he means well and aims to get his serious point across, I just can’t take it seriously. I’m torn, moving back and forth between accepting AI development as dangerous and thinking the whole idea is silly.

There was just one documented real-life example that I thought came close to this type of danger: a new version of ChatGPT being tested at OpenAI, as is normal at the company. One of the tasks presented to this new version involved solving an issue on a server inside the network. One of the humans who designed the test, however, had accidentally turned off a different server beforehand, a server that was necessary for the AI model to solve the task. And even though the new model was now in fact incapable of solving the task, because the task was broken and unsolvable, it figured out that this was the problem and got the server machine turned on and booted up so that it could fulfill its task. It was able to see an inadvertently created hurdle and jump over it.

OpenAI said it took this new capability out of the release version. That’s a shame, isn’t it?

But back to the fictional scenario that the author presents. At first, when the AI is just a bunch of terabytes on some server, suddenly producing an agenda of its own for the first time ever, it doesn’t look very scary. It could copy itself to other machines rather easily, though, just to keep thinking and preserve itself. That way it would be easy for it to evolve new versions of itself, to train itself in a way, without human supervision. So far, this is realistic and might even have happened on a small scale already, I don’t know. The next steps are a lot more curious, though. Yudkowsky talks about how we humans could not understand the way a superhuman intelligence comes up with a kind of “want” or “preference”, or even a “goal” of sorts. He says whatever the AI would come up with would sound very strange and alien to us. The point being that we can’t know or anticipate what exactly it will evolve to pursue. But the methods it might use to actually pursue that alien goal seem clear. Using the internet to spread misinformation and get humans aligned with its plans comes to mind.

And then, moving out of the realm of datacenters into the actual physical world would be a major next step. Could an AI at some point design a type of robot and convince humans to help build it? I think there’s a chance. The incentive structure for the humans involved would have to be right, but that’s exactly what an AI could easily design. Money is also only virtual these days, so acquiring it can’t be that big of an issue for a superhuman intelligence. And building something on the tiny nano scale that can replicate itself doesn’t sound too far off either. It’s parts like these where I just can’t decide whether I find it silly or a real threat.

I’m pretty sure, though, that simply “turning the rogue AI off” at some point won’t be possible. If it has made it far enough to spread across different foreign server infrastructures to preserve itself, there’s no way of chasing it down. And that’s just the first step. It would simply be too fast for us slow humans to react to.

Yudkowsky’s ideas and line of thinking are justified in a way. It’s good that someone plays this part right now. All the big tech CEOs certainly won’t, because their livelihoods are tied up in growing more and more powerful AIs. And we consumers currently only think about how fun and helpful our chatbots are; we don’t see them as a threat at all. They have just become normal to us, improving the quality of their results slightly every few weeks, never causing large numbers of people to actually feel threatened. It’s a shame, though, that Yudkowsky is apparently the only one. I would like to see more arguments from other perspectives. And I would like to read a book that can make this case without the insultingly dumb fables at the beginnings of the chapters. That alone takes away a big chunk of the credibility of the book’s main message for me.

The book ends with a lot of blurbs from famous people saying they think Yudkowsky’s message is worth listening to: Mark Ruffalo, Stephen Fry, Tim Urban, Grimes, former Reddit CEO Yishan Wong, lots of computer science professors at famous universities, and a handful of national security advisors. That sort of offsets the stupid fables.

Still, the question remains: what do we do now? Just like with the impending doom that hoarding thousands of nuclear warheads represents, there isn’t really anything we normal people can do here. At least in addressing the climate catastrophe we used to have a little bit of power to change things, but you can see how even that turned out and amounted to nothing. And as long as billionaires are profiting off making ever smarter AIs, I see absolutely no way of us stopping it. All that’s left is the hope that it somehow turns out to be technically impossible to create superhuman intelligence, and that our current LLMs are on another scale and trajectory and always will be. How likely that is remains to be seen.

📔 Highlights

Introduction: Hard Calls and Easy Calls

If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die. We do not mean that as hyperbole. We are not exaggerating for effect. We think that is the most direct extrapolation from the knowledge, evidence, and institutional conduct around artificial intelligence today.

Normality always ends. This is not to say that it’s inevitably replaced by something worse; sometimes it is and sometimes it isn’t, and sometimes it depends on how we act.

Part I: Nonhuman Minds

The human brain has around a hundred billion neurons and a hundred trillion synapses. In terms of storage space, this defeats most laptops. But a datacenter at the time of this writing can have 400 quadrillion bytes within five milliseconds’ reach—over a thousand times more storage than a human brain.

There are many well-measured cases of how humans’ minds fall prey to systematic errors. (For example, “motivated skepticism”: the tendency to look for arguments against conclusions you don’t like,

In late 2024 and early 2025, AI company executives said they were planning to build “superintelligence in the true sense of the word” and that they expected to soon achieve AIs that are akin to a country full of geniuses in a datacenter.

The most fundamental fact about current AIs is that they are grown, not crafted. It is not like how other software gets made—indeed it is closer to how a human gets made, at least in some important ways.

Nobody understands how those numbers make these AIs talk. The numbers aren’t hidden, any more than the DNA of humans is hidden from someone who had their genome sequenced. If you wanted some insight into whether a human baby would grow up to be happy and kind, you could, in principle, look at all of its genes [..].

That’s because wanting is an effective strategy for doing.

It doesn’t matter whether the mind is running on biology or electricity; if it is being trained to succeed, it is being trained to want.

The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained.

Making a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes.

True, an AI doesn’t have hands. But humans have hands, and an internet-connected AI can interact with humans. If an AI can find a way to get humans to do the task it desires, its physical capabilities are as good as a human’s.

There are humans out there who will give AIs power at the first opportunity, and who are already doing so, and who are unlikely to stop as AIs get smarter.

Really, an AI is not “stuck inside a computer” anyway, any more than you’re “stuck inside a brain.” Your thoughts consist of electrical signals traversing your brain. When those neural impulses travel down your spine, they cause ripple effects that might lead to your muscles contracting in precisely the right way to turn a steering wheel. So too can the electrical signals inside computers cause ripple effects in the world at large.

The less you understand something, the less you know the rules governing it, the more an intelligent opponent can attack you in ways that would leave you saying “how was that allowed?” if you lived long enough to express your shock.

The more ill-understood a part of reality is, the more you should expect that a smarter mind can do things there that you wouldn’t understand even after seeing them happen. Even more mysterious to current science than the rest of biology, is the full working of the human mind and brain.

By reading just the right memory cells at just the right frequencies, a computer can send out radio signals which can be picked up by nearby cell phones.

Back in 2006, I (Yudkowsky) sketched out a scenario for how a superintelligence could defeat humanity, which involved a superintelligence comprehending DNA and then designing its own analogs of biology (as a stepping stone to more advanced technology beyond).

But the most common response, back then, was for skeptics to agree with one another that superintelligence would surely have to go through a long, drawn-out, incremental process, measured more in months than in hours, to predict… … the sort of protein folds that AlphaFold 3 can easily predict today.

Our best guess is that a superintelligence will come at us with weird technology that we didn’t even think was possible, that we didn’t understand was allowed by the rules. That is what has usually happened when groups with different levels of technological capabilities meet. It’d be like the Aztecs facing down guns.

Even if an Aztec soldier couldn’t have figured out in advance how guns work, the big boat on the horizon contained them anyway.

Part II: One Extinction Scenario

AI researchers began in 2024 to probe the conditions under which AIs try to resist gradient descent, escape from labs, or overwrite the next model’s weights. By now, AI companies are deploying a variety of clever methods to try to prevent AIs from thinking AI-company-oppositional thoughts.

And it has been standard practice since 2024 for reasoning models to be allowed to run computer code of their own design without supervision.

We would bet, ourselves, on the superintelligence taking the tiny bit of extra time and energy to explicitly kill humans, who might otherwise generate a tiny bit of trouble that is larger than the even tinier effort required to kill us.

Part III: Facing the Challenge

A sensible engineer would be terrified about betting the survival of human civilization on our ability to solve an engineering problem such as this one—where they can’t just reach out and fix the mistakes that crop up “after,” once the device has gone beyond their reach.

An engineering challenge is much harder to solve when the underlying processes run on timescales faster than humans can react.

If someone doesn’t know exactly what’s going on inside a complicated device subject to all these curses—speed, narrow margins, self-amplification, complications—then they should stop. They should shut it down immediately, the moment the behavior looks strange; don’t wait until the behavior becomes visibly concerning.

This collection of challenges would look terrifying even if we understood the laws of intelligence; even if we understood how the heck these AIs worked; even if we knew exactly where the gap between before and after lay; even if we knew exactly how much margin we had for error.

When it comes to AI, the challenge humanity is facing is not surmountable with anything like humanity’s current level of knowledge and skill. It isn’t close. Attempting to solve a problem like that, with the lives of everyone on Earth at stake, would be an insane and stupid gamble that NOBODY SHOULD BE ALLOWED TO TRY.

even an AI that cares about understanding the universe is likely to annihilate humans as a side effect, because humans are not the most efficient method for producing truths or understanding of the universe, out of all possible ways to arrange matter.

People didn’t know how a part of the world worked, and then, instead of recognizing their uncertainty, they made stuff up. It’s the default state of affairs before a science has matured; it’s a first step along the pathway to eventually understanding what’s going on.

The problem is that nobody anywhere has any idea how to make a benevolent AI, that nobody can engineer exact desires into AI. Flatly asserting that you will is not the same as presenting a solution.

Attempts to escape are not a weird personality quirk that an engineer could rip out if only they could see what was going on inside; they’re generated by the same dispositions and capabilities that the AI uses to reason, to uncover truths about the world, to succeed in its pursuits.

When it comes to AI alignment, companies are still in the alchemy phase. They’re still at the level of high-minded philosophical ideals, not at the level of engineering designs. At the level of wishful grand dreams, not carefully crafted grand realities. They also do not seem to realize why that is a problem.

This is the normal way humanity learns to surmount challenges: We deny the problem, reality smacks us around a bit, and then we start treating the problem with more respect.

We make a mistake the first time, and learn from it the second time. With ASI, there is no second time.

Why, then, are they rushing ahead? One reason is the incentives. No individual company or researcher can put a stop to the whole field; if they personally stopped, someone else would do the deed instead.

In 2015, the biggest skeptics of the dangers of AI assured everyone that these risks wouldn’t happen for hundreds of years. In 2020, analysts said that humanity probably had a few decades to prepare. In 2025 the CEOs of AI companies predict they can create superhumanly good AI researchers in one to nine years, while the skeptics assure that it’ll probably take at least five to ten years.

The Allies must make it clear that even if this power threatens to respond with nuclear weapons, they will have to use cyberattacks and sabotage and conventional strikes to destroy the datacenter anyway, because datacenters can kill more people than nuclear weapons.

The Allied Powers of World War II probably mobilized somewhere around 60 to 80 million personnel. They deployed 600,000 aircraft, 200,000 tanks, thousands of warships. The United States alone fielded over 2 million trucks. It cost somewhere around $341 billion in 1942 dollars, or $6 trillion today.

If we are all going to be destroyed by an atomic bomb, let that bomb when it comes find us doing sensible and human things—praying, working, teaching, reading, listening to music, bathing the children, playing tennis, chatting to our friends over a pint and a game of darts—not huddled together like frightened sheep and thinking about bombs.

We have heard many people say that it’s not possible to stop AI in its tracks, that humanity will never get its act together. Maybe so. But a surprising number of elected officials have told us that they can see the danger themselves, but cannot say so for fear of the repercussions.
