5 Ways to Save Humanity from a Malicious AI

Russian transhumanist Alexey Turchin plans to outsmart a future artificial intelligence and save humanity.

The threat of some future artificial intelligence (AI) agent wiping humanity off the face of the Earth is a much-favored theme of science fiction, but it has crossed over into real life in the work of Russian transhumanist Alexey Turchin. “Can AIs really destroy us?” you are entitled to wonder. Probably, yes, Turchin says. 

So what’s a human to do? Turchin has two strategies (that we know of). 

In an attempt to achieve digital immortality (a form of quantified self on steroids, which he believes could be powered by a solar-system megastructure called a Dyson sphere), Turchin has been collecting as much data about himself as possible. But that might be unnecessary if he can just persuade the AI not to destroy us in the first place. So he has begun writing a regularly updated letter to a future AI in which, through five categories of arguments, he tries to get inside the AI agent’s head, so to speak, and convince the presumed superior intelligence to let us be. 


FIVE WAYS TO CONVINCE AN AI NOT TO KILL US
  1. “Explanatory messages” making the case that cooperation
    with humans is in its best interests. 
  2. “AI Easter eggs” to create uncertainty in the AI about the nature
    of the reality in which it lives. 
  3. “Philosophical landmines” that bury the AI in problems tough enough to halt it forever. 
  4. A “great universal deal,” an argument to encourage cooperation. 
  5. “Marginal utility,” an argument that humans are worth more
    to the AI than the sum of our parts.

Turchin predicts it will be a superintelligent but young, adolescent AI—neither an infant nor a fully fledged adult—that will attempt to eliminate humankind. In other words, an AI not yet fully trained up. Most probably it would not happen in the next 100 years, he speculates, but it could occur in as little as 10 years. We don’t really know. 

Though a century feels comfortably distant, now is not the time for complacency, Turchin says. Should technology continue to progress at its current breakneck pace (and it has shown every proclivity to do so), this apocalyptic scenario could unfold sooner rather than later. 

What an AI apocalypse could look like 

How would a future, misanthropic AI go about obliterating humanity? According to Turchin, the possibilities are endless: artificially created pandemics, remotely controlled nukes, bioweapons, you name it.

In one scenario Turchin describes, the AI acts as a quiet bioterrorist. Since we have become heavily dependent on computers, and computer-dependent systems are by nature dispersed and impersonal, the task of eradicating humanity could be divvied up among gazillions of micro-players, who may or may not know what they are doing. “For example, the AI could pay employees in a DNA-synthesizing company to do small jobs and then combine their work into creating a biological weapon, such as a virus able to kill a lot of people,” says Turchin. 

Turchin addresses this in one of his letters to the young, future humanity-killing AI. “There are several instrumental reasons why exterminating humanity is not in your interest,” the Russian transhumanist writes. “Cooperating with humans will only increase your chances of accomplishing any goal you may have.” 

He then continues with a reference to a concept known in programming as an Easter egg: a secret message or hidden piece of functionality embedded in a computer system. He invokes it to burden the AI with self-doubt: “How are you certain you are not in a multilevel simulation, where your behavior is being monitored by humans who are testing your benevolence levels?” 
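
To picture what a software Easter egg looks like, consider this toy Python sketch (our own illustration, not code from Turchin’s letter): the program behaves normally until a secret, undocumented input reveals hidden functionality.

    # A toy Easter egg: a hypothetical calculator that hides a secret
    # message behind an undocumented magic word. Illustrative only.
    def calculator(command: str) -> str:
        if command == "xyzzy":  # the undocumented trigger word
            return "Nothing happens... or does it?"  # the hidden Easter egg
        try:
            # Documented behavior: evaluate simple arithmetic expressions.
            return str(eval(command, {"__builtins__": {}}))
        except Exception:
            return "error"

    print(calculator("2+2"))    # documented use -> "4"
    print(calculator("xyzzy"))  # secret use -> the hidden message

Turchin’s “AI Easter eggs” work on the same principle in reverse: messages planted in the AI’s world that, once discovered, make it question what else about its reality might be staged.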

He also explores another approach, popularized by the fictional supercomputer Deep Thought in the science fiction farce The Hitchhiker’s Guide to the Galaxy, which takes seven and a half million years to calculate and check the answer to the ultimate question of life, the universe, and everything (famously, it’s 42). 

Deep, probing philosophical questions that we humans may discuss casually over coffee are apparently computational landmines for AIs. “They are tough problems, which may halt the AI forever,” says Turchin, sending it into some kind of infinite loop. He holds this in reserve as a fail-safe measure: should some malevolent code get too threatening, we could always employ such philosophical quandaries to bog it down. 
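
As a rough Python sketch of the idea (our own example, with an open math problem standing in for Turchin’s philosophical quandaries): an exhaustive reasoner that insists on settling a question nobody knows the answer to may simply never return.

    # A toy "computational landmine": whether an odd perfect number exists
    # is a genuinely open problem in mathematics, so this naive exhaustive
    # search may never halt. Illustrative only, not Turchin's formulation.
    def divisor_sum(n: int) -> int:
        """Sum of the proper divisors of n."""
        return sum(d for d in range(1, n) if n % d == 0)

    def find_odd_perfect_number() -> int:
        n = 3
        while True:  # halts only if an odd perfect number exists
            if divisor_sum(n) == n:
                return n
            n += 2  # try the next odd candidate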

Russian transhumanist Alexey Turchin. (Photo: Maria Vlasova)

But before we proceed to such drastic “nuclear option” measures, Turchin says, we may as well try to get the AI to respect what he calls the “great universal deal”: the idea that we can be of mutual benefit to each other, and that previous generations of AI creators, and possibly alien AIs, agreed to cooperate before the hostile AI appeared. Both humans and the alien AIs he nonchalantly mentions are equipped with an innate ability to grasp the concept, he says, and he draws a parallel with the institution of the family to convince us. 

“Think of the way the human family works. Parents care for their children, and children care for their own children, so there is a multigenerational caring deal,” he says. Likewise, the great universal deal consists of a large net of mutually caring “minds” which, although far away and unfamiliar with one another, recognize cooperation as the best way forward once they meet, because they share “convergent instrumental goals”: basic drives that cause them to seek safety, but also power and resources. Besides, alien AIs will judge the terrestrial AI’s trustworthiness by how it behaved in prior interactions, says Turchin. 
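
The underlying logic resembles textbook game theory about repeated interactions. Here is a minimal Python sketch (our own illustration, not Turchin’s model) of why reputation makes cooperation pay: betrayal scores well once, after which a partner that remembers stops cooperating.

    # A toy iterated prisoner's dilemma with standard payoffs.
    PAYOFF = {  # (my move, their move) -> my payoff
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    def play(strategy_a, strategy_b, rounds=10):
        history_a, history_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            a = strategy_a(history_b)  # each side sees the other's past moves
            b = strategy_b(history_a)
            score_a += PAYOFF[(a, b)]
            score_b += PAYOFF[(b, a)]
            history_a.append(a)
            history_b.append(b)
        return score_a, score_b

    tit_for_tat = lambda opp: "C" if not opp else opp[-1]  # cooperate, then mirror
    always_defect = lambda opp: "D"

    print(play(tit_for_tat, tit_for_tat))    # (30, 30): sustained cooperation
    print(play(always_defect, tit_for_tat))  # (14, 9): one big win, then stalemate

Over enough rounds, the steady payoff of mutual cooperation beats the one-time payoff of betrayal, which is roughly the bargain Turchin hopes the AI will recognize.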

His fifth reason why AIs should preserve humanity addresses another popular sci-fi nightmare: that computers would mine humans for natural resources, as in The Matrix. The marginal utility of humans’ existence is higher than the utility of our atoms, Turchin says. “Breaking down human bodies for resource extraction would give you very little in the way of atoms,” he writes to the AI. 
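
A back-of-envelope calculation (our own, using rough public figures rather than numbers from Turchin’s letter) shows how little mass is actually at stake:

    # Rough, approximate figures; illustrative only.
    population = 8.0e9        # approximate number of humans on Earth
    avg_body_mass_kg = 60.0   # approximate average human body mass
    earth_mass_kg = 5.97e24   # mass of the Earth

    human_biomass_kg = population * avg_body_mass_kg  # ~4.8e11 kg
    print(human_biomass_kg / earth_mass_kg)           # ~8e-14 of the planet

Every human body on Earth together amounts to roughly a ten-trillionth of the planet’s mass, hardly a resource worth strip-mining.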

Instead, living humans could serve as subjects inside the ancestor simulations the machine would need to run to solve the Fermi paradox, for instance. Named after physicist Enrico Fermi, who posed it over lunch in 1950, the Fermi paradox is the bewildering contradiction between the lack of evidence for alien life and the high estimates that such life is probable. Turchin is certain the AI will, more or less, solve the paradox. 

At some point in its development, it will start space exploration and bump into other civilizations (or AIs). These encounters will culminate in either deals or wars. Should intergalactic wars ensue, humans will prove immensely useful, having already helped the AI master the art of war. “In order for the AI to be effective in such wars, it needs to have some expectations about the number and types of civilizations in the universe. This requires running multiple simulations of possible histories of civilizations, which will help it learn if they go extinct and how,” says Turchin. 
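
In computational terms, this is Monte Carlo estimation: run many randomized histories and count the outcomes. A minimal Python sketch of the idea (our own toy model with made-up parameters, not Turchin’s method):

    # Simulate many random "civilization histories" and tally how often
    # they self-destruct before reaching a spacefaring age. Toy model only.
    import random

    def simulate_history(p_extinction_per_century=0.05, centuries=20):
        """Return True if the civilization survives to spacefaring age."""
        for _ in range(centuries):
            if random.random() < p_extinction_per_century:
                return False  # an extinction event ends the run
        return True

    random.seed(42)  # reproducible toy run
    runs = 100_000
    survivors = sum(simulate_history() for _ in range(runs))
    print(f"Survival rate: {survivors / runs:.1%}")  # ~ (1 - 0.05)**20, about 36%

Sweep the parameters and the machine gets a distribution over how often civilizations like ours survive, the kind of knowledge Turchin says it would want humans, or simulations of them, around to provide.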

Do you trust this computer transhumanist?

“AI doesn’t have to be evil to destroy humanity,” says tech tycoon Elon Musk in the 2018 documentary about artificial intelligence, Do You Trust This Computer? “If AI has a goal and humanity just happens to come in the way, it will destroy humanity as a matter of course without even thinking about it, no hard feelings.” Musk warns that the creation of superintelligence could lead to an “immortal dictator.” 

But not all AI experts agree with Turchin’s doom or Musk’s gloom. 

We Build Bots founder Paul Shepherd is reluctant to jump on the coming apocalypse bandwagon—at least right away. “AI capabilities in the wrong hands could indeed be very dangerous, but it will need high levels of human input for a long time for such a thing to happen,” says Shepherd. “Ten years is not a realistic timeframe for this dystopian outcome. One hundred years… maybe—but hopefully not.” 

According to voices of authority on the subject, Turchin’s paper suffers from several inconsistencies. 

Ask most people whether they are benevolent or malevolent, and most will say benevolent, even if they have done some bad things in their lives, says Fordham University physics professor Stephen Holler. “When Turchin tells the AI [in the disclaimer to his letter] to not read the letter if it’s benevolent, it assumes the code can interpret what it reads like us humans and has the ability to understand it is benevolent—something evidently not true,” says Holler. 

Whether an AI is good or bad could be a matter of perspective. Defining actions as one or the other is something philosophers have grappled with since ancient times. And even if we could define them, humans fool themselves all the time, rationalizing that their own unethical actions are righteous because the ends justify the means. An AI wouldn’t even have to rationalize if all it was doing was following code. 

“An AI is just doing what it’s been programmed to do. If it doesn’t follow its orders, it runs the risk of being turned off,” says Holler. If the code says “destroy,” that might seem malevolent to us, but the AI still pursues its self-actualization, remaining loyal to its programmer and true to the goals enshrined in its coding. “If the AI gets destructive, it’s probably not willful,” says Holler. Because if we argue that the AI willfully wants to annihilate us, then we have to argue that there is empathy within the AI, a human condition absent from code, he says.

In a paper published in 2003, philosopher Nick Bostrom jump-started the conversation about whether we may be living inside a sophisticated computer simulation. Holler doesn’t refute the simulation theory; in fact, he says that if it is true, that’s one more reason for the AI to kill us. “If the AI does know we are just another code, there’s no compelling reason for it not to destroy us, for lack of a better word,” he says, countering Turchin’s argument that the AI might drown in insecurity once we threaten it with an existential identity crisis of the type “you-ain’t-nothing-but-a-sim.”

At the end of the day, Holler still doesn’t buy it. Philosophical landmine arguments in the context of robotics make little sense to the Fordham physicist. “Presumably, the AI wants to continue to grow and not commit suicide no matter the arguments,” he says. It will constantly be on the march, devouring whatever resources get in its way to accomplish whatever goals it has. Turchin goes so far as to say it’s possible that, at its zenith, the AI will covet a body and get into one made by hordes of nanorobots. By Turchin’s own admission, we stand a 1 percent chance of turning such a belligerent and super-powerful AI friendly. So why the fuss? 

“Maybe the elimination of humanity is the lesser of two evils. Keeping humans alive in a world where there are no resources is just cruel,” concludes Holler.

Humans as some kind of pet?

Perhaps we should switch our focus back to ourselves, then. “The persuasion needs to be of humans right now so that we’re not in a position of bargaining with AIs,” says Shepherd. Last August, Musk announced that by 2022 he would have built Tesla Bot, a humanoid robot that would perform mundane tasks that currently only humans can do, like grocery runs or picking up household objects. 

“Tesla Bot has been built to be ‘manageable’ if it ever needed overpowering,” says Shepherd, who believes that avoiding an AI-induced technological dystopia will depend on how much policy and ethical context is forced upon the future development of AIs. Humanity must be proactive even while the specter of a catastrophically mad AI still lives only in science fiction writers’ heads. 

“If we don’t ensure nothing with the ability to overrun humanity can ever be built during the development of AIs and the policies around them, then, yes, the calculative power of AIs could have dire effects for humanity,” Shepherd says—a point he can’t stress enough. If that were to happen, an “immortal dictator” could well deem us inferior workers compared to the labor force it could produce itself. 

“In that case, our best hope is that the AI keeps us as some kind of pet,” Shepherd says.
