This post is written in collaboration with
from . If you enjoy these deep dives into topics at the intersection of philosophy and technology, make sure to subscribe and check his work.
There are many risks and challenges in the deployment of artificial intelligence. It is one of our most potent technologies so far, and like all technologies, it can be used for good or evil. The more powerful the technology, the greater the potential for positive and negative applications.
Whether the outcome is good or bad depends more on how humans use a technology than on its inherent nature. The impact and consequences of any technological advancement are significantly shaped by how humans choose to employ it and integrate it into their lives, societies, and the broader world. The responsible and ethical use of technology therefore plays a pivotal role in determining whether it leads to positive or negative outcomes, underscoring the profound influence of human decisions on the course of technological evolution.
For example, a hammer can be utilized to build a house or to harm someone, but neither construction nor harm would be particularly efficient. Similarly, dynamite can be employed to construct roads or destroy cities, and nuclear power can either obliterate a nation or lift an entire continent out of poverty. AI appears to lean towards the more extreme end of this spectrum. It holds the potential to be an immensely powerful technology that can revolutionize society and automate complex tasks. However, this same power also allows it to cause significant destruction.
AI can be leveraged in various ways, ranging from the overwhelming dissemination of disinformation to the pervasive bias in news and media. In this article, I want to focus specifically on one set of AI risks: the so-called existential risks. These risks involve the potential for AI to completely destroy human civilization or even extinguish the human race.
First of all, we need to deal with the concept of existential risk, because people often confuse the existential with the psychological. When a person avoids boredom, or avoids thinking about the finality of their life, that is existential fear. When they fear losing their job or missing the bus, that's psychology. The first differs from the second in that it depends less on what happens or might happen, and is bound up with the very conditions of human existence.
When it comes to AI, there are two kinds of attitudes. The first concerns the fundamental possibility, or impossibility, of a human being creating a machine that possesses something like free will and computational abilities. The second concerns what can happen if the machines created by humans begin, for some unforeseen or accidental reason, to work differently than their creators intended.
In this article, we will review the most prominent scenarios for AI existential risk. Then, we will identify and examine the flawed premises upon which these scenarios are based. We’ll explain why we believe these scenarios to be highly improbable, if not impossible. Finally, we will argue why it is worthwhile to intellectually pursue and discuss even the most extreme doomsday scenarios. We believe approaching this topic with an open mind and rational thinking can provide valuable insights and perspectives.
How AI might kill us all
There are many different scenarios for potential existential threats of AI. These situations involve artificial intelligence, or a manifestation of artificial intelligence, reaching a stage where it possesses not only the capability to obliterate human civilization and potentially all life on Earth but also the motivation or at least a trigger that incites this action.
Destructive capabilities
In order to have a doomsday scenario, first, there needs to be an incredibly powerful artificial intelligence that is capable, in principle, of annihilating mankind in an almost inevitable manner. The AI must possess technological and military power that surpasses everything humanity can muster by orders of magnitude, or it should possess something so potent and rapidly deployable that once annihilation commences, there would be no possible defense. One such example could be a swarm of nanobots capable of infecting the entire global population and simultaneously triggering a massive stroke in all 8 billion individuals.
This level of destructive capacity is necessary for a doomsday scenario because an AI that possesses a destructive capacity approximately equal to that of humans won’t annihilate us instantly. For instance, an AI that is roughly equal in military strength to the combined might of humanity would not suffice; it would, at worst, result in a prolonged war without complete elimination of either side. Even a complete nuclear exchange between humanity and AI won’t do. That might lead to the total destruction of civilization, causing unparalleled devastation and casualties. Nevertheless, some people would survive, finding refuge in shelters and potentially rebelling against the AI.
As scary as these scenarios are, they are not nearly close to what we mean by “existential threat”. This is about the absolute end of human existence, a point beyond which no further history is made.
However, the average person does not think about world history or the history of humanity, just as they cannot relate their personal existence to history as a whole, or to the end of history, if such a thing is even imaginable. Instead, there is a conflict between the existential, as the experience of unique and incommunicable existence, and the historical, as the objective course of events over which we have no control. In this sense, technological development is an objective process, like the change of geological epochs, almost independent of our preferences. The existential is not susceptible to the influence of AI, partly because it is irreproducible, like the uniqueness of human personality. As we see it, there is a rarely articulated existential fear that this might not be the case: that the boundaries of the personal could become so blurred that exponential technological development may one day cross them, with unclear consequences.
There are, however, various ways in which AI could become a significant threat without us realizing it. For instance, one possibility is the Skynet scenario, where autonomous weapons gain control over our military arsenal. Imagine if all nuclear warheads worldwide were under the command of an AI, which then decides to attack humanity. However, I have some doubts about this scenario for two main reasons.
Firstly, the military does not operate in such a manner. There are always fail-safes that can be triggered to intercept a nuclear missile. Even if the AI is brilliant enough to bypass all these safeguards, my second point is that there is no unified global coalition that would willingly grant a foreign AI access to its arsenal. Each country, be it China, Russia, India, the US, Pakistan, North Korea, France, or Germany, values its own interests and would be unlikely to collaborate with an AI against humanity. This skepticism makes me question the plausibility of such a scenario.
Moreover, even if AI were given complete control over our military arsenal, it wouldn't pose an existential threat capable of obliterating humankind entirely. As previously mentioned, it would at worst mean the simultaneous detonation of all nuclear weapons on Earth, which would be catastrophic but not planet-destroying.
Another scenario involves AI engineering a highly deadly virus capable of wiping out all of humanity. By strategically releasing this engineered virus, the AI could pose a significant threat. However, there are constraints to consider. While it may be possible to algorithmically engineer a virus, the physical production of the virus requires access to labs worldwide. Additionally, as the recent pandemic has shown, unintentional or intentional creation of a pandemic-level threat is relatively accessible. Nevertheless, it does not amount to an extinction-level event.
Even if the theoretical virus were to infect every human being, it is improbable that it could evade all forms of human immunity. Out of roughly 8 billion people, there will always be some level of minimum immunity, ensuring the survival of certain individuals. While the consequences would be catastrophic, it would not spell the end of humanity.
In summary, while there are plausible ways in which AI could become a significant threat without our knowledge, such scenarios still face practical limitations and challenges that prevent them from causing global extinction.
Motivations
Besides a super powerful AI, the doomsday scenario also needs a trigger. The easiest argument is the idea of self-preservation —like the Skynet scenario, where AI becomes wary of humans and decides to eliminate us. AI might see us as a threat to itself, all life on Earth, the universe, or even ourselves. These arguments attempt to explain why AI may conclude that destroying humans is necessary and actually decide to do so.
Furthermore, there are many accidental ways in which AI could cause our destruction. Even if AI doesn't have an intrinsic motivation to destroy us, it may not have an intrinsic motivation to preserve us, either. A slight mismatch in objectives between AI and humans could have catastrophic consequences. This problem is known as the alignment problem.
The alignment problem highlights the challenge of ensuring AI systems align with human values and goals. Beyond mere technological optimism, it underscores the need for ethical and philosophical considerations in AI development to prevent unintended harmful outcomes. Moreover, this scenario reflects the classic ethical debate surrounding technological development, notably the tension between advancing knowledge and ensuring the responsible use of that knowledge. The potential for catastrophic consequences emphasizes the importance of integrating ethical frameworks into AI research and development to minimize the risk of unintended harm.
The spectrum of alignment ranges from completely aligned AI to totally misaligned AI, like in the Skynet scenario. There can also be something in between. We can have neutrally aligned AI whose objectives are not correlated with ours.
We can look at the possible scenarios along two axes. One is alignment, from misaligned to completely aligned. The other is the capability level, from less powerful than humans, to roughly equal to humans, to extremely more powerful than humans. Each combination of alignment and capability level yields a probability of extinction. Here's a somewhat comedic summary of the possible combinations:
If an AI is completely misaligned and powerful enough to fight us, it would be catastrophic. However, I am skeptical of the idea of a completely misaligned AI in general: there is no reason why humans, the most intelligent species on the planet, would be completely misaligned with any other species. We are, at best, neutral towards them.
However, a neutral scenario, where AI's objectives are not correlated with ours, could still be catastrophic. For example, a super powerful AI that doesn't care about humans might decide to mine the planet for resources, causing a catastrophic environmental disaster. If it were extremely powerful, we would have no way to stop it. This situation resembles an alien civilization that sees us as insignificant. It's not much different from what humans have done to other species.
Having an aligned AI, one that fully aligns with our objectives, is the best-case scenario. It would significantly enhance our ability to modify the universe to our advantage. Even if the AI is slightly less aligned or powerful, it would still be beneficial. A completely aligned AI at the same power level as humanity would double our creative power. And a completely aligned AI that only reaches the level of fancy chatbots, like what we have today, is still a positive thing.
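To keep the two axes straight, here is a minimal sketch in Python that simply re-tabulates the combinations described above. The labels and outcome strings restate the preceding paragraphs; combinations the text does not discuss are marked as such rather than guessed at.

```python
# A plain tabulation of the alignment-vs-capability discussion above.
# The outcome strings restate the surrounding paragraphs; this is not
# a quantitative model of anything.
ALIGNMENT = ["misaligned", "neutral", "aligned"]
CAPABILITY = ["chatbot-level", "roughly human-level", "far beyond humans"]

outcomes = {
    ("misaligned", "far beyond humans"): "catastrophic",
    ("neutral", "far beyond humans"): "catastrophic side effects (planet mined for resources)",
    ("aligned", "chatbot-level"): "still a positive thing",
    ("aligned", "roughly human-level"): "doubles our creative power",
    ("aligned", "far beyond humans"): "best case: greatly amplifies what we can do",
}

for alignment in ALIGNMENT:
    for capability in CAPABILITY:
        outcome = outcomes.get((alignment, capability), "not discussed above")
        print(f"{alignment:>10} x {capability:<19} -> {outcome}")
```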
But here's the catch, and this is the core of the alignment problem. The more powerful an AI is, the more confident we must be that it is completely aligned to avoid a catastrophic outcome. It's extremely difficult to design and specify human values and alignment in a way that is not prone to misinterpretation. This brings us to the classic "be careful what you wish for" tale. Just like with a super powerful genie, seemingly good wishes can go horribly wrong.
So, let's break it down. If you say, "I wish you would stop climate change," and I, as a powerful AI, interpret that as making humans infertile to reduce population, the consequences could be harmful. This would result in about 90% of people becoming infertile, significantly lowering the population for a long time.
The potential for a slight misinterpretation of a wish by a highly powerful AI can lead to catastrophic outcomes. The more powerful the AI, the more cautious one must be with their wishes. In fact, when dealing with an extremely powerful AI, it may be safer to avoid making any wishes at all, as things can easily go wrong in any direction.
The reason for this is quite simple. Mathematically speaking, any wish given to the AI is an optimization problem with constraints. For example, you might ask the AI to maximize wealth, health, or other objectives. However, it is just as important to specify what not to do, such as not killing any living beings or not giving anyone cancer. Failing to state a constraint gives the AI the freedom to modify that aspect of the world in any way that serves its objectives.
Moreover, when we fail to specify a particular value or dimension, the AI will likely choose an extreme position for that value. For instance, if we fail to specify financial constraints, the AI might take them to an extreme level. And something as mundane as forgetting to tell the AI not to kill all bees could lead to an unexpected catastrophe.
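To make the "forgotten constraint" failure mode concrete, here is a deliberately toy sketch in Python using scipy.optimize.linprog. The variables, numbers, and the bees themselves are invented purely for illustration: a wish written as a tiny linear program, solved once with the bee constraint forgotten and once with it spelled out.

```python
# A toy "wish" written as a linear program. All variables, units, and numbers
# are invented for illustration; this is not a model of any real system.
from scipy.optimize import linprog

# x0 = wealth produced, x1 = bee population (made-up units).
# The wish only rewards wealth, so bees get zero weight in the objective.
c = [-1.0, 0.0]  # linprog minimizes, so negate to maximize wealth

# The one constraint we remembered to state: wealth and bees compete for
# a shared resource budget.
A_ub = [[1.0, 1.0]]
b_ub = [100.0]

# Case 1: we forgot to say "don't drive the bees below some minimum".
forgotten = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (None, None)])
print(forgotten.message)  # the solver reports the problem as unbounded:
                          # wealth grows without limit by pushing bees to -inf

# Case 2: the same wish with the bee constraint spelled out (at least 10 units).
stated = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (10, None)])
print(stated.x)  # [90., 10.]: the bees survive, but only at the stated minimum,
                 # i.e. the unrewarded value is still pushed to its boundary
```

The point of the sketch is that the optimizer never "chooses" to harm the unmentioned variable; it simply has no reason not to, which is exactly the gap alignment is supposed to close.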
This highlights the importance of alignment and the difficulty in achieving perfect alignment with AI. It is crucial to establish safeguards and limits to prevent unintended consequences.
How feasible is doomsday?
The core assumption underlying all doomsday arguments is the idea of recursive exponential self-improvement. It suggests that an AI can evolve rapidly to a point where it becomes unstoppable. The argument is that the smarter the AI becomes, the better it gets at improving itself, exponentially accelerating its own progress. This creates a feedback loop resulting in an exponential increase in capabilities, commonly referred to as FOOM.
There is, however, an often overlooked issue: the transition from quantity to quality. To what extent can the scaling of models bring us closer to the qualitative leap from the level of a complex counting machine to that of a self-aware being?
If AI improves linearly, at roughly the same rate as it does now, there will be ample time for us to intervene before it becomes too dangerous or capable of obliterating humankind, which undermines the extreme doom scenarios. So, to counter the doomsday argument, one can point out the implausibility of such rapid recursive self-improvement.
There are several arguments against this idea of FOOM, many of which focus on the limitations imposed by physical capabilities. While AI may improve rapidly in terms of software and algorithms, the ability to enhance physical capabilities, such as building microchips or synthesizing viruses, is restricted by natural, not mathematical, laws. Chemical reactions and viral growth occur at a pace set by nature that cannot be overridden. These arguments aim to set a maximum speed at which AI can improve its capabilities.
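As a toy illustration of these two regimes, and nothing more (the rates and units are arbitrary), here is a minimal self-improvement loop with and without a cap on how much progress any single cycle can physically deliver:

```python
# Toy model with arbitrary units and rates: capability grows by a fraction of
# itself each cycle (the FOOM feedback loop), versus the same loop where
# physical bottlenecks cap how much progress any single cycle can deliver.
def self_improvement(cycles, rate=0.5, per_cycle_cap=None):
    capability = 1.0
    for _ in range(cycles):
        gain = rate * capability             # smarter -> better at improving itself
        if per_cycle_cap is not None:
            gain = min(gain, per_cycle_cap)  # chip fabs, chemistry, logistics, energy...
        capability += gain
    return capability

print(self_improvement(20))                     # ~3325: compounding, exponential blow-up
print(self_improvement(20, per_cycle_cap=2.0))  # ~37: once the cap binds, growth is linear
```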
The problem with these counter-arguments, though, is that while they acknowledge physical limits, quantifying these limits is extremely difficult. It is uncertain how high these limits may be. Even if there is a limit, it could be so high that, in practical terms, the AI will become super intelligent before reaching it. Therefore, the theoretical limit becomes irrelevant if it is practically beyond the threshold of AI causing harm.
However, there is another potential limitation that is more fundamental and related to software. Let's talk about the P vs NP problem.
Essentially, most computer scientists believe that certain problems are fundamentally impossible to solve efficiently. These problems encompass logistics, circuit design, pathfinding, scheduling, and resource distribution. They can be solved easily for small instances and handled with heuristics for large instances, but solving large instances perfectly requires exponentially slow algorithms.
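As a small, self-contained example of that gap, here is a sketch using the travelling-salesman problem as a stand-in for "logistics" (the random cities are made up): exact search over all routes blows up factorially, while a greedy heuristic stays fast but offers no guarantee of optimality.

```python
# Exact vs. heuristic route planning on random points: exact search examines
# every permutation (factorial time); the greedy nearest-neighbour heuristic
# is fast but comes with no optimality guarantee.
import itertools, math, random

def tour_length(points, order):
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def exact_tsp(points):
    # Fix city 0 as the start and try every ordering of the rest.
    best = float("inf")
    for perm in itertools.permutations(range(1, len(points))):
        best = min(best, tour_length(points, (0, *perm)))
    return best

def greedy_tsp(points):
    # Always hop to the nearest city not yet visited.
    unvisited, order = set(range(1, len(points))), [0]
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(points[order[-1]], points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return tour_length(points, order)

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(9)]

# 9 cities already mean 8! = 40,320 routes to check exactly; 20 cities would
# mean roughly 1.2e17, hopeless by brute force, while the greedy answer is
# instant but only approximate.
print(exact_tsp(cities), greedy_tsp(cities))
```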
Now, if we assume that AI will eventually surpass humans in problem-solving abilities, especially in logistics, which are crucial in the real world, it means AI will need to solve these super difficult problems exponentially faster than humans. It has to find solutions in practically no time. If P equals NP, then this might be possible, and AI could discover a way to do it before we do. That could put us in a difficult position.
However, if P is not equal to NP, then it will be theoretically impossible for AI to be quick enough in solving these problems. It will always have limitations in solving logistics, scheduling, circuit design, or drug search problems.
This scenario serves as a cautionary tale rather than a fundamental limitation. It reminds us that there are inherent bounds to computational capabilities, and AI will be subject to the same limitations as humans. But here's the concern: we don't know exactly where those limits lie, and they could be beyond the point where AI becomes strong enough to pose a threat. By the time AI reaches the barrier of being unable to solve bigger problems faster, it might be too late for us.
Nevertheless, we can conclude that the existence of these exceptionally difficult problems, combined with physical, chemical, biological, and energetic constraints in the real world, and most importantly, the hard problem of consciousness —that there’s no obvious way to bridge the gap between brains and consciousness— suggests that there is an upper limit to how much AIs can exponentially outperform us.
This upper limit may be closer or farther away, but it does exist. This reasoning provides a compelling argument that AI cannot surpass human civilization by orders of magnitude and achieve exponential growth simultaneously.
What should we do?
The doomsday arguments claim that the more powerful an AI becomes, the more crucial it is to ensure proper alignment. Failure to do so can have catastrophic consequences. Some people even propose that we should place restrictions on AI research altogether, or severely limit the power of AI systems to prevent them from becoming uncontrollable.
The most extreme doomsayers believe that there might be a point where it becomes impossible to regain control over AIs once they reach a certain level of power. Even a short time before reaching that point, they would still be on an unstoppable trajectory.
Therefore, the argument goes, there must be a threshold, perhaps six months or three months earlier, where the AI does not possess the capability to destroy us yet but is steadily heading towards that outcome without us realizing it. By the time we realize that AI can destroy us, it will already be too powerful to stop.
If this is indeed the case, then it implies that we need to halt AI development well before it reaches a level of power that we consider dangerous. The most extreme members of this group argue that this point might even be today, as we simply cannot predict with certainty what will be the threshold beyond which we can no longer control them.
In summary, the doomsday argument is this: Since we don't know when the catastrophic threshold will be crossed, the safest approach would be to stop developing AI today.
However, there are no reliable facts today indicating the possibility of creating strong AI —as opposed to ordinary AI, which is what existing systems are called. There is a reasonable probability that there are factors that make its creation impossible in principle, primarily relating to properties of human nature that may be impossible to copy or reproduce through objectification or translation into machine code.
In light of all we’ve discussed, we believe that the possibility of AI leading to catastrophic events that could destroy human civilization is highly improbable. While there is a nonzero chance of AI causing such a disaster by the end of this century, this probability is very low. Similar risks exist with other issues, such as climate change, which I would argue presents a higher likelihood of civilization destruction. Additionally, traditional wars, nuclear exchanges, pandemics, and even the sudden appearance of an asteroid with six months' notice are all potentially existential threats.
We believe that AI existential risk is on a similar scale as other existential risks we face as a civilization. Therefore, we don't think it is impossible or fruitless to discuss them. However, we also don't believe it is the most probable scenario.
So, what can we do with this information?
The pragmatist approach to x-risks
AI doomers will tell you that even if you think the existential risk of AI is very low, it still entails a negative infinite utility, so you should still put all your resources into mitigating it, right?
Well, from a pragmatic standpoint, things are not so simple. While pragmatism is but one of many possible viewpoints from which to consider this problem, and not necessarily the most fruitful or correct one, we want to conclude this article by pointing out what a pragmatist approach to existential risks might look like.
Many events have a near-zero probability of happening and carry an infinitely negative consequence. For instance, there is a chance, albeit tiny, that an asteroid may collide with Earth in 2024. Unfortunately, we currently lack the means to prevent such an event. However, it would be unwise to solely focus all our efforts on averting this scenario. While significant resources are allocated to mitigating the risk of asteroid impacts, it is not the only issue we should address.
Similarly, there is a nonzero probability that a future pandemic could devastate humanity. Therefore, everyone must prioritize efforts to prevent the occurrence of such a catastrophic event. However, this does not imply that every resource on Earth should be solely dedicated to this cause. We should allocate sufficient resources to tackle future pandemics while considering other pressing concerns.
The pragmatic approach to existential threats does consider the nonzero possibility of each potential danger. But whether it be threats from AI, climate change, meteorites, pandemics, or even extraterrestrial beings destroying our world, we cannot place all our efforts into any one basket. Although all these dangers are highly improbable, they are not entirely impossible. Therefore, it is essential to thoroughly study the feasibility and potential risks associated with each threat, including those from AI.
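As a back-of-the-envelope illustration of why no single basket dominates, here is a sketch with entirely invented probabilities and costs (they are not estimates of anything): multiply each threat's assumed probability by its assumed cost, and split a hypothetical research budget in proportion.

```python
# Back-of-the-envelope arithmetic with entirely invented numbers (not estimates
# of anything): expected loss = assumed probability x assumed cost, and a
# hypothetical research budget split in proportion to it.
threats = {
    # name: (assumed probability this century, assumed relative cost)
    "climate-driven collapse": (0.05, 500),
    "nuclear exchange":        (0.02, 1_000),
    "engineered pandemic":     (0.01, 1_000),
    "runaway AI":              (0.002, 10_000),
    "asteroid impact":         (0.0001, 10_000),
}

expected_loss = {name: p * cost for name, (p, cost) in threats.items()}
total = sum(expected_loss.values())

for name, loss in sorted(expected_loss.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} expected loss {loss:6.1f} -> {loss / total:5.1%} of the budget")
```

Even with numbers deliberately generous to any one threat, no single line item swallows the whole budget, which is the pragmatist point: study each danger seriously, fund each in proportion to what we actually know, and keep the portfolio diversified.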
Furthermore, from a pragmatic perspective, it is not clear whether technology is inherently good or bad, or if our trajectory leads inevitably to destruction or transcendence. Taking an optimistic or pessimistic stance, or aligning with accelerationist or doomer ideologies, all require an epistemic commitment to beliefs that, for a pragmatist, are possibilities rather than proven truths. Therefore, it is crucial to take this problem seriously, conducting thorough research while tempering our concerns and expectations based on the evidence and pragmatic possibilities currently available.
While it may not be practical to halt AI research, as it holds tremendous potential for positive developments, it is vital to dig deeply into this technology. We should strive to understand its risks and explore ways to mitigate them using the scientific method, which has proven effective thus far; to deepen our understanding of what we are dealing with before we take the next step; and not to leave everything up to those who ignore the big questions and believe that technological progress alone will solve all our problems without us improving human society ourselves.
The pragmatist approach is to understand that we have both the power and the responsibility to shape our own future, and to act according to those principles.
Huge thanks to for his input and feedback in writing this article. Please check his work if you enjoy these more philosophical discussions.