The Hardest Problem

The most important theoretical question in Computer Science asks about the existence of truly hard problems, that can't be solved efficiently regardless of how powerful is your tech.

Jan 25, 2024

This article is part of a chapter on Computational Complexity in my upcoming book The Science of Computation. You can read all the details here:
Book Teaser - The Science of Computation
Alejandro Piad Morffis
·
January 20, 2024
Read full story
And you can support the book by preordering your digital, DRM-free copy, with early and frequent updates, including all future editions.
Get early access - $10

What is a Hard (Computational) Problem?

There are many problems in computer science for which, even though they seem complicated at first, if we think hard enough, we can come up with clever algorithms that solve them pretty fast.

A very good example is the shortest path problem. This is basically the problem of finding the shortest path between two given points in any type of network. For example, you have a map with millions of places and intersections in your city, and you have to go between one point at one extreme of the map and another at the other end of the map.

Our cleverest algorithms, on average, have to analyze very few options to get you the shortest path. They are very fast. This is what powers your GPS route planner, for example.

However, other problems can sometimes be very similar to the first type, but no matter how hard we think, we cannot find any clever algorithm that always works.

The quintessential example of this is the traveling salesman problem. Consider a similar setting: you have a city and a bunch of places you need to visit in one tour. The question is, what is the best possible tour —the one that takes the least time or travels the least distance?

This problem seems very similar to the shortest path problem, but if you solve it by starting at one point and traveling to the closest next location iteratively, you can end up with a cycle that is several times worse than the best possible cycle.

There are approximate solutions that you can try in very constrained settings. Still, in the general case, no one has ever developed a clever algorithm that solves this problem any faster than checking all possible cycles.

And the thing is, all possible cycles are a huge number.

If you have 20 cities, all possible tours amount to something close to 20 factorial, which is … (checks numbers) … well, huge. Not only that, if you have a computer today that can solve the problem for, let's say, 20 cities in one second, then 22 cities will take nearly 8 minutes, 25 cities will take 73 days, and 30 cities will take you… 3 and a half million years!

This is an exponential solution, and they become bad really fast. But for many problems, like the traveling salesman, we have nothing better —that always works.

Easy versus hard problems

The first type of problem, like the shortest path, are called P-problems —technically, polynomial time complexity, but don’t worry about that. P basically means problems that are easy to solve.

Now, for the second type of problem, we don't really know whether they are P-problems, but there is a subset of these that has an intriguing characteristic.

If I ask you to find me a cycle with, say, less than 100 km in total, that will be extremely hard. However, if I tell you, “here is a cycle that takes less than 100 km” you can easily verify it —just add up the length of each road.

These types of problems are called NP problems —technically, non-deterministic polynomial time complexity, but again, forget about that. NP basically means problems that are easy to verify.

So, the most fundamental question in computer science, P versus NP, ultimately is the following:

P vs NP: Are all problems that are easy to verify also easy to solve? Or are there problems that are intrinsically harder to solve than to verify?

Intuitively, for many, it seems it must be the latter. Think about what it would mean to easily solve any problem that is also easy to verify. In a sense, it would be as if recognizing a masterpiece would be the same difficulty as creating the masterpiece in the first place. Being a great critic would be the same as being a great artist.

This seems false, but intuitions in math and computer science are often wrong. We cannot use this kind of analogy to reach any sort of informed conclusion.

Are there really hard problems?

However, theoretical reasons hint at the possibility that P is indeed distinct from NP —that there are some truly, fundamentally difficult problems. The strongest one is the existence of so-called NP-complete problems. Problems, like the traveling salesman, so difficult to solve efficiently that if you could solve one of them, you would solve all the other NP problems at the same time.

When we say a problem is difficult, we mean that it is exponentially easier to verify a solution's correctness than it is to find the solution in the first place. This concept is crucial because, for most problems in computer science, such as sorting, searching for elements in an array, or solving equations, the effort required to solve the problem is roughly the same as the effort needed to verify the solution.

In technical terms, the advantage of verifying over solving is only polynomially easier, not exponentially. However, there are certain problems, like the traveling salesman problem, for which we have been unable to find a fast and efficient algorithm – only exponential ones. Nevertheless, these solutions are very easy to verify.

The heart of the P versus NP debate lies in whether these inherently difficult problems truly exist. Do problems exist that are far more challenging to solve than to verify? To answer this question fully, we would need to find a problem for which a polynomial time algorithm cannot exist.

P vs NP remains an unsolved question, and although it has seen a lot of progress, it appears to hint at the need for a new kind of mathematics. Our current mathematical methods lack the power to tackle these challenging meta-questions. However, we do have the next best thing —a wealth of empirical and theoretical evidence suggesting that, indeed, many of these problems may be unsolvable efficiently.

One of the simplest forms of empirical evidence is the vast number of problems for which we lack a polynomial time algorithm to solve them. However, this evidence is not very strong, as it could simply mean we aren’t smart enough to find those algorithms.

What we need is a more principled way of answering this question. When mathematicians are faced with an existential problem —in the sense of, does there exist an object with these properties, not in the sense of, you know, God is dead and all—what they do is try and find extreme cases that can represent the whole spectrum of objects to analyze.

In this case, we are dealing with finding problems that are difficult to solve. So it makes sense to ask, “What is the most difficult problem possible?” A problem so hard that if we crack it, we crack them all.

That’s the idea behind NP-completeness. Let’s focus on these super tricky problems – the toughest ones in this field. If there are problems so tough that solving just one of them would also solve all the others, that would seriously challenge the idea of P equals NP.

These really tough problems are called NP-complete problems, a concept defined by Stephen Cook in the 1970s. An NP-complete problem is basically a problem that is NP —that is, easy to verify—, and a solution to it in polynomial time would also give us a solution to any other problem in NP. So, in a way, these are the only problems we need to look at. If we solve one of these, we've proven P equals NP. And if we can't solve just one, we've proven P not equal to NP.

In short, we just need to focus on NP-complete problems. So, the main question then becomes: Are there problems so complex that they are essentially equivalent to all other problems?

Obviously, the hardest problems wouldn't revolve around everyday topics like finding paths in maps, sorting numbers, or solving equations. No, these problems should be much more abstract, so that they could encompass numerous other problems in their definition.

One problem to rule them all

Cook's idea was to create a problem centered around problem-solving itself. For instance, how about simulating a computer? That would be an incredibly abstract problem. To make it even more challenging, they could turn this computer simulation into a decision problem —keep in mind that NP problems are decision problems.

One way to define this problem is to imagine having an electronic circuit that computes some logical formula —this is all computers do at their core, as all computable arithmetic can be reduced to logic. Let's take the simplest logical circuit, a completely stateless one, and ask: Is there any input that will make this circuit output True? If so, we call the circuit satisfiable. Otherwise, it is deemed unsatisfiable.

Hence, given an arbitrary logical circuit, the task of determining if it's satisfiable or not is called circuit satisfiability, or Circuit-SAT for short.

In 1970, Stephen Cook proved that Circuit-SAT is as hard or harder than all other NP problems. If you can solve circuit satisfiability, you can solve any other problem in NP. This means you can solve the traveling salesman problem and the knapsack problem, but also sorting, searching, and basically any easily verifiable problem.

The proof of this idea is quite involved and complex, and not something I can fully explain in this post, so I'll save that for a more detailed discussion later. But basically, since logical circuits are extremely expressive and powerful, and can compute almost everything, any decision problem in computer science can be transformed into a logical circuit that simulates it solution. Cook’s proof actually involves constructing a circuit for an abstract NP problem, so it’s pretty technical. But the intuition is that these things are basically computers, so you’re just simulating any possible (decision) algorithm.

And voilá! This proves the existence of at least one NP-complete problem, meaning there is one problem in NP that is as difficult as all problems in NP. Stephen Cook's work in 1971 essentially kick-started the entire field of computational complexity and defined the most important question in computer science: P versus NP.

The story, however, doesn't just end there. Circuit-SAT is great, but it is too much of an abstract problem. Just one year later, Richard Karp came along and demonstrated that 21 well-known and very concrete computer science problems were also NP-complete problems. These problems included the traveling salesman problem, the knapsack problem, various scheduling problems, and many graph coloring problems. In short, there turned out to be a whole bunch of problems that fell under this category, not just some abstract circuit simulation task.

These NP-complete problems aren't just theoretical issues, either. They are practical problems that we encounter regularly in logistics, optimization, and scheduling. After Karp proved his 21 original NP-complete problems, a wave of people started proving that nearly any problem in computer science involving combinatorics could be classified as NP-complete. As a result, there are now over 4,000 papers proving different NP-complete problems, ranging from determining the best move in Tetris to folding proteins.

This compelling evidence has led many computer scientists to believe that P != NP and that there are, indeed, problems that are fundamentally harder to solve, not just because we lack some understanding about them but because, by their very nature, they just cannot be solved efficiently.

What does this mean?

If it turns out that if P != NP, then there are some fundamental limits to how fast a computer can solve the majority of the most important problems in logistics, planning, simulation, etc. While we have many heuristics to solve either some cases perfectly or most cases approximately, all these problems may ultimately be unsolvable quickly.

Not even a superadvanced alien —or computer— could be exponentially faster than us. Thus, P vs NP might be our best weapon to stop a self-improving AGI from reaching superhuman capacity. Even gods can’t escape complexity theory.

Ping

Wow this is a really well written article and a far better introduction to P vs NP and complexity than what my university course had.

Expand full comment

1 reply by Alejandro Piad Morffis

Ernesto

Feb 5, 2024Edited

This explanation is amazing! It will be my go-to recommendation for anyone looking to start learning about this. Personally, I gave it a try and initially switched to P vs PSPACE because it intrigued me. Although I haven't given up, my focus has shifted to more important things. I don't want to burn myself out on a problem that might remain unsolved for over 50 years.

Based on my experience, here's a list of topics I'm personally interested in reading from you over the next years. I also believe some could be included in this discussion or a future one, or even in a book.

1-) A fantastic future article could delve into Memory. While P and NP focus on time, memory is equally crucial. Exploring the contrast in memory since PSPACE = NPSPACE, even if the proof is challenging, could be insightful. It might be interesting to understand why this proof can't be applied to P and NP.

2-) Discuss the balance between memory and time. The difficulty in proving P vs NP and P vs PSPACE lies in limiting algorithm capabilities when more memory can be added. Classic algorithms often show that higher memory bounds lead to lower time bounds. However, the question remains: can problems like PSPACE-Complete problems be solved in polynomial time, regardless of how much polynomial memory is given to them?

It's interesting to draw parallels with the current challenge we face with LLMs. Despite providing them more memory and time for training, we don't know if, no matter how much we give them, they'll reach the pinnacle of AGI or acquire emergent abilities we're not aware of yet. It feels like we're encountering the same problem repeatedly :)

3-) And in the future when those exist then it could be good definitely to see discussions about the Hierarchy Theorems and their implications and once more why that proof is not applicable to P and NP because of the relativization.

4-) Finally, show a problem that we are actually certain is intractable no matter what we do because it belongs to EXPSPACE the EQ_REX problem: Two regular expressions are equivalent with exponentiation. In theory no matter how much we advance with AI an AGI cannot solve fast and efficiently this problem if is using classic computation to solve it of course.

I have many topics in mind but we can discuss it in the future this message is too long already hehehehe.

Thanks again for the enlightening discussion!

4 replies by Alejandro Piad Morffis and others

24 more comments...