On the Limits of Computation with Spear of Lugh
A conversation about computability, complexity theory, artificial intelligence, and the meaning of life.
This is the transcript of a fascinating conversation I had with a fellow computer scientist who goes by the Internet persona of Spear of Lugh. We talk about the limits of computation and what they mean for science, AI, and life in general. I hope you enjoy and learn from it as much as I did.
Alejandro: Let's begin with some introductions. Could you tell me about yourself, your background, your professional interests, and also what your Internet persona "Spear of Lugh" means?
Spear of Lugh: I am a researcher and professor of computer science in a school of engineering (this is a new position; I was previously an associate professor at a French university). I did my PhD in theoretical computer science, where I worked on the problem of non-interference, which can be presented as follows: how can you prove that variable x does not interfere with variable y in program P? Meaning that whatever the initial value of x is, it should not affect the terminal value of y.
This problem has a logical counterpart: can you prove theorem T without hypothesis H? It can also be seen from a security point of view: call x public and y private, and you have proved that there is no flow of information from private to public. Or from an optimization point of view: you can parallelize the computations of x and y on different CPUs if they don't interfere. There are even applications in quantum computing: determining whether or not two qubits are entangled.
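To make this concrete, here is a minimal Python sketch (the two programs and their names are invented for illustration): in the first, the terminal value of y is the same whatever x was; in the second, x interferes with y.

```python
# Non-interference, illustrated with two toy programs (hypothetical examples).

def p_secure(x: int) -> int:
    y = 0
    z = x * 2       # x influences z...
    y = 42          # ...but y is always overwritten with a constant,
    return y        # so the terminal value of y does not depend on x

def p_leaky(x: int) -> int:
    y = 0
    if x > 0:       # the branch taken depends on x,
        y = 1       # so the terminal value of y reveals something about x
    return y

assert p_secure(-5) == p_secure(1000)   # y is independent of the initial x
assert p_leaky(-5) != p_leaky(1000)     # y depends on x: interference
```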
A few years back, I created a pseudonymous account to feel freer to talk on social media. Not that I am hiding my identity (the algorithm suggested that I follow myself from both accounts), but I didn't want my students to find me or my public activity to interfere with my academic life. My pseudonym is a reference to the place I live and also to Celtic mythology. Lugh is a "polytechnician" god, skilled in many crafts, and I liked this idea of not being confined to a single branch of intellectual activity.
I am as interested in science as in philosophy or theology. Nowadays, I am researching how incentives can be designed for distributed systems. I am interested in two subjects: the inheritance of crypto and the economics of reputation.
Alejandro: It's remarkable how such an abstract problem from theoretical CS can have so many applications. That's part of the reason I tell my students about the importance of solid theoretical foundations. However, it seems that nowadays very few professionals in the software engineering world have strong insights into computability theory, formal languages, and so on. I wonder whether you see this lack of theoretical foundations as a downside of how broadly computer science has extended into real-world applications, and what kinds of issues vast teams of engineers with little to no theoretical knowledge could bring.
Spear of Lugh: I don't know. Computer systems are more complex today. When I was a student, I was able to play with an operating system (Minix) and see what happened. It was possible to understand everything that went on, from a keystroke to how the computer reacted. That is almost impossible today: you have multi-core systems, many levels of cache, etc.
So in some sense, you have to abstract away from details to do things in such a complex environment. This is typical with students who program in Python: there are many predefined operations, and since they take little space in the written program, students intuitively assume they don't take long to execute. Everything looks like an atomic operation, which is false. But I am afraid that the new "digital generation" has a magic-like approach to programming: they push "run," and many things happen without being mastered. At the very least, there is a huge waste in terms of efficiency. Also, some conversations sometimes sound like people discussing perpetual motion.
Alejandro: Let's dive into that discussion, then! I feel that much of the infatuation with large language models, for example, and with other recent breakthroughs in machine learning, as incredible as they are, stems partly from a misunderstanding of basic computability principles.
On one hand, these are "just programs", albeit hugely complex ones, of course. On the other hand, what "just programs" means often seems to be both over- and underestimated, sometimes simultaneously within a single news piece. How about we break it down a little?
Could you tell me your preferred way of explaining what a computer is? What is the fundamental thing this type of machine can do that makes it qualitatively different from other machines we have invented?
Spear of Lugh: I have a view of computers that comes more from mathematics than from practice. It is partly due to my academic background. One interesting point about computers is that their invention was not a natural process. It was not by tinkering with mechanical calculators (like Pascal's or Leibniz's) that we invented general-purpose computers.
One way to tell the story is: there was a crisis in mathematics when people started to formalize things at the end of the 19th century. Basically, it was possible to encode paradoxes like "the set that contains all sets not containing themselves", which is a mathematical version of the liar's paradox. At a famous mathematics conference, Hilbert proposed 23 problems for future mathematicians to solve. Among them was one about the fundamental properties of mathematics (it concerned a particular logic, but this is technical).
There were three subquestions:
1. consistency (the maths can't allow the proof of both A and not A);
2. completeness (everything true is provable);
3. decidability (there exists a method to "mechanically" prove true things).
At the time, people thought that 1, 2, and 3 were true. Long story short, Gödel proved that 2 is false in his famous incompleteness theorem, and... Turing invented the general-purpose computer to show that 3 was also false! So the computer is the result of deep reflection on the nature of truth and provability.
An experimental way to see it is this: at the beginning of the computer era, people did not know what to do with computers. This is very strange, and it is because computers are not "natural": you have a very sophisticated machine, but you don't know what to do with it! It is an intellectual machine that was invented for logical reasons. Of course, the first uses of computers were easy to find: cryptanalysis and other military applications (complex computations for nuclear bombs) and space exploration (computing trajectories, etc.).
Then there was an interplay between technology and computer usage. Because of the incredible progress in computing power and information storage, more and more new applications were found. The telecommunication aspect added complexity to this: today, it is almost a caricature; every app is a distributed app. It means that the "system" is no longer one program running on one computer (returning a single number as a result) but many programs on many computers. The system is the interaction between many machines (think about what happens when you book a flight ticket through the Internet).
Alejandro: One of the most fascinating aspects of the history of computer science is precisely that it is the only science (among the major ones) built in such a bottom-up way. This means that from the beginning, we knew more about what we couldn't do with computers than, as you say, what we actually could do. But the more philosophical question is: is this limitation something specific to this abstract device (and its physical implementations), or is it a fundamental limitation of cognition? In this sense, I think the most relevant concept to draw from is the Church-Turing thesis: (in my own words) that all we can ever mean by "computation" is captured by the concept of the Turing machine (or, equivalently, by the lambda calculus). Can we ever actually prove the Church-Turing thesis, and what should we make of it if not? Should we be looking for more general models of computation?
Spear of Lugh: I think that having a machine performing computations helps. For instance, there is the notion of a partial function in mathematics. For some value x, depending on f, f(x) might be undefined. Usually, mathematicians use the symbol "bottom" to denote such cases. But what does "there is no answer" mean when it is a machine that computes? It means that there is literally no answer, i.e., that the program does not stop. Hence the importance of the halting problem in theoretical computer science. It is not something peculiar; it is a reflection of the fact that negation and provability do not commute. Gödel's incompleteness theorem was a logical version of this (not everything true is necessarily provable). In other words, it is a technical version of the common-sense remark that "knowing that the answer is no" and "not knowing" are different things. It is a very deep remark that goes way beyond computer science.
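To make the link concrete, a tiny (invented) example in Python: a partial function for which "undefined" literally means "the program never answers".

```python
# A partial function realized by a program: for n >= 0 it answers, for n < 0
# it never stops, which is exactly what "f(n) is undefined" (bottom) means here.

def partial_identity(n: int) -> int:
    while n < 0:
        pass                      # no answer will ever come out of this loop
    return n

print(partial_identity(3))        # -> 3
# partial_identity(-1)            # uncommenting this line would hang forever
```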
Another point is: computability is the mother of all sciences. Because when you do science, you have to make predictions and check your predictions against experimental results. But the unsaid assumption is that you can actually compute in your model; otherwise, where would the prediction come from? So it means that the scope of science is limited to what is computable. And as we said earlier, not every function is computable. Epistemological humility is not optional: there are hard limits for science, at least those of computability.
Regarding the Church-Turing hypothesis, I don't think it is possible to "prove" it in the mathematical sense of "proving". Now, there are very compelling reasons why the Church-Turing thesis is true. You have very different models of computation that each time describe exactly the same set of computable functions. One of the most striking for me is Diophantine equations.
A Diophantine equation is simply a polynomial expression with multiple variables, something like x² + y² = z². You can have parameterized equations like x³ + 3y⁴ - 5z⁷ + a = 0. Now you may ask the question: are there values of the parameter for which this equation has integer solutions? For instance, here, a = 0 is a value of the parameter for which there is a solution (because x = y = z = 0 works). But what about a = 1, etc.? Now you can consider the set of all values of a such that the equation is solvable. You now have a link between an equation and a set of integers. And it turns out that the sets you can define this way are exactly the sets that are Turing-definable! Obviously, Diophantine equations existed millennia before Turing machines, and Turing was not thinking about arithmetic when he defined the Turing machine model.
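Here is a rough sketch (my own toy code, using the example equation above) of why such a set of parameters has the same flavor as the halting problem: you can search for a solution and confirm a "yes", but if there is no solution, the search never stops.

```python
from itertools import count, product

# Semi-decision procedure (sketch): given a parameter a, look for integers
# x, y, z with x**3 + 3*y**4 - 5*z**7 + a == 0 by scanning ever-larger boxes.
# If a solution exists, it is eventually found; if not, the loop runs forever.

def find_solution(a: int):
    for bound in count(0):                      # search the box [-bound, bound]^3
        rng = range(-bound, bound + 1)
        for x, y, z in product(rng, rng, rng):
            if x**3 + 3*y**4 - 5*z**7 + a == 0:
                return (x, y, z)                # a witness: a is in the set
        # no witness in this box: try a bigger one, possibly without end

print(find_solution(0))    # -> (0, 0, 0), so a = 0 belongs to the set
# For a value of a with no solution, find_solution(a) would never return.
```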
So the fact that all those models, defined for different reasons and using different techniques, produce exactly the same result is striking. For me, it is very convincing. But it is not a proof at the end of the day.
Alejandro: This is fascinating. Assuming we got the notion of computability right, this means that, ultimately, all science is bounded by the limits of computation, right? So, in a sense, even if the universe is more than computation (whatever that means), it seems that the part of the universe we can know about in sufficient detail to make accurate predictions will forever be limited by Turing-decidability. I wonder what kind of epistemology stems from this realization. I don't know whether modern philosophy is engaging at this level of understanding, but it does give some concrete meaning to the "veil of ignorance".
Backtracking a bit, one of the central notions of computability theory is the idea that, for some problems, you have to be willing to loop forever on at least some inputs in order to be able to answer on the rest of the inputs at all. And one such problem is precisely detecting which inputs will make you loop forever! This has some very concrete implications for programming in general. It seems to imply that we can never be completely certain that our program is correct (at least for some programs). I would expect this to be a major blow to software engineering, but it seems to me that for most practical purposes, we do assume we can verify programs. Do you have any insights on the actual impact of the limits of computability on practical software development?
Spear of Lugh: The first obvious limitation concerns the claim that AI will replace programmers... There is a technical result, Rice's theorem, which shows that this is impossible in general. More precisely, the following problem is uncomputable: to build a program that takes as input a nontrivial semantic description and produces as output a program that meets this description.
We can have programs that produce other programs: typically, a compiler. But the input of a compiler is not a semantic description of what the program should do. It is already a program, and what is asked is to produce a program with the same semantics in another language (so it is not a counter-example to Rice's theorem). By the same token, if you provide non-semantic requirements (e.g., a program that halts after x steps), you may produce a program that meets those requirements. But if you come up with a description of "what" the program should do, there is no hope of automating this in general.
In a similar way, automatically proving that a given program is correct with respect to a given semantics is not possible. What is proved impossible is not the existence of a proof of the program's correctness, but the possibility of automating the search for that proof. And there is a third way to look at the problem: given the proof and the program, it is possible to check that the proof is correct. What is impossible is to find the proof in the general case.
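A toy illustration of that last distinction (my own invented example, using termination certificates instead of full correctness proofs): checking a claimed certificate is mechanical, while producing one for every program is what no algorithm can do.

```python
# "Checking is easy, finding is hard", in miniature. A "certificate" here is a
# claimed bound on the number of steps before the program halts. Verifying it
# is mechanical: just run the program for that many steps. No algorithm can
# produce such a certificate (or rule one out) for every program and input.

def check_certificate(program, arg, n_steps: int) -> bool:
    """Run `program` (modelled as a generator) for at most n_steps steps."""
    steps = program(arg)
    for _ in range(n_steps):
        try:
            next(steps)               # execute one step
        except StopIteration:
            return True               # halted within the budget: certificate accepted
    return False                      # budget exhausted: certificate rejected

def countdown(n):
    while n > 0:                      # halts for n >= 0, loops forever otherwise
        n -= 1
        yield

print(check_certificate(countdown, 5, 10))   # True: "halts within 10 steps" checks out
print(check_certificate(countdown, 5, 3))    # False: 3 steps is not enough
```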
Alejandro: But who's to say these limits don't also apply to human ingenuity? I mean, is the set of programs for which we cannot automatically prove they work as intended (that they implement the semantic description we provided) different from the set of programs for which a human can find a proof using whatever human cognitive skills we may have? The AI programmer might be just as bad as the average human, and that would already be something! Or do you have an intuition that humans can "find" proofs by non-computable means? I guess I'm asking, do you think human creativity is bounded by computability?
I don't mean "proof" necessarily in the formal sense, but rather in the sense that a human can be rightly convinced that a program is correct even though there is no way to prove it automatically.
Spear of Lugh: The short answer is: we don't know. We know that humans can simulate Turing machines, but we don't know whether the converse holds. Now, we can say a bit more, and it is surprising. If the human mind can do things that no Turing machine can do, then we will never have scientific proof of that. Indeed, to have a proof, we would first need a precise and working definition of what the human mind is, because you cannot scientifically prove that something is impossible without a definition meeting those standards.
But a precise and working definition would itself be a Turing machine, which would contradict the fact that the human mind is not Turing-computable. So we are left with something equivalent to faith: we are forced to search for a Turing machine (a definition) of the mind that would work, without knowing whether or not this search will succeed.
This is the dual of what happens in science, where you look for a counter-example (the famous Popperian falsificationism). Here, instead, you are waiting for a positive answer. If the answer is yes (meaning that it is possible to find a Turing machine that simulates the mind), then maybe one day we will find it; otherwise, we will search until the end of time. Just like the halting problem!
Alejandro: Haha, the halting problem strikes again! It is, in a sense, one of the deepest notions I have come across, hard to wrap your mind around, but it is also surprisingly intuitive once you get the gist of it. The idea that, for some questions, there is no shortcut to the answer: you have to take the full journey. Or, put another way, there's no way to know how a sufficiently complex process will evolve other than by running it. I think it says something beautiful about life. In a sense, the only way to know what it is about is to live it :)
Back to the AI topic, now that you've brought it up: let's talk a bit about how the limits of computation affect what we think AI can or cannot do in the near term.
To start, we know neural networks are universal approximators: informally, for any total (computable?) function, there is a large enough neural network with appropriate weights that computes an arbitrarily good approximation of that function. But we also know that the set of computable functions contains partial functions. Sometimes, you have to loop forever. And neural networks cannot loop forever, at least in the classic feedforward paradigm. This immediately rules out the idea that Transformers, by themselves, can be Turing-complete.
Spear of Lugh: Another remark is the following: the time an LLM takes to compute its answer is a function of the length of the question, not of the difficulty of the question! This is a first hint that even though it is called AI, it does not have a lot to do with intelligence, because we all know very short questions that are very difficult to answer... and somehow that difficulty does not show up in LLMs.
Now, there is another field of theoretical computer science that is interesting: complexity. There are a lot of results on how fast a Turing machine can solve a problem. Typically, this is expressed as a function linking the input size to the computation time. Some algorithms are linear, like searching for a value in an unordered list. Other algorithms are polynomial, meaning that the number of steps to solve the problem is a polynomial in the input size. For instance, a naive sorting algorithm is in n².
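In code, the two examples I just mentioned look like this (a minimal sketch; the function names are mine):

```python
# Linear search: in the worst case, about n steps for a list of n items.
def linear_search(items, target):
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

# Naive sort (selection-style): roughly n*(n-1)/2 comparisons, i.e. in n^2.
def naive_sort(items):
    result = list(items)
    for i in range(len(result)):
        for j in range(i + 1, len(result)):
            if result[j] < result[i]:
                result[i], result[j] = result[j], result[i]
    return result

print(linear_search([7, 3, 9, 1], 9))   # -> 2
print(naive_sort([7, 3, 9, 1]))         # -> [1, 3, 7, 9]
```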
But you also have algorithms that are exponential. The corresponding problems are said to be "intractable". It means that even if they can be solved in theory, in practice they are not, because as the size of the input grows, the time to solve the problem becomes larger than the age of the universe.
To give an idea: it is not possible to enumerate all integers on 120 bits. That is a very stupid program: you count 0, 1, 2, 3, 4, and so on up to 2¹²⁰. Doing it is physically impossible; it would take too much time and energy (see the quick calculation below). But it turns out that many optimization problems (think of building a college schedule under constraints, or computing the shortest route that visits every city on a list without going through the same city twice, etc.) seem to be in this category. AIs are just Turing machines. However they are programmed, they are not going to collapse complexity classes. That is why the fear of a machine that would be omniscient is bizarre, to say the least.
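The quick calculation, assuming a generous billion increments per second (the figures are only orders of magnitude):

```python
# How long would counting from 0 to 2**120 take at 10**9 increments per second?

OPS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 3600 * 24 * 365
AGE_OF_UNIVERSE_YEARS = 1.4e10

years = 2**120 / (OPS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{years:.1e} years")                                  # about 4.2e19 years
print(f"{years / AGE_OF_UNIVERSE_YEARS:.1e} universe ages")  # about 3e9 universe ages
```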
On top of that, you have to add another layer: most things that happen in real life cannot be forecast. Just take something like the weather two weeks from now... This is why even a superintelligent AI could not solve what is known as the "economic calculation problem". Basically, you don't know how to organize a large economy efficiently. We solve this problem as humans by failing: you build a company, and it files for bankruptcy. The ones that do it right thrive, but you couldn't know beforehand which ones were going to succeed. It is once again this idea that you cannot compute things: you have to do things for real and adjust. But you cannot plan 257 moves ahead. Life is not like a chessboard in which moves are predetermined by rules.
Alejandro: This notion of complexity limits for AI is super interesting, and I haven't seen it discussed before. I made a short note a while back, more or less on this same idea, that for AI to self-improve beyond our wildest expectations, it would need to solve NP-hard problems exponentially faster than we can, and that might actually be impossible unless P = NP. Still, a polynomially faster AI could be terrifying!

Backing up a bit, I want to explore this idea of Turing completeness in neural networks. We already ruled out feedforward networks as Turing-complete, and thus LLMs. But you can kind of get away with it if you put an outer loop around the network, just like LLMs do. If you look at the output of the LLM not as the output of the function it is computing but as a scratch pad for internal computations, such that after a potentially arbitrarily long output the LLM emits an [OUTPUT] token and only what follows is considered the actual output... wouldn't that be at least potentially equivalent to a Turing machine? (See the sketch below.) The model could, in principle, learn to loop for as long as necessary before deciding what the output is.

Of course, we still face the question of what kind of data would lead to learning such a general notion. Maybe even with this kind of architecture, learning to read and write from memory would be unfeasible. (There are some attempts at neural Turing machines, but I don't think they have gotten very far yet.) The question is: do you see any fundamental limitation to learning the correct Turing machine for a given problem just from sufficient input/output data, given an appropriate dataset? Or is it just that we lack a proper architecture?
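Concretely, the outer loop I have in mind looks something like this toy sketch (next_token stands in for a hypothetical trained model; none of this is a real API):

```python
from typing import Callable, List

# Treat the model's output as a scratch pad: generate freely until an
# [OUTPUT] marker, and only the tokens after the marker count as the answer.
# The first loop may, in principle, run forever, which is exactly what a
# Turing-complete device must be allowed to do.

def run_with_scratchpad(next_token: Callable[[List[str]], str],
                        prompt: List[str]) -> List[str]:
    tokens = list(prompt)
    while True:                        # unbounded "thinking" phase
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == "[OUTPUT]":          # everything after this marker is the answer
            break
    answer = []
    while True:
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == "[END]":             # hypothetical end-of-answer marker
            return answer
        answer.append(tok)

# Toy stand-in model: "thinks" for three steps, then answers "42".
def toy_model(tokens: List[str]) -> str:
    script = ["think", "think", "think", "[OUTPUT]", "42", "[END]"]
    return script[len(tokens) - 1]     # the demo prompt below has one token

print(run_with_scratchpad(toy_model, ["question"]))   # -> ['42']
```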
Spear of Lugh: The question behind this is: is everything learnable? I mean, is having all the scientific data up to Newton enough to infer the theory of gravity? It is unclear to me (but I am convinced that no GPT could do it). There is something we haven't yet understood about those strokes of genius. Also, by definition, learning works well for some probability distributions... much less so for others, typically fat-tailed distributions in which a few extreme events can change the average. If you compute the average weight of 1000 people and then add one person, it doesn't change anything, even if it is a baby or the heaviest person on Earth. But if you do the same experiment with wealth and Elon Musk enters the room... So the question is rather to have a clear view of what could work and what will not. I doubt you can cover everything with "learning".
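The weight-versus-wealth example in numbers (all figures invented for illustration):

```python
weights_kg = [70.0] * 1000              # a room of 1000 people
wealth_usd = [50_000.0] * 1000

def mean(xs):
    return sum(xs) / len(xs)

# One extra person barely moves the average weight...
print(mean(weights_kg), mean(weights_kg + [200.0]))      # 70.0 vs about 70.13
# ...but one extreme fortune completely dominates the average wealth.
print(mean(wealth_usd), mean(wealth_usd + [200e9]))      # 50,000 vs about 200 million
```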
Alejandro: I get what you mean. One simple argument one could make is that any computation based on probabilistically choosing the next state will almost surely fail to loop forever if the probability of halting at each step is not strictly zero. So even if some statistical model could learn that the most likely output for a particular input is "bottom," it would still halt after a finite number of steps. In a sense, it would "hallucinate" an answer.
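The numbers make the point (p is just an illustrative per-step halting probability): the chance of still running after n steps is (1 - p) to the power n, which vanishes.

```python
p = 0.001                      # illustrative probability of halting at each step
for n in (10, 1_000, 100_000):
    print(n, (1 - p) ** n)     # about 0.99, 0.37, and 3.5e-44: it halts almost surely
```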
Spear of Lugh: There is another remark about learning. As humans, we have the ability to observe ourselves while solving mathematical problems. We can then abstract a method without using gigabytes of learning material. It looks like the current architecture of neural networks cannot do that. There are no "introspective networks" as far as I know.
Alejandro: This has been a fascinating talk, and there are many topics I'd like to go deeper into on a future occasion. To close this conversation, I'd like to ask you a bit about the future. What do you expect the role of computer science to be in shaping the next decade or the next century? And, despite so many claims (mostly bogus, as we've discussed) that AI will replace programmers, why would you recommend our readers pursue a professional computer science career?
Spear of Lugh: Regarding the future of our digital society, this is the subject of my Substack. I think that the transition to a digital society is a major one, maybe more important than moving from an oral tradition to a written one. In my opinion, it goes deeper than the transformation induced by the printing press. We see all institutions tumbling: academia, journalism, politics, you name it. The world of 15 years ago has nothing to do with our world today. Social media and smartphones have had a global impact. Maybe the most important problem comes from the fact that speech is no longer local: a tweet is instantaneously available all across the globe. AI is just adding another layer to this phase transition.
For future computer scientists, I think there are many things to do. The first one is the development of distributed systems. How do systems cooperate? How do you achieve stability? These are important issues. Bitcoin is one solution to the Byzantine agreement problem; there is much more to find. The person who figures out how to build the right architecture of incentives is going to be the next richest person on Earth. Another huge area of research opened by LLMs is databases. This new technology changes how we access data. It is still in its infancy, but you can already see that it lets you have a book you can talk to. There is a lot of work to do to ensure some quality of results. Interesting times!
Alejandro: Interesting indeed! This has been an insightful conversation. I want to thank you for your time, and I'm sure we'll repeat it soon ;)
Spear of Lugh: Thank you for the invitation. That was fun!
If this talk was as insightful for you as it was for me, make sure to check out Spear of Lugh's Substack for similarly deep and thoughtful discussions on the relationship between technology and society. And leave us a comment if you have any follow-up questions!
I’m always fascinated by these discussions. As someone with no formal CS background, I’m always keen to fill in the gaps I have on many theoretical aspects of CS.
Here’s a thought that I’m sure is discussed in some part of CS that someone will direct me to. I’m using my own terminology here since I don’t know the formal terms. LLMs seem to have what I’d describe as “very broad but shallow” knowledge. They know facts about almost everything but can’t “reason” beyond a certain number of logical steps. We humans, individually, have a much narrower knowledge base, but it’s much deeper. We can keep following a train of thought for as long as needed, as deep as needed. It may be intellectually hard to do so, and not everyone has the same capacity.
How much of this is a fundamental limitation of either LLMs specifically, or of computation in general? I suspect there’s no definitive answer to this, but I’ll ask anyway.
Very cool! I am learning to leverage collaboration here on Substack as well; there are so many great minds right here for us to tap into. I have little to contribute to the "meaning of life" speculation, but I'm riveted by the idea of what an appropriate Turing Test would look like for an LLM.