This was great, very well explained, and from a point of view I hadn't read before.
Just a thought before I forget to mention it; this could be interesting to discuss in the future. I would like to see, at some point, a discussion of how the theory of ML might fit with complexity theory. For instance, most of us believe P != NP. However, if metric TSP has a polynomial approximation, then P = NP.
So a question I often ask myself is: what happens if we find an ML algorithm that approximates TSP amazingly well? It cannot be categorized as a polynomial algorithm for the problem unless it is a decider for it, which means 100% accuracy.
So I always have the question: is it possible to reach this 100% in a problem like this (probably not), and if not, why? What would a mathematical proof of that look like, again from a theoretical point of view, assuming we have all the data that we might need, all the features that we need, etc.?
Wait, metric TSP has a 2-approximation, doesn't it!?
Anyway, the general question is super interesting. I'm not aware of how complexity theory factors in probabilistic correctness. That's something I'll have to look into.
Basically, it's one thing to say you always have a bounded approximation (for some bound), and another to say that with very high probability you get a correct solution in polynomial time. I'm sure there's a complexity class for that, i.e., problems you can solve in polynomial time with some bounded probability of error. But I don't know the relationship between those complexity classes and the traditional P and NP.
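For reference, here's a minimal sketch of the MST-based 2-approximation I'm thinking of (pure Python; the function name and the distance-matrix setup are mine, just for illustration):

```python
def tsp_2_approx(dist):
    """MST-based 2-approximation for metric TSP.

    dist: a symmetric distance matrix satisfying the triangle
    inequality. Returns a tour (a permutation of node indices)
    of length at most twice the optimal tour's length.
    """
    n = len(dist)
    # Prim's algorithm: grow a minimum spanning tree from node 0.
    in_tree = [False] * n
    parent = [0] * n
    best = [float("inf")] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    # Preorder walk of the MST; shortcutting repeated visits is where
    # the triangle inequality is needed to keep the 2x guarantee.
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour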
First, yes, I used the wrong name, my bad; I was talking about general TSP. Second, on the probabilistic aspect, I have another question apart from P and NP that is part of some things I have been thinking about lately; I will put it in another reply. But the original question was more in the sense that I am not familiar with how complexity theory is studied in ML.
For instance, imagine that you have an LLM for TSP, and you have all the data in the world and all the features that can be extracted from a graph, no matter how big it is, as long as they are extracted in polynomial time. Then you spend 50 years training that big LLM, even hundreds of years, passing it from generation to generation hehehe. Then it finally finishes training. Now inference is extremely fast: extracting the features is polynomial because we enforce it, and inference is very fast (I do not know how to put a bound on inference; that is one question).
So from this example come the questions:
1-) If inference is polynomial, and we assume we have all the data needed for the ML model to generalize amazingly, then if we somehow prove that the model approximates the TSP solution in polynomial time for all cases, is it P = NP?
1.1-) Can it be proven that an ML algorithm generalizes so as to approximate the solution to a problem in polynomial time, assuming all the data? Or, on the contrary, can it be proven that this will never happen, no matter the problem?
1.2-) If we can somehow prove 1.1, then the ML algorithm is polynomial in inference but was ETERNAL :) during training. Then, for complexity theory, we have an algorithm that is not polynomial in the traditional sense, but from a practical perspective it is. That is a paradox, since it makes TSP polynomially approximable in practice, and therefore P = NP in practice; but in theory we haven't found a polynomial algorithm for TSP, since the graph is the input during inference, while for the entire algorithm the input was the dataset plus the graphs seen during inference.
2-) Can it actually be proven that inference in an ML algorithm is polynomial? What would a proof of that look like?
Again, this is all up in the air; I do not even know if the questions make sense. Maybe there is a straight answer like: duh, by this theorem you cannot obtain polynomially bounded generalization in a regression problem, and therefore none of this makes sense hehehehehe.
Good questions! So, here's my superficial understanding at the moment.
First, you can definitely prove that a machine learning model's inference cost is polynomial in the size of the input. All neural networks are, in fact, because inference is just multiplying the input by a huge but finite number of matrices, so it's trivially polynomial, although with a very large constant. I cannot think of any ML model that's not polynomial in inference, actually.
Now, regarding training, the problem is that your input is no longer a single example, but the whole dataset. Even then, though, all NN training is trivially polynomial in dataset size (a constant number of epochs over the data).
So LLMs are polynomial to train (wrt dataset size) and polynomial to run (wrt input size). Incidentally, that's actually one of the reasons why transformers are trivially not Turing-complete: there's no while-true in there.
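To make that concrete, here's a toy sketch (plain NumPy; the shapes and names are mine, purely illustrative) of why a forward pass is trivially polynomial: it's a fixed, input-independent sequence of matrix products, with no data-dependent loop anywhere.

```python
import numpy as np

def forward(x, layers):
    """One inference pass of a feed-forward net.

    `layers` is a fixed, finite list of (W, b) pairs frozen at
    training time. The loop runs len(layers) times regardless of
    the input, so the total cost is a constant number of
    matrix-vector products: polynomial in the input size, with
    no while-true in sight.
    """
    for W, b in layers:
        x = np.maximum(W @ x + b, 0.0)  # affine map + ReLU
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 8)), np.zeros(16)),
          (rng.normal(size=(4, 16)), np.zeros(4))]
print(forward(rng.normal(size=8), layers))  # always exactly 2 matrix products
```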
Now, to the latter question: can we prove that the output of a probabilistic model is a bounded approximation of the optimal solution? E.g., when an LLM outputs, say, a list of edges that form a cycle, can we prove those edges are a bounded approximation of the optimal TSP tour?
Seems super hard to me because, even if we had some formal guarantee that, say, more than 95% of the edges always belong to the optimal cycle (which I have no idea how to prove, but suppose that's what 95% correct means), the other 5% still allows me to build an arbitrarily bad solution. So I don't see how we escape the fact that anything less than 100% can include arbitrarily bad solutions.
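Here's a toy construction of that argument (the numbers and setup are mine, purely illustrative): two tight clusters far apart, and a tour that gets 95% of its edges "right" but still pays the gap fifty times over.

```python
def tour_cost(tour, dist):
    """Total length of a closed tour under a distance function."""
    n = len(tour)
    return sum(dist(tour[i], tour[(i + 1) % n]) for i in range(n))

# Two clusters of k cities each: distance 1 within a cluster, D across
# the gap. This is a valid metric for any D >= 1.
k, D = 1000, 10**6
n = 2 * k

def dist(a, b):
    if a == b:
        return 0
    return 1 if (a < k) == (b < k) else D

# Optimal tour: sweep cluster A, cross once, sweep cluster B, cross
# back. Cost = 2*D + (n - 2).
optimal = list(range(n))

# A tour where 95% of the edges coincide with the optimal tour's, but
# 5% cross the gap: chop each cluster into 50 chunks and alternate sides.
chunks = 50
size = k // chunks
blocks = [list(range(i * size, (i + 1) * size)) for i in range(chunks)]
bad = [v for i in range(chunks)
         for v in blocks[i] + [x + k for x in blocks[i]]]

print(tour_cost(optimal, dist))  # ~2e6: pays the gap twice
print(tour_cost(bad, dist))      # ~1e8: pays the gap 100 times
```

The 95%-correct tour is about 50 times worse than the optimal one, and with the same 5% error rate the ratio keeps growing with n. So a per-edge accuracy guarantee alone can't give a constant approximation factor.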
Again, this is just off the top of my head, I need to read a lot more on this topic. But you just opened a can of worms for my next Algorithm Design semester :)
Very easy to understand, and it's not the simplest of subjects. Nicely done, Daniel and Ale!
Thanks, man. We put a lot of effort into getting this as simple as possible without dumbing it down.
As Einstein put it, "as simple as possible, but not one bit simpler."
Thank you for such a great article.
Truly masterful. It provides a genuinely intuitive approach to understanding the basics of ML.