30 Comments

This post makes a lot of sense, Alejandro, at least to this layperson. I have had a sense for a long time that LLMs are fundamentally different from humans and will never emulate human thought. Your point here, if I understand you, is that natural language as a pragmatic human invention cannot be perfectly modeled in mathematical terms. Thought and language in humans are not stochastic, not random like an LLM seeking probabilities for meaning irrespective of its reference in reality. Humans think with language specifically looking for an empirical reference in reality, not for a meaning which is probably sufficient for any particular moment. Imagine if humans behaved this way. They'll say one thing now because it's within the weighted mean, but another thing in a few minutes when the weights change or the wind blows. LLMs are trapped by language, playing the odds. They're like magicians, karaoke singers, card sharps. Imagine an LLM as a very talented magician. This magician can mimic human speech incredibly well and even string together sentences that sound smart. However, no matter how well it talks, it doesn't truly understand what it's saying. Even if you teach it to "show its work" or check itself, it's still fundamentally limited by its nature as a mimic, not a thinker. It can't identify its own logical or empirical errors and fix things up. It has no metacognition. Am I close? Btw, this is damn good technical writing. It's disciplined, organized, clear, yet your voice is personal, at home, comforting for laypeople like me, which helps us tolerate our own inadequate background knowledge and STILL "get it." I'm hoping I did :)

author

Hey Terry. You're too kind, as usual 😃. I think your analogy really nails it. By their very nature, these are stochastic parrots: extremely good at producing seemingly coherent language, but little else. There is nothing intrinsically bad in that; all tools have their limitations, and we just have to be aware of where they start to break.


I can't see the harm in AI. The harm derives from the human. If there is harm, it's because the user doesn't understand the tool or sets out to do mischief. Your article truly is well designed. I haven't seen any that do what this one does in terms of keeping the factual complexity regarding how LLMs work while communicating the substance to those of us who don't have the depth of knowledge to comprehend everything fully. That's tough to do. Your attitude is patient and encouraging, the voice I'm referring to. I'm not kind when I give writing feedback unless it's genuine. I'm rarely unkind. You and Suzi are really good writers, and your competencies are in the same genre. Very cool to see here on Substack.

author

Regarding the potential for harm, 100% agree with you. Tech isn't necessarily good or evil by itself (although some tech may be designed with a specific intention), but humans always have the power and the responsibility to choose how best to use the tech we have available.

author

❤️❤️❤️


Excellent analysis as usual

author

Thanks :)

Oct 8 · Liked by Alejandro Piad Morffis

Thank you for a very interesting article. In your last section about LLMs using external tools, you explained that "This conversion requires programming syntax and logic knowledge that may not be intuitive for an LLM trained primarily in natural language data."

Can we train an intermediary LLM that has a full understanding of the programming syntax and logic required to pass valid requests to the external system?

author

Yes, I think something like that should be part of the solution. You restrict the decoding with the programming language's grammar and train the model to generate only semantically valid ASTs. Still, I'm not sure that will be enough given the stochastic sampling part, since there are many valid programs that would compile and run, but only one would have the right semantics. I think something along the lines of RL guided by unit testing, or other forms of program validation, might be part of it.
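
For readers curious what that would look like, here is a minimal sketch of grammar-constrained decoding. The `model` and `grammar` interfaces are assumptions made for illustration; they don't correspond to any particular library.

```python
# Minimal sketch of grammar-constrained decoding: at each step, tokens that
# would break the target language's grammar are masked out before sampling,
# so the model can only emit syntactically well-formed programs.
import math
import random

def constrained_decode(model, grammar, prompt_tokens, max_steps=256):
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        logits = model.next_token_logits(tokens)       # assumed: dict of token -> score
        allowed = grammar.allowed_next_tokens(tokens)  # assumed: set of legal next tokens
        # Keep only tokens the grammar allows; softmax-style weights over that subset.
        weights = {t: math.exp(logits[t]) for t in allowed}
        # Sampling is still stochastic, just confined to the valid set, which is
        # exactly why a grammar alone cannot guarantee the *right* program.
        next_token = random.choices(list(weights), weights=list(weights.values()))[0]
        tokens.append(next_token)
        if grammar.is_complete(tokens):                # assumed: program is syntactically complete
            break
    return tokens
```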


It appears that reasoning requires reflexive consciousness, which has special ontological criteria that no algorithm can satisfy. LLMs are merely faking it, non-reflexively, unconsciously, and can be deterministically looped to self-destruct :) https://michaelkowalik.substack.com/p/openai-on-the-question-of-vaccine

Sep 29 · Liked by Alejandro Piad Morffis

Hi Ale, awesome as always! I really enjoyed this article; it goes directly to the point and explains it in great detail. Looking at the arguments, I suppose some of the options to "fix" this reasoning problem could be:

1-) Make the LLM generate code and then run the code, as you have mentioned in other posts and as people in these comments have suggested. However, what I do not like about this one is that, given a prompt, you are trying to generate a computation to solve the problem, but that only helps for computable problems and does not help with the non-computable ones, where you can get better or worse answers depending on the quality of the reasoning.

2-) But I am curious whether it would also be possible to tackle it as an optimization problem: given a prompt, a network (which is not just the LLM, or might not even be an LLM) explores the search space of all plausible answers and picks the correct one (a rough sketch of this idea follows below). In this scenario the system takes longer depending on the question, since during inference it is running another optimization problem to find the right answer, and the time depends on the optimization and the space of answers. Of course, I am only describing it at a high level, and there are many issues here, as anyone can see, like how to explore "all plausible answers." I am not even sure it is possible, but it is something I will try to explore in the future.
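
A rough sketch of that second idea, treating answering as a search over candidates scored by some external objective. Here `generate_candidates` and `score` are hypothetical stand-ins for a model call and whatever validation signal fits the problem at hand.

```python
# Rough sketch of "answering as optimization": sample many candidate answers,
# score each with an external objective, and return the best. Inference cost
# now grows with the size of the search rather than a single forward pass.
from typing import Callable, List

def answer_by_search(generate_candidates: Callable[[str, int], List[str]],
                     score: Callable[[str, str], float],
                     prompt: str,
                     n_candidates: int = 64) -> str:
    candidates = generate_candidates(prompt, n_candidates)  # explore part of the answer space
    return max(candidates, key=lambda answer: score(prompt, answer))
```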

Thanks for the great article as always!

author

I'm absolutely convinced program synthesis is the path to general AI, and per P vs NP, I'm also convinced program synthesis requires search, so maybe the LLM can be the search controller, but there's definitely an iterative process here. On the other hand, positing program synthesis is kind of cheating, because of course, if you can synthesize the right program for any problem, you can solve everything that's solvable. The problem is you need to generate provably correct code, so maybe the LLM generates Coq or another semi-decidable language, and you search for the solution + the proof at the same time.
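
As an illustration of that iterative process, here is a toy generate-and-test loop where the model proposes candidate programs and unit tests play the role of the validator. The `propose_program` callable and the `solution` entry-point name are assumptions made for the sketch; a proof assistant could replace the tests as the validator.

```python
# Toy sketch of program synthesis as search: a model proposes candidate
# programs, and an external validator (here, unit tests) accepts or rejects them.
from typing import Callable, Iterable, Optional

def synthesize(propose_program: Callable[[str, int], str],
               spec_prompt: str,
               tests: Iterable[Callable[[Callable], bool]],
               budget: int = 100) -> Optional[str]:
    tests = list(tests)
    for attempt in range(budget):
        source = propose_program(spec_prompt, attempt)  # stochastic proposal from the model
        namespace: dict = {}
        try:
            exec(source, namespace)                     # run the candidate program
            candidate = namespace["solution"]           # assumed entry-point name
        except Exception:
            continue                                    # candidate does not even run
        if all(test(candidate) for test in tests):      # external validation step
            return source                               # first candidate that passes everything
    return None                                         # search budget exhausted
```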


I agree :)

Sep 20 · Liked by Alejandro Piad Morffis

Great article. Thanks for that.

Question: What do we call, or how do we think about, the point at which the LLM decides which tool or approach to use?

For example, the model fails to count the R’s in strawberry when it simply tries to do it, but it can pretty easily write the code and produce a React app based on one prompt that counts and highlights the instances.

So the failure is in the LLM not “realizing” or making the decision to use code.
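
As an aside, the code version of the task really is trivial once written down; here is a minimal Python sketch (rather than the full React app mentioned above), just to make the gap concrete.

```python
# Counting letters is trivial once expressed as code, even though models
# often get it wrong when answering directly in natural language.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```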

author

That's a great example. Code generation so far is just another instance of pattern matching: the model will generate code when it sees prompts similar to those that resulted in a code response during training. So we can teach models to generate code even if not explicitly asked, but this still doesn't circumvent the basic limitations of the stochastic language modelling paradigm: it may fail to pick the right pattern.

Sep 20 · Liked by Alejandro Piad Morffis

Yes, I understand that. But it seems the general perception is that the model is getting smarter when/if it uses code when it makes sense.

At the same time, I get that this decision wouldn’t be based on reasoning.

Sep 20 · Liked by Alejandro Piad Morffis

Or, the model is acting less intelligent and/or is kind of overestimating itself, thinking it can get the Rs in strawberry right.

author

The model doesn't know what it doesn't know, which is one of the hardest things to get right in AI. These models have a closed-world paradigm: everything that is not explicitly true is assumed false. There is no place in the architecture to model uncertainty, and token probabilities are not calibrated, meaning the odds ratio between two tokens doesn't necessarily map to the uncertainty between those two choices.
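
To make the calibration point concrete, here is a minimal sketch of how one could measure it via expected calibration error; the input format (confidence/correctness pairs) is an assumption for illustration, not tied to any specific model.

```python
# Minimal sketch of a calibration check: bucket predictions by reported
# confidence and compare with how often they are actually correct. For a
# calibrated model, confidence roughly equals accuracy inside each bucket.
from typing import List, Tuple

def expected_calibration_error(preds: List[Tuple[float, bool]], n_bins: int = 10) -> float:
    """preds: (confidence, was_correct) pairs collected from model outputs."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, correct in preds:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    total, ece = len(preds), 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_confidence - accuracy)
    return ece
```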

author

It kinda is. I see it as a system 1/system 2 kind of architecture. The learning part is system 1, everything the model knows from training and everything the model can decide on via "intuition" (i.e., matrix multiplication). When system 1 detects a problem that requires reasoning, it invokes system 2, which is code generation.
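
Purely as an illustration of that dispatch idea, a tiny sketch follows; all three helper callables are hypothetical placeholders, not part of any real system.

```python
# Sketch of a system 1 / system 2 split: answer directly by default, but hand
# the prompt to a code-generation-and-execution path when it looks like it
# needs exact computation.
from typing import Callable

def answer(prompt: str,
           needs_reasoning: Callable[[str], bool],
           answer_directly: Callable[[str], str],
           solve_with_code: Callable[[str], str]) -> str:
    if needs_reasoning(prompt):         # system 1 flags the problem...
        return solve_with_code(prompt)  # ...and system 2 (code) takes over
    return answer_directly(prompt)      # otherwise, answer by "intuition"
```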

Sep 19 · Liked by Alejandro Piad Morffis

Nice! I haven't learned much about LLMs, so these posts are very educational.

author

Thanks 🙏

Sep 19 · Liked by Alejandro Piad Morffis

Very interesting stuff! It seems like there will never really be any "there" there.

I'm sure you've thought a lot about emergence, but it's only recently really, truly arrived on my radar. It's like all I can think about now. I love the tantalizing idea that reality itself could be emergent, and that certainly includes everything to the north of that baseline. Things emerge everywhere... could logical reasoning potentially emerge here through some means we're not aware of?

You can sort of see that I'm just really wrapping my mind around all this stuff, but I tried to jot my thoughts down here: https://goatfury.substack.com/p/emergence

author

Oh man, emergence is one of the most intriguing things in the universe. Are there really properties that simply cannot be reducibly explained? Or is it just a lack of imagination or knowledge on our side? Or even the lack of a proper language to even talk about the thing? Take consciousness, for example. How does non-conscious stuff like neurons turn conscious when lumped together in just the right way?

Sep 20 · Liked by Alejandro Piad Morffis

Yeah dude, I am fully on Team Emergence! Ever turn any focus toward RQM (Relational Quantum Mechanics)? The central idea is intriguing, and pretty much exactly what we're describing - the ultimate emergence, of mass and matter and particles and all that.

author

Rings a bell, but I can't tell you if I saw something on Wikipedia or YouTube. Will definitely check it out.

Sep 19 · Liked by Alejandro Piad Morffis

So you're saying it was a bad idea to outsource my entire decision-making process in all areas of my life to ChatGPT several months ago? That would explain a lot!

author

Nah it just depends on how complex your decision making already was.

Sep 19 · Liked by Alejandro Piad Morffis

Phew, luckily, I never made a complex decision in my life. I'm safe!

author

Living the life of a state machine. That is the way.


There's a concept I'd like to explore in the near future (maybe it's a collab opportunity), but there's a behavior amongst the techno-optimists that strikes me as a "Wish for Sentience," like Geppetto wishing Pinocchio were a real boy. It's more than just anthropomorphizing. It has a feeling of hubris at being a god-like creator. Or perhaps, since many of them are childless, it feels like we are merely dealing with misplaced nurturing?
