Mostly Harmless #5 - Beyond the Chatbot Revolution
Here's what a truly powerful generative AI could do, and it's so much more than fancy chatbots
It seems like everybody's integrating chatbots everywhere. Sure, there are many applications where a conversational interface is better than any previous alternative. A classic example is improved non-professional search: if you want to look up something factual, asking a conversational agent can be a far better experience than typing keywords into Google, because you can ask follow-up questions. So, barring hallucinations —a topic for another day— we can all agree there is merit to adopting a conversational interface, at least in some cases.
Improved search is a very exciting use case that works reasonably well, but there are many other cases in which natural language is far from the best input. When we’ve needed to give machines precise instructions, we have built task-specific interfaces to interact with them at a level of control beyond what natural language can provide.
Now that we can build super powerful conversational agents, there is a misconception that natural language is the best and ultimate interface for all tasks. We can forget about buttons, sliders, and dials; everything you can achieve by clicking a control in a traditional UI, you could do just with language, right? This bias is part of why we place overly high expectations on chatbots as the fundamental engine of the AI revolution.
Language as an interface
One of the reasons we want to use chatbots for everything is because humans use language for everything. However, language didn't evolve specifically for giving instructions. It evolved as a means to communicate within a community, collaborate, agree, and sometimes provide instructions for activities such as hunting and building.
Still, the majority of human language is not well-suited for precise instructions. When communicating with machines, or even between humans, we resort to a subset of language that is better suited for instructions. For instance, we invented mathematical notation to give precise meaning to formal claims.
The ultimate embodiment of this idea is programming languages. They are highly restricted to prevent misinterpretation: the syntax enforces the semantics, ensuring that a computer —or another human programmer— cannot interpret any part of the program in a way the original programmer did not intend. This is why programming languages remain the primary way to give computers instructions.
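To make the contrast concrete, here is a minimal sketch (the records and field names are hypothetical, invented for illustration). The English request "show me the newest posts, but only from last week" is ambiguous —newest by what date? does "last week" mean the past seven days or the previous calendar week?— while the code version pins down every one of those choices:

```python
from datetime import date, timedelta

# Hypothetical post records, for illustration only.
posts = [
    {"title": "A", "published": date(2024, 5, 1)},
    {"title": "B", "published": date(2024, 5, 6)},
    {"title": "C", "published": date(2024, 4, 20)},
]

today = date(2024, 5, 7)
cutoff = today - timedelta(days=7)  # decide: "last week" means the past 7 days

# decide: filter by the "published" date, newest first
recent = [p for p in posts if p["published"] >= cutoff]
recent.sort(key=lambda p: p["published"], reverse=True)

print([p["title"] for p in recent])  # ['B', 'A']
```

Every decision the English sentence left open is made explicit in the code, which is exactly what makes it unambiguous to the machine.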
Now, of course, all of that is changing with language models, because until recently we couldn't give a computer complex instructions in natural language at all. The best we could do was some form of keyword pattern matching, and the most successful pre-LLM application of natural language processing is search.
When you search in Google or any other search engine, you write a query that resembles natural language, though it doesn't have to be a question or even a well-formed sentence. That semi-natural request triggers a process that doesn't require understanding its full meaning —only some partial meaning of the words you use— to search a vast index of the whole internet and return a very summarized subset of it.
Search is just a way to instruct a computer with natural language —or something close to natural language— to perform a concrete task: finding a given set of documents. But we all know that search is far from perfect, and sometimes, it's tough to narrow down search terms to pinpoint the exact thing you want. This is why advanced search engines have filters for dates, topics, tags, sorting, etc. You have many controls over the search beyond just the natural language because it would be too cumbersome to say, "starting last week, sort by whatever field."
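Those advanced-search controls amount to splitting the request into a fuzzy free-text part and a set of precise structured constraints. A hypothetical search request might look like this sketch (the field names are invented for illustration, not any real search API):

```python
# The free-text part stays fuzzy and natural; the constraints that would be
# cumbersome to phrase in prose live in structured fields, the same ones a UI
# would expose as date pickers, checkboxes, and sort dropdowns.
query = {
    "q": "language models as interfaces",       # natural-language part
    "filters": {
        "published_after": "2024-05-01",        # a date picker, not prose
        "tags": ["ai", "ux"],                   # checkboxes
    },
    "sort": {"field": "published", "order": "desc"},  # a dropdown
}

print(query["sort"]["field"])  # published
```

The point is that the precise half of the request never passes through natural language at all; it goes straight into structured controls.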
Well, now we have large language models, and it seems we are almost at the point where we can give the computer very precise instructions in natural language and it will do what we want, as if it fully understood the semantics of the language.
Whether probabilistic language modeling allows for full natural language understanding or not, that's a question for another essay.1 For this article, let's assume that language models, either in the current paradigm or with a more advanced paradigm in the near future, reach a point where we have full natural language understanding. We can tell the computer exactly what we want it to do, it will be transformed flawlessly into instructions in some formal language, and the computer will execute those instructions.
Suppose you could ask Photoshop, your favorite video editing software, or any application to do whatever a human expert can. Would that be the best possible interface?
I claim it isn't. Even perfect natural language understanding is far from the best possible interface for many tasks. There is something even better than perfect NLP, and we may be closer to achieving it than to that perfect understanding of natural language.
Let’s talk about the true power of generative AI.