Mostly Harmless #2: The Future of AI is Open Source
Why the future of artificial intelligence, and large language models in particular, will be built on open-source foundational models.
I strongly believe in the potential of open source. Open source software offers numerous benefits over closed source. Over the past 30 years, we have witnessed the growth of this movement from being a fringe ideology to becoming a widely embraced mindset, even by the world's leading companies.
Nevertheless, not everything can be open source. Building a business around software requires having some proprietary elements that can be monetized for profit. However, many successful business models involve a hybrid approach, where certain portions of your codebase are open-sourced while others remain closed-source.
One effective way to capitalize on open source is to release your codebase as an open-source project, which lets you leverage the community effect. On top of that, you can sell premium services such as cloud hosting and enterprise solutions, including single sign-on and customer service. The hosted offering is particularly attractive to users who prefer not to self-host.
This model is widely employed in the realm of backend-, platform-, and infrastructure-as-a-service. It is prevalent in various domains, including database systems and productivity tools like GitLab. While you can self-host these services, selling their cloud version is often their main business model.
The hybrid model combines the advantages of the open-source and closed-source models. With open source, you benefit from the community effect, as well as a large number of beta testers and early adopters, which improves the reliability of your product. Additionally, public development allows you to receive feedback and reports on platforms like GitHub. Even small contributions, such as documentation or user examples, greatly enhance the open-source model.
On the other hand, maintaining a closed part of your application has its benefits. For instance, you can keep your user interface closed while releasing the backend and core functionality. By offering a cloud-hosted user interface, along with advanced features like drag-and-drop interfaces and logging administration, you can cater to enterprise users willing to pay for these services. You can also provide customer service, offer on-premises deployment, and develop client-specific plugins or components.
As AI becomes the foundation of the new Software 2.0 paradigm, the debate between open- and closed-source software becomes a central issue. What does open-source AI look like? Are there clear benefits to open-sourcing at least some part of your AI stack? Can you gain more by giving away more?
In this issue, I want to explore these questions, focusing on the rise of large language models as the backbone infrastructure for a significant part of the AI applications of the near future.
What does open-source AI look like?
Open source is progressively dominating the realm of software, starting from the core layers and expanding towards the infrastructure layers of computing. The primary type of software likely to be open source is infrastructure software. For instance, operating systems, virtualization software, and many drivers are open source. Moving up the hierarchy, development tools also fall within the realm of open source, as they form a slightly higher infrastructure level.
Consumer applications, however, are a different story. Open-source compilers tend to be superior, but open-source image, video, and audio editors struggle to compete with closed-source alternatives such as Photoshop and other Adobe products. Similarly, open-source video games lag far behind their closed-source counterparts in entertainment value. The closer software sits to low-level implementation details, the more it benefits from an open-source approach.
Now, let's examine how this applies in the context of artificial intelligence. First, it's important to note that AI development tools and frameworks such as TensorFlow and PyTorch, while open source, are not the core of my argument. Their open-source nature stems primarily from the fact that they are development tools, even though they are part of the AI stack.