The State of AI for Software Development
Tools of the Trade, and Why You Should Still Learn to Code...
This article is based on Chapter 5 of my in-progress book Mostly Harmless AI.
Few developments in the generative AI space have been as exciting lately as the rise of code generators. The evolution of these AI coding assistants is best understood not as a single leap, but as a progression of capabilities, moving from simple autocomplete to what may one day be fully autonomous agents.
At their core, code generators are Large Language Models trained on vast amounts of public code. They treat programming languages just like human languages, learning the patterns, syntax, and structure to predict what comes next. These models can take a natural language prompt and some contextual code and produce new code that mostly aligns with the prompt's intention. For example, you can provide a function signature and a comment like, "This function finds the minimum of an unsorted list," and the model will generate the function's body.
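For instance, given that signature and comment, a plausible completion might look like the following (the function body is illustrative output, not taken from any particular model):

```python
def find_minimum(numbers):
    """This function finds the minimum of an unsorted list."""
    # Everything below is the kind of body a code model would generate.
    smallest = numbers[0]
    for value in numbers[1:]:
        if value < smallest:
            smallest = value
    return smallest
```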
This uncanny ability to comprehend and generate code based on human communication is transforming the development landscape, but it also requires special considerations, as code is not just another natural language.
In this article, we will explore the landscape of AI for software development. We will begin by looking under the hood, examining the spectrum of capabilities that allow AI to generate code. Next, we will explore the use cases for developers across the development lifecycle. Then, we will discuss some important things to keep in mind, from hallucinations and security to theoretical limitations of AI for coding. Finally, we will look to the future of coding, consider how the developer role is evolving, and try to answer one crucial question: is coding dead?
How to Make a Code Generator
Let's imagine we are building our own code generator from scratch. The journey from a simple code predictor to a sophisticated development partner is a journey of adding layers of capability, moving up a spectrum of increasing autonomy.
The first thing we want is next-token prediction for code. The foundational layer is built on unsupervised training, making it essentially autocomplete on steroids, like a super duper IntelliSense. We start by training a model on vast amounts of code, teaching it to predict the most likely next token based on the immediate context. If we have variables or functions declared nearby, our model is more likely to generate code that references them, simply because that's the most common pattern in its training data.
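As a minimal sketch of this foundational layer, here is how you could run next-token prediction for code with an off-the-shelf model via the Hugging Face transformers library. The model name is just an example; any causal language model trained on code works the same way:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "bigcode/starcoder2-3b"  # illustrative choice of code model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The model continues the code, one most-likely token at a time.
prompt = "def find_minimum(numbers):\n    "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```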
Now that we have a basic generator, our next step is to teach it to follow instructions. To do this, we can compile a dataset of instruction pairs—for example, a natural language command like, "In the previous code, change the loop to be more efficient," paired with the corrected code. By training on these examples, our model learns to go beyond simple prediction and follow specific, human-given directions. We can further enhance this process with Reinforcement Learning, where we have human or automatic evaluators rank different code outputs. This teaches our model to not only generate syntactically correct code, but also to respect desired styles and naming conventions.
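A single training example in such an instruction dataset might look like this (the field names are illustrative, not any specific dataset schema):

```python
example = {
    "instruction": "In the previous code, change the loop to be more efficient.",
    "context": "total = 0\nfor i in range(len(prices)):\n    total += prices[i]",
    "response": "total = sum(prices)",
}
```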
Our generator is getting smarter, but it's still limited by the immediate context. To give it a long-term memory, we move to context-aware generation. We can dramatically enhance our model's ability to generate relevant code by allowing it to pull from a broader context. This is a form of Retrieval-Augmented Generation (RAG) for Code. We can index an entire codebase or external API documentation, allowing our model to find relevant examples and patterns. When a developer asks a question, our system retrieves these examples and feeds them into the prompt, allowing the model to generate accurate code by combining and refactoring snippets from the provided context, even for libraries it wasn't explicitly trained on.
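Here is a toy sketch of the retrieval step. Real systems rank snippets with vector embeddings; simple keyword overlap stands in for that here, and the indexed snippets are made up:

```python
def overlap(query: str, document: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(document.lower().split()))

codebase_index = [
    "def connect(url): ...  # opens a database connection",
    "def run_query(conn, sql): ...  # executes a sql query",
    "def render_chart(data): ...  # draws a bar chart",
]

def build_prompt(question: str, index: list[str], k: int = 2) -> str:
    relevant = sorted(index, key=lambda doc: overlap(question, doc),
                      reverse=True)[:k]
    return ("Relevant code from this project:\n"
            + "\n".join(relevant)
            + f"\n\nTask: {question}")

print(build_prompt("how do I run a sql query against the database?",
                   codebase_index))
```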
So far, our model can only write code. To up our game, we can give it the ability to interact with its environment by making it use external tools. This represents a significant leap. We can equip our model with a set of tools it can invoke on demand. For example, in response to a prompt like "add a library for charting," our model could invoke a tool to install the missing dependency in the project. If you ask it to "check whether this works," it could invoke another tool to run the unit tests and report back the results. It could even use tools to directly modify files in the codebase. By giving our model the ability to take actions beyond just generating text, we empower it to participate more actively in the development process.
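A minimal sketch of this pattern: the model emits a structured tool call, and the host application executes it. The tool names and the JSON shape here are assumptions, not any particular vendor's API:

```python
import json
import subprocess

TOOLS = {
    "install_dependency": lambda args: subprocess.run(
        ["python", "-m", "pip", "install", args["package"]], check=True),
    "run_tests": lambda args: subprocess.run(
        ["pytest", args.get("path", ".")]),
}

def handle_model_action(raw: str):
    """Parse the model's tool call and dispatch it."""
    call = json.loads(raw)
    return TOOLS[call["tool"]](call.get("arguments", {}))

# Asked to "add a library for charting", the model might emit:
handle_model_action(
    '{"tool": "install_dependency", "arguments": {"package": "matplotlib"}}')
```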
The final step on our journey is to give our code generator a bit of autonomy, creating what’s called an agentic system. This is the most advanced and forward-looking form of AI-based code generation. We can design an AI agent that takes a high-level goal, breaks it down into sub-tasks, writes code, generates tests, runs the code, and then analyzes the output or errors. Based on the results, it can then debug or modify the source code in a continuous loop, acting as a semi-autonomous developer to see a task through from start to finish.
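Stripped to its skeleton, that loop might look like this. The llm() helper is a stand-in for a real model call, and the whole thing sketches the control flow rather than any specific tool:

```python
import subprocess
import tempfile

def llm(prompt: str) -> str:
    """Placeholder for a real language model API call."""
    raise NotImplementedError

def run_candidate(code: str, tests: str) -> subprocess.CompletedProcess:
    """Execute the candidate code plus its tests in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
    return subprocess.run(["python", f.name], capture_output=True, text=True)

def agent(goal: str, max_iterations: int = 5) -> str:
    code = llm(f"Write Python code for this goal:\n{goal}")
    tests = llm(f"Write assert-based tests for this goal:\n{goal}")
    for _ in range(max_iterations):
        result = run_candidate(code, tests)
        if result.returncode == 0:   # all asserts passed
            return code
        code = llm(f"Fix this code.\nGoal: {goal}\nCode:\n{code}\n"
                   f"Errors:\n{result.stderr}")
    raise RuntimeError("agent exceeded its iteration budget")
```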
Beyond these training methodologies, we can leverage the formal nature of code to improve our model's performance. Unlike natural language, code has strict syntactic rules that can be programmatically checked. One simple but effective technique is trial and error during inference: we can have our model produce several potential code snippets, run them through a linter, and automatically reject any that have parsing errors.
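With Python as the target language, that filter can be as simple as trying to parse each candidate, a minimal sketch using only the standard library:

```python
import ast

def syntactically_valid(snippet: str) -> bool:
    """Keep only candidates that parse as valid Python."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

candidates = [
    "def f(xs): return min(xs)",
    "def f(xs): return min(xs",      # unclosed parenthesis: rejected
    "def f(xs):\n    return sorted(xs)[0]",
]
valid = [c for c in candidates if syntactically_valid(c)]
print(f"{len(valid)} of {len(candidates)} candidates survive the filter")
```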
More advanced techniques can pre-process the training data, for instance by normalizing all variable names to a generic format like var0, var1, etc. This makes it much easier for our model to learn the structural relationships in code without being distracted by specific naming conventions, and we can substitute the actual names back in a post-processing step. These tricks leverage the fact that we are dealing with a very restricted syntax to make it easier for our language model to learn the rules.
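Here is a minimal sketch of that normalization step, using Python's own ast module (a production pipeline would also need to skip builtins and imported names):

```python
import ast

class NormalizeNames(ast.NodeTransformer):
    """Rename every variable to var0, var1, ... in order of appearance."""
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"var{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node

source = "total = 0\nfor price in prices:\n    total += price"
normalizer = NormalizeNames()
print(ast.unparse(normalizer.visit(ast.parse(source))))
# var0 = 0
# for var1 in var2:
#     var0 += var1
# `normalizer.mapping` lets us substitute the real names back afterwards.
```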
Finally, to create the ultimate specialized assistant, we can go beyond RAG and fine-tune a model on a specific codebase. While RAG provides external context, fine-tuning actually updates the model's internal weights. By training a model on a company's entire private and proprietary codebase, we can create a version that has deeply internalized that organization's specific architectural patterns, internal APIs, and coding standards. This results in an AI partner that not only answers questions correctly but does so in a way that is idiomatic and aligned with the team's established practices.
Use Cases for Developers
Understanding the engine is one thing; knowing how to use it is another. AI offers you a powerful toolbox that you can apply across the entire software development lifecycle. In this section, you will explore practical use cases across three key phases: Ideation and Design, Implementation and Development, and Verification and Explanation. You will see how you can use both sophisticated, LLM-based coding tools integrated directly into your IDE, as well as techniques that you can use with standard, general-purpose chat apps like ChatGPT or Claude, requiring no special integration at all.
Phase 1: Ideation and Design
Before you even dare to write a single line of AI-generated code, you can already use LLMs as powerful brainstorming partners for exploration and design. This is one of the most accessible ways you can use AI, as it doesn't require a specific tool or editor extension; you can do it effectively using general-purpose conversational AI applications like ChatGPT, Perplexity, or Gemini. Models with live browsing capabilities are often even better for this phase, as they can pull in the latest information about new frameworks, libraries, and design patterns.
The key is for you to treat this phase as an interactive exploration. Instead of asking for a single, ready-made answer, you should guide the model through an ideation process. Here's how you can do it: use a chain-of-thought approach by asking the model not just for a solution, but to "think step-by-step" through the pros and cons of different architectural choices.
A powerful pattern you can use is to ask the model to generate several variants—for example, "Propose three different ways to design the database schema for a social media app." Then, you can discuss the options back and forth, using self-critique prompts to have the model compare the alternatives it just generated. At the end of this collaborative session, you can ask the model to provide a structured summary of all the design decisions you have agreed upon, acting as an executive design document.
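To make this concrete, a complete ideation prompt along these lines might read: "I'm designing the database schema for a social media app with users, posts, and followers. Propose three different schema designs. For each one, think step-by-step through the pros and cons, including read/write patterns and how it scales. Then recommend one and justify your choice." The exact wording matters less than the structure: multiple variants, explicit reasoning, and a final recommendation you can push back on.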
With this document in hand, you can then start a new, more focused session with a code-oriented model for the actual implementation.
Phase 2: Implementation and Development
The most straightforward way you can use AI in this phase is for generating short, self-contained code snippets. This can be for a well-known algorithm, a common pattern, or the use of a well-documented API. This is a task you can accomplish with any standard chat app, even outside your IDE. This is especially powerful for navigating the complex world of APIs and libraries.
As a professional programmer, you probably aren't spending that much time on basic coding tasks, like inserting numbers into a list. In reality, 90% of the code you write is interface code with some external library you may not know well. Instead of manually searching documentation, you can simply ask an AI assistant, "How do I use this library to make a query that does X?" and get a ready-to-use snippet.
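What you get back is typically a small, ready-to-run snippet. For example, asking how to call a search endpoint with the well-known requests library might yield something like this (the URL and parameters are illustrative):

```python
import requests

response = requests.get(
    "https://api.example.com/search",
    params={"q": "coffee", "limit": 10},   # becomes ?q=coffee&limit=10
    timeout=10,
)
response.raise_for_status()   # fail loudly on HTTP errors
results = response.json()
```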
The next level of integration is bringing AI directly into your IDE. This can start with simple code completion, but the real power comes from integrating a full chat experience. This allows you to highlight a block of code and ask for specific changes, such as, "Refactor this function to be more efficient," and have the model modify the file directly. If the model has RAG capabilities and can scan your entire codebase, it gets even better. The modifications and additions it suggests will be consistent with your existing coding style and use your own libraries and methods, making the integration seamless.
At the far end of the spectrum is the full agentic mode, which is still in its infancy with tools like Cursor. This offers a much more hands-off development experience. Here, you can give a coding agent a high-level task, and it can modify several files, create new ones, and even run commands in the terminal to install missing dependencies.
Finally, you don't always need a full IDE. For one-off scripts or quick prototypes, you can use the "Canvas mode" in apps like ChatGPT, Claude, or Gemini. These provide a simple editor-like interface where you can iterate back and forth with the model to update a script. Some tools even allow you to run these scripts directly in the cloud, letting you build and test disposable web apps instantly.
Working with these tools introduces a new core skill, an AI-in-the-loop coding workflow—the day-to-day interactive process of collaborating with an AI. It involves an iterative cycle of prompting with a clear goal, carefully reviewing the AI's output, correcting its mistakes or flawed assumptions, and then re-prompting with more specific instructions or feedback.
Phase 3: Verification and Explanation
To ensure your code quality, you can use an AI to help generate a wide range of test cases. This is especially useful for uncovering corner cases that might not be immediately obvious, such as handling empty inputs, maximum values, or unusual user behaviors. You can do this with integrated AI coding tools, but it can also be as easy as uploading your codebase or relevant files to a standard chat app and asking it to suggest test cases.
You can ask for both code-based tests (like unit and integration tests) as well as descriptive tests (like user stories or manual testing scripts). In all these scenarios, it helps to instruct the model with a Chain-of-Thought prompt, asking it to first explain what behavior it wants to test, and only then provide the actual test. This ensures the tests are intentional and well-understood.
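For the find_minimum function from earlier, the corner-case tests an assistant might propose could look like this (pytest style; the import location is hypothetical):

```python
import pytest
from minimum import find_minimum   # assumption: where the earlier example lives

def test_single_element():
    assert find_minimum([7]) == 7

def test_duplicates():
    assert find_minimum([3, 1, 1, 3]) == 1

def test_negative_numbers():
    assert find_minimum([5, -2, 0]) == -2

def test_empty_list_raises():
    # The earlier implementation indexes numbers[0], so an empty
    # list raises IndexError; a test makes that behavior explicit.
    with pytest.raises(IndexError):
        find_minimum([])
```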
Furthermore, when you're faced with a cryptic error, you can use AI for debugging. You can feed the AI the error message, stack trace, and relevant code, and it can analyze the context to suggest potential causes for the bug and possible fixes, acting as an experienced pair programmer.
The opposite, code-to-language direction allows you to create powerful new workflows for understanding code. You can ask an AI for automatic documentation of functions or for natural language explanations of a complex code fragment. This can be done directly inside your IDE with an integrated tool, or with standard chat apps. For example, some tools allow you to connect a public GitHub repository and ask high-level questions on the fly, which is very good for getting a quick overview of a new codebase.
A particularly valuable use case is in legacy code modernization. One of the biggest challenges in the software industry is maintaining and updating old codebases. You can use AI to tackle this problem by feeding it legacy code (e.g., from an old COBOL or Java system) and asking it to analyze the logic, add explanatory comments, or even translate the entire system to a modern language and architecture. This can dramatically reduce the cost and risk associated with modernizing critical systems.
However, you should be aware of the critical gap between syntax and semantics—that is, between understanding what the code says versus what the code does.
The weaker models are mostly limited to describing what the code is saying syntactically (e.g., "this variable is changed to this array position"), though this capability is improving all the time. More powerful models can often provide higher-level, semantic explanations of what the code is doing (e.g., "this loop is ensuring the first part of the array is always sorted"). But even the best models may not be able to grasp the full architectural details or business logic of a complex application.
Putting It All Together
Putting this all together, let's see how a complete workflow might look for tackling a specific, somewhat complicated feature in an ongoing app, like adding OAuth login.
First, you would start in ideation mode, interacting mostly in text with the AI. You would discuss a high-level overview of the required architecture changes, which parts of the app might be impacted, and the best libraries to use. The goal here is to produce a clear design roadmap before any code is written.
Next, you would move to implementation mode, going full hands-on with an agentic tool. You could assign the agent the high-level task from your roadmap: "Implement the OAuth login feature using the chosen library." The agent would then get to work, creating new files, modifying existing ones, and writing the necessary code. As it encounters errors or ambiguities, you would engage in a back-and-forth conversation to guide it, but the bulk of the mechanical coding would be handled by the agent.
Finally, you would enter review mode. Once the agent reports that the feature is complete, you could have a final conversation with the AI. You could ask it to analyze the git diff of all the changes it made, explain the rationale for its implementation choices, and generate comprehensive documentation for a pull request. After your final review and approval, you would then submit the PR for human review by your team.
Things to Keep in Mind
While the toolbox is powerful, it comes with sharp edges. The most important limitation in language modeling, in general, has been called the problem of hallucinations. In the context of code, this means AI-generated code is not infallible and can contain subtle bugs that require constant vigilance.
Hallucinations and Mistakes
The simplest hallucinations you will see are syntactic ones: generated code that uses a variable that doesn't exist or fails to close a parenthesis. Unlike with natural language, you can often detect these errors automatically with a linter or compiler, so these mostly harmless hallucinations rarely matter; they won't survive long enough to introduce subtle bugs.
A slightly more difficult hallucination is what we can call a semantic hallucination, where the model uses a wrong variable or function name that does exist in your codebase. In this case, you will not get a compiler error because you're using an existing symbol, but you will get the wrong behavior. This is much harder to find, and it shares the fundamental problem of most hallucinations: to catch it, you have to review the code and be knowledgeable enough to have written it yourself.
The most insidious errors are logical flaws. This occurs when the code doesn't do anything obviously wrong—it uses the right variables and looks plausible—but it has some subtle logical mistake that leads to a bug. For example, finding that a variable is not updated at the right moment in a nested loop is a tricky problem even for human experts. These kinds of mistakes will introduce subtle, hard-to-detect bugs.
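For a feel of how plausible such code can look, consider this hedged illustration: a binary search that uses all the right variables and reads correctly at a glance, but hides a classic logical flaw:

```python
def binary_search(items, target):
    """Looks right, but `low = mid` should be `low = mid + 1`."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid   # BUG: loops forever, e.g. binary_search([1, 2], 3)
        else:
            high = mid - 1
    return -1
```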
But even if the bugs are no worse than what a human would introduce, they pose a threat because of "automation bias." When you check code written by humans, you expect bugs. But until now, the only machine-generated code programmers have ever interacted with came from rule-based systems like compilers, and that code is basically without mistakes.
So even if the language model makes errors that are, on average, no worse than what a regular programmer would make, they can still be harder to detect because they won't be the exact same mistakes a human would make, and we may be less on guard.
AI's Impact on Technical Debt
The rapid generation of code by AI presents a double-edged sword for technical debt. On one hand, you can use AI as a powerful tool to reduce existing debt. You can ask it to analyze your codebase for inefficiencies, suggest refactorings, or add missing documentation and tests, thereby improving code quality.
On the other hand, the very speed of AI can create new and more complex forms of technical debt. Relying heavily on "vibe coding" to quickly generate features without rigorous human review can lead to a codebase filled with poorly understood, inefficient, or subtly buggy logic. This AI-generated debt can be even harder to untangle later, as the original human intent behind the high-level prompt may be lost.
Biases in Generated Solutions
Models trained on a vast corpus of public code from the internet will inevitably learn from outdated examples. This can lead them to perpetuate outdated practices by suggesting deprecated functions, old library versions, or inefficient algorithms that are no longer considered best practice. An AI model will also often default to the most statistically common solution it has seen in its training data.
This can stifle creativity and lead to a homogenization of code, discouraging the exploration of more elegant or contextually appropriate solutions. Finally, just as AI can perpetuate harmful societal biases, it can also reproduce human biases from the code it was trained on. This can manifest as non-inclusive language in generated comments or variable names.
Security and Licensing Risks
A significant risk is that an AI can generate code with known security vulnerabilities. If the model was trained on public code containing flaws like SQL injection or buffer overflows, it may reproduce those same insecure patterns in its suggestions, creating a major security risk for the application.
Furthermore, the use of AI-generated code introduces complex legal questions. A model might reproduce a code snippet verbatim from a repository with a restrictive open-source license (like the GPL), inadvertently pulling that license's requirements into a proprietary project. The legal ownership of the AI-generated code itself remains a gray area, creating potential intellectual property challenges for companies.
The Economics of AI Development Tools
While these AI tools offer significant productivity boosts, they are not free. For development teams and organizations, it's important to consider the practical economics of their adoption. Most advanced AI coding assistants operate on a subscription model, which introduces a new operational cost. Team leads and CTOs must perform a cost-benefit analysis, weighing the price of the tools against the expected gains in developer speed, code quality, and reduced time-to-market. The return on investment (ROI) will depend heavily on how well a team integrates these tools into their workflow and whether the productivity gains justify the recurring expense.
Theoretical Limitations
Beyond the practical issues of hallucinations and biases, there is a more fundamental, formal limitation to what we can do automatically. This is captured by Rice's theorem, a cornerstone of theoretical computer science. In short, the theorem proves that there is no algorithm that can automatically check for any non-trivial semantic property of a program.
What does this mean in practice? A "non-trivial semantic property" is basically any interesting question about what a program does. For example: "Does this program ever crash?" or "Will this function always return a positive number?" or "Is this code free of security vulnerabilities?" Rice's theorem tells us that it is mathematically impossible to build a universal program that can answer these kinds of questions for every possible piece of code.
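To see the shape of the argument, here is a sketch of the classic reduction in Python pseudocode. The always_returns_positive oracle is hypothetical by construction; if it existed, we could solve the halting problem, which is provably impossible:

```python
# Hypothetical oracle for one non-trivial semantic property, read as:
# "func halts and returns a positive number on every input."
def always_returns_positive(func) -> bool:
    ...

def would_halt(program, argument) -> bool:
    """Decides the (undecidable!) halting problem using the oracle."""
    def wrapper(_ignored=None):
        program(argument)   # runs forever exactly when `program` doesn't halt
        return 1            # positive, reached only if `program` halts
    # If `program` halts on `argument`, wrapper always returns 1: property holds.
    # If `program` loops forever, wrapper never returns: property fails.
    return always_returns_positive(wrapper)
```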
This highlights the theoretical impossibility of perfect, automated code verification. We will never be able to build an AI that can look at code generated by another AI (or a human) and formally guarantee that it does exactly what the natural language prompt intended. That problem is, in the general case, unsolvable.
However, this doesn't mean we should give up. Engineering isn't about theoretical perfection; it's about solving the average case in the best possible way and handling the most important edge cases reasonably well. While we can't achieve perfect verification, we can get pretty far with a combination of AI-generated tests, linters, and, most importantly, expert human review.
The Future of Coding
Given these tools and guardrails, the very nature of programming is set to transform. The focus will shift from the mechanics of writing code to the art of building systems. The term "vibe coding," popularized in developer communities, captures the essence of this shift. It describes a workflow where the developer's primary job is no longer to write precise, line-by-line syntax, but to describe the high-level behavior, intent, or "vibe" of the desired software to an AI partner. The focus moves from how to do something (the specific algorithm and syntax) to what needs to be done (the ultimate outcome and user experience), leaving the mechanical implementation details to the AI assistant.
This approach is incredibly powerful for rapid prototyping, hackathons, and short-term projects. A developer can quickly scaffold an entire application by describing its components in natural language, getting a functional prototype up and running in a fraction of the time it would take manually.
However, this method has significant limitations for larger, more detailed projects. "Vibe-based" instructions are often ambiguous and can be misinterpreted by the AI, leading to code that works for the happy path but fails on edge cases. For long-term, mission-critical software, the precision, maintainability, and strict adherence to architectural standards that come from deliberate, human-led coding remain indispensable. Vibe coding is a tool for speed and exploration, not a replacement for rigorous engineering.
In this new paradigm, future developers will become experts at wielding a suite of AI tools and agents. Skills in "prompt engineering," system design, and the critical review of AI output will become more valuable than the ability to recall specific syntax. The developer's role becomes one of guidance and orchestration, knowing which tool to use for which task and how to verify the results.
Looking ahead, this elevated role may involve assigning entire features or bug fixes to autonomous agents. These agents would manage the full lifecycle: understanding the ticket, writing the code, creating tests, committing to version control, and responding to feedback from the CI/CD pipeline. This doesn't eliminate the developer but elevates their role to that of a system architect and project manager, overseeing a team of AI agents.
Beyond the changes in workflow, it's worth contemplating how these tools will change the qualitative experience of being a developer. We must ask ourselves how it feels to code this way. Does offloading the cognitive burden of syntax and boilerplate make you dumber and cause you to forget how to code, or does it free up mental space, allowing you to become even more proficient in the things that truly matter—the high-level ideas and architecture?
We must also consider the social aspects. You now have a partner that is not a human. How will this impact teamwork? Will this AI partner become a virtual member of the team, participating in code reviews and design discussions? Or will it alienate developers into more lonely roles, as they interact more with their AI than with their human colleagues? How does a senior developer mentor a junior who can always get an instant answer from an AI, potentially masking gaps in their fundamental knowledge?
These are open questions we must navigate as we integrate these powerful new collaborators into our teams.
Final Remarks
So, Is Coding Dead?
There is a real concern that if AI can write 90% of the code in 10% of the time, nine out of ten programmers could be out of a job. And yes, every time automation has reached an existing industry, some jobs are destroyed as some skills become irrelevant.
However, I claim we must not fear the advent of AI coding assistants. Here’s why.
Writing code is far from being the hardest or the most time-consuming part of software development. The process of making software involves understanding requirements, talking with customers, user testing, and product design, all of which are at least one order of magnitude more difficult than actually typing code.
A hundredfold boost in productivity for a task that is only 10% of the overall process is huge, but, as Amdahl's law reminds us, it makes the end-to-end process at most about 11% faster; the other 90% of the human-centric work remains. We will still need to understand what our customers want, guide them through designing a software product, know the user base, and find a sustainable business model.
And no, you cannot simply simulate the end user with a language model so the AI can prompt itself into making a usable product, because your end user will still be human. Human users are slow, get angry easily, don't understand your application, and don't know what it is they don't like about it. Until an AI can really replicate what it feels like to be a human (and at that point, will we still call it "artificial"?), we can't take the human out of the software development loop, or any creative loop, for that matter.
The biggest progress in the software creation process has always come from innovation on the human side, not the machine side. Innovation in software engineering, management, and how you get people to work together and collaborate will continue to be the most important part of the software pipeline for a long time.
Furthermore, software is an industry that is nowhere near its saturation point. We have far more need for software than the number of people who can currently write it. Increased productivity will likely be met with increased demand, creating more and better software for more users.
Every leap in software productivity—from assembly to compilers, from C to object-oriented frameworks—has lowered the barrier to entry and brought more people into programming. AI tools will likely do the same, empowering more people to create software. The modern world runs on software, and in the future, basic programming literacy may become as common as basic math literacy is today.
Most people know enough math to get by in daily life without hiring a mathematician, and in the same way, more people will know enough programming to automate simple tasks. They will learn to say to their home computer, "When I get home, I want you to turn my lights on, but only if it's night and the electric bill is not above the average," and an AI will generate the code to make it happen. This expands the field rather than shrinking it.
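The generated glue code for that request might look something like this. The smart_home and utility_api modules, and every function on them, are hypothetical stand-ins for whatever APIs such a system would expose:

```python
import datetime
import smart_home    # hypothetical home-automation SDK
import utility_api   # hypothetical electricity-provider API

def on_arrival_home():
    now = datetime.datetime.now()
    is_night = now.hour >= 20 or now.hour < 6   # assumption: night is 8pm-6am
    current = utility_api.current_bill()
    average = utility_api.average_bill(months=12)
    if is_night and current <= average:
        smart_home.lights_on()

smart_home.on_event("arrival", on_arrival_home)   # hypothetical event hook
```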
So, should you learn to code? Definitely. There's going to be orders of magnitude more code written in the next few years than everything we've written in history.
But even if you never end up writing a single line of code unaided by AI—like I've never written a single line of production code unaided by syntax highlighting, a linter, or a type verifier—knowing how code works, how algorithms work, and why a specific programming construct behaves the way it does is the same as knowing basic math. Coding changes how your brain is wired, makes you think more clearly, and increases your creativity.
Furthermore, even if you are not working in the software industry, learning to code is still an immensely enjoyable experience. Being able to create something that keeps working on its own is, I think, the ultimate toy.
So if you want to make a dent in the software industry and you're wondering if AI will get you out of the picture, don't worry. That won't happen anytime soon. Learn to code, learn the fundamentals, but also learn how to use these new tools. As in every moment in human history, if you apply yourself and do your best, you will be at the top of the league, and there will be a spot for you.
--
This is my second review.
Could we see some worked examples for some of these scenarios?
I would love to see more actual working prompts.
--
The statement I really take issue with is:
"Putting this all together, let's see how a complete workflow might look for tackling a specific, somewhat complicated feature in an ongoing app, like adding OAuth login."
Given the limitations you have discussed, do you think it's safe to have the AI writing authorization code?
--
But I know you said you're using the AI to get an overview of the workflow. But even so, a bad architectural recommendation could result in vulnerabilities. For secure code guidance, who do you trust?
Maybe for secure code, if you're going to use AI, you can ground the AI by saying ... use only the following sources: oauth.net and OWASP.
--
I'm sorry, but ha ha, but this seems like a terrible example to use. :-)
--
The best part about your book, IMHO, is when you express your strong opinion, like whether LLMs can reason.
The part I like least about your book is when you make unworked, pie-in-the-sky statements.
--
I've learned so much from your book. I liked learning about Rice's theorem, and how obfuscating the variable names can help the LLM learn.
Thank you for writing all this great content! First of all, love your thoughtfulness, insight and depth!
It seems like the clean-room concerns for AI-generated code are similar to the issues with Open Source use and Stack Overflow use.
I'd love to see more on contamination concerns.
1. AI generated code gets shipped to production
2. AI generated code gets cut-n-pasted into Stack Overflow and then propagated
3. AI generated code gets into Open Source
4. AI generated code is itself trained on AI generated code
5. Can AI generated code be patented? Licensed? Released as Open Source? Released as Libraries?
6. If a module has a mixture of AI collaborated code and hand-written code, how will they be distinguished?
7. How will we track down the AI generated code in our code base if it becomes out of date (like in-line code snippets or deprecated packages) - or if the AI itself is refreshed and we need to re-gen all the AI snippets?
8. What if the target project uses AI generated code, and a security scanner (auditing authority, linter, etc.) also uses AI generated code - how do we know they were generated by clean-room AIs of different lineage - or worse, by the same AI, which can't detect its own mistakes?
Seems like companies need to think long and hard about their AI policy and IP policy.
Anyway, you're producing great stuff. I'd love to hear about some corporate tooling to handle these kinds of issues - we used to use Black Duck at my last company.
Regards!