This article is part of my work-in-progress book How to Train your Chatbot, a handbook packed with practical advice—and fully working, reusable applications—on using LLMs to build all sorts of cool stuff. Get it while it's on Early Access at 50% of the usual price.
In a previous issue, we discussed Retrieval Augmented Generation (RAG) and argued why it was the most straightforward approach to grounding LLMs on actual data, avoiding (although not completely eliminating) hallucinations, and making them more knowledgeable and reliable. However, RAG has one major caveat: the LLM has little control over the query process.
In this issue, we will learn about function calling, a technique that generalizes RAG and makes integration with any external tool not only feasible but actually easy. Whereas RAG is primarily passive, function calling is an active method where the LLM can decide to invoke a helper function for specific functionality at any point in a conversation. The most common use case is to implement dynamic querying of a private API.
For example, suppose you’re making a bot for a delivery service that lets customers track their packages. Without function calling, you’d need elaborate prompts to instruct the bot to first ask the user for the necessary data (such as the package ID). Then, you’d need a way to produce a well-formatted API call, invoke your API, and inject the resulting data into a prompt template to produce the final answer.
This might not be enough, though. Getting the correct answer might require more than one API call, with some back-and-forth between bot and user to narrow down the precise information the user needs.
This back-and-forth between bot, user, and API is so typical that it makes sense to abstract it into a design pattern. This is what function calling is meant to support. Instead of manually crafting detailed prompts describing your API and implementing the whole back-and-forth conversation workflow, most LLM providers already support function calling as an explicit feature.
How does function calling work?
First, you define a set of “functions,” which can be anything from actual code functions (e.g., Python methods) to API calls. The underlying implementation doesn’t matter, as the LLM will never directly interact with the function. It will just tell you when and how it should be invoked.
For that to work, you need to provide the LLM with a natural language description and a structured definition of the arguments of every function. This is usually all encapsulated in a standardized JSON schema, such as the following:
{
  "functions": [
    {
      "name": "get_user_info",
      "description": "get information about what a user has bought.",
      "arguments": [
        {
          "name": "user_id",
          "description": "The unique user identifier",
          "type": "string",
          "mandatory": true
        }
      ]
    },
    {
      "name": "get_item_info",
      "description": "get information about an item's status and location.",
      "arguments": [
        {
          "name": "item_id",
          "description": "The unique item identifier",
          "type": "string",
          "mandatory": true
        }
      ]
    }
  ]
}
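For reference, here is a minimal sketch of how the same two functions might be declared in the OpenAI-style "tools" format that many providers accept. The field names follow the OpenAI chat completions API; other providers may expect a slightly different schema.

# The same two functions declared as OpenAI-style "tools", with the
# arguments expressed as JSON Schema. This is a sketch, not the only
# possible format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_user_info",
            "description": "get information about what a user has bought.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string",
                        "description": "The unique user identifier",
                    },
                },
                "required": ["user_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_item_info",
            "description": "get information about an item's status and location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "item_id": {
                        "type": "string",
                        "description": "The unique item identifier",
                    },
                },
                "required": ["item_id"],
            },
        },
    },
]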
Then, at inference time, a special system prompt instructs the LLM either to respond as usual or to produce a function call. A function call is a structured response in which the LLM outputs just a JSON object containing the identifier of the function to call and the values of all mandatory arguments. An oversimplified version of such a prompt might look like this:
The following is a set of API functions you can invoke to obtain relevant information to answer a user query.

{functions}

Given the following user query, determine whether an API call is appropriate.
If any arguments are missing from the conversation, ask the user.
If all arguments are available, output your response in JSON format with the corresponding function call.
Otherwise, answer the user in natural language.

{query}
Given a prompt like the above, a well-tuned LLM should be able to determine whether a specific query requires an API call. The developer must capture these function-calling replies and, instead of showing them to the user, call the appropriate function and inject the result back into the LLM's context. The LLM will then produce an appropriate natural language response.
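To make this capture-call-inject cycle concrete, here is a minimal sketch of the developer-side plumbing, assuming an OpenAI-compatible Python client and the tools list from the earlier sketch; get_user_info and get_item_info are hypothetical stand-ins for real API calls.

import json
from openai import OpenAI  # assumes an OpenAI-compatible provider

client = OpenAI()

def get_user_info(user_id: str) -> dict:
    # Hypothetical stand-in for the real purchases API.
    return {"user_id": user_id, "purchases": ["item-1", "item-2", "item-3"]}

def get_item_info(item_id: str) -> dict:
    # Hypothetical stand-in for the real item-tracking API.
    return {"item_id": item_id, "status": "in transit"}

LOCAL_FUNCTIONS = {"get_user_info": get_user_info, "get_item_info": get_item_info}

def run_turn(messages: list, tools: list) -> str:
    """Send the conversation and resolve any function calls the model makes."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any model with function-calling support
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # A plain natural language reply: show it to the user.
            return message.content
        # Otherwise, intercept the call(s) instead of displaying them.
        messages.append(message)
        for call in message.tool_calls:
            arguments = json.loads(call.function.arguments)
            result = LOCAL_FUNCTIONS[call.function.name](**arguments)
            # Inject the result back into the context as a "tool" message.
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
        # Loop so the model can turn the tool output into a final answer
        # (or make further calls).

In a real chat application, run_turn would be called once per user message, with messages accumulating the whole conversation, including the intercepted calls and their results.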
An example of a possible conversation in this fictional setting would be as follows.
First, the user asks for specific information.
USER: Hey, please show me my latest purchases.
Given this query and an appropriate prompt like the one shown above, the LLM might recognize it needs to call the get_user_info function, but it’s missing the user_id argument.
ASSISTANT: Sure, I will need your user ID for that.
The user replies.
USER: Of course, my user ID is 12345.
Since the LLM receives the whole conversation history, the second time it’s called it will recognize that it has all the required arguments and produce a function call.
ASSISTANT: {"function": "get_user_info", "arguments": {"user_id": "12345"}}
This time, instead of showing this message to the user, the developer intercepts the function call, invokes the API, and injects the return value, presumably a list of purchases, back into the conversation.
TOOL: {"function": "get_user_info", "result": [ ... ]}
Given this new information, the LLM can now answer back to the user.
ASSISTANT: You have bought 3 items in the last month...
This process can occur as many times as necessary in a conversation. With a suitable prompt, the LLM can detect when some argument value is missing and produce the corresponding natural language question for the user. This way, we can naturally weave a conversation in which the user supplies the necessary arguments for a given function call in any order. The LLM can also call multiple functions in the same conversation, giving more flexibility than a rigid RAG cycle.
Use cases for function calling
Function calling is particularly useful for integrating an LLM with an external tool that can be consumed as an API. A typical use case (which we will see next week) is building a shopping assistant for an online store that can suggest products, add or remove items from the shopping cart, provide information on delivery status, etc.
A neat trick is to use function calling for structured generation. When you want an LLM to produce a JSON-formatted output, it’s typically hard to guarantee you always get the exact schema you need—except maybe when using the best models.
However, even some of the smaller models, once fine-tuned for function calling, are extremely robust at generating the exact argument names and types for any function. Thus, if you can frame your generation task as an API function call, you get all this robustness for free.
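As a sketch of this trick, suppose you want to pull contact details out of free-form text as clean JSON. Instead of asking for JSON directly, you declare a hypothetical record_contact function whose arguments are exactly the fields you need, force the model to call it, and read the structured output from the generated arguments. The client and model name here reuse the assumptions from the earlier sketches.

import json
from openai import OpenAI

client = OpenAI()

# A function we never intend to execute: we only want its argument schema.
extraction_tool = {
    "type": "function",
    "function": {
        "name": "record_contact",  # hypothetical name, used only for its schema
        "description": "Record the contact details mentioned in a text.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"},
            },
            "required": ["name"],
        },
    },
}

text = "Hi, I'm Jane Doe, reach me at jane@example.com or 555-0199."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Extract the contact info: {text}"}],
    tools=[extraction_tool],
    # Force a call to record_contact so we always get structured arguments.
    tool_choice={"type": "function", "function": {"name": "record_contact"}},
)
contact = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(contact)  # e.g. {"name": "Jane Doe", "email": "jane@example.com", ...}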
But the possibilities don’t end there. Whatever service you can encapsulate behind a reasonably well-structured and fine-grained API, you can now put an LLM in front of it and make your API queryable in natural language. Here are some typical examples:
Customer support: Integrate an LLM with a company’s knowledge base, product information, and customer data to create an intelligent virtual agent for customer support. The LLM can handle common queries, provide product recommendations, look up order status, and escalate complex issues to human agents.
Information systems: Connect an LLM to a query API that provides real-time information about a specific domain, from weather to sales to stocks. It can be used for internal tools connected to a company dashboard and to integrate a conversational-style interface with a traditional graphical user interface.
Workflow automation: Connect an LLM to APIs for various business tools like CRM, project management, HR systems, etc. Allow users to automate common workflows by querying the LLM in natural language, e.g., “Create a new Salesforce lead for this email,” “Schedule a meeting with the team next week,” or “Approve this time off request.”
Collaborative writing: Integrate an LLM with document editing and collaboration tools to assist with writing tasks. The LLM can help brainstorm ideas, provide feedback on tone and structure, check for grammar and spelling, and even generate content based on prompts. We will see an example of this use case in later articles.
Software development: When combined with language models' powerful code generation skills, another possibility opens up: connecting an LLM to code repositories, documentation, and APIs to create an AI programming assistant. Developers can ask the LLM to explain code, debug issues, suggest improvements, and generate new code based on high-level requirements. We will see an example of this use case in later articles.
The key is identifying areas where humans currently interact with APIs and information systems and seeing how an LLM can make those interactions more natural, efficient, and productive.
Some caveats and limitations
As usual with LLMs, any integration has significant caveats and limitations. Although you can mitigate hallucinations considerably, the LLM can still hallucinate a wrong function call by, e.g., passing the wrong arguments. In the simplest cases, you can catch the error because the arguments have the wrong type or are out of range. However, subtler hallucinations might result in a function call that succeeds but doesn’t match the user's intention.
For this reason, in all critical systems, it is crucial that you don’t simply call an API blindly on behalf of the user, especially when doing so can have irreversible effects. For example, in a banking app, your LLM might hallucinate an incorrect destination for a transfer, effectively sending the user's money to an arbitrary third party. Furthermore, attackers might find a way to manipulate your prompt and trigger such a mistake deliberately.
In these cases, you should always make the user explicitly trigger the final action and ensure they have reviewed and understood the implications of such action. This enhances reliability at a small cost in usefulness, turning the LLM into an assistant that fills in the data for you but doesn’t click the red button.
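One simple way to enforce this, sketched below with hypothetical function names, is to mark which functions are irreversible and require explicit confirmation from the user before executing them; everything else can be dispatched automatically as before, reusing the LOCAL_FUNCTIONS-style table from the earlier sketch.

# Sketch: gate irreversible actions behind explicit user confirmation.
# The function names and the confirm callback are hypothetical.
IRREVERSIBLE = {"transfer_money", "delete_account"}

def execute_call(name: str, arguments: dict, functions: dict, confirm) -> dict:
    if name in IRREVERSIBLE:
        prompt = f"About to run {name} with {arguments}. Type 'yes' to proceed: "
        if confirm(prompt).strip().lower() != "yes":
            return {"status": "cancelled", "reason": "user declined"}
    return functions[name](**arguments)

# In a console app, `confirm` could simply be Python's built-in input().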
Another possible source of concern is when the LLM hallucinates the response, even though it made the right call and received the right data. This is the same problem we had with RAG: even if the context contains the right answer, there is no guarantee the LLM will pick it. One easy fix in many cases is to display the function result next to the LLM interpretation so the user can double-check the response.
One final caveat that may be relevant in many cases is regarding privacy. Suppose you are interacting with a private API—e.g., a banking app—using a commercial LLM as a controller. In that case, you are effectively sending your users' information to OpenAI (or any other provider) as part of the prompts, and this may include user IDs, addresses, financial details, etc. This underscores the need for powerful open-source LLMs that companies can self-host for added privacy and security.
Conclusions
Function calling can be seen as both a particular case and a generalization of retrieval augmented generation. It is a special case because it involves injecting external information into the prompt to enhance the capabilities of an LLM. It is a generalization because you can implement RAG with function calling by encapsulating your search functionality in a function call specification.
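For example, a retrieval step can be exposed to the model as just another function; a hypothetical declaration might look like the one below, with the actual vector or keyword search hidden behind a search_documents wrapper that the developer implements.

# Sketch: RAG expressed as one more tool. The model decides when to search;
# search_documents is a hypothetical wrapper around your retrieval backend.
rag_tool = {
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "Search the knowledge base and return relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
                "top_k": {"type": "integer", "description": "How many passages to return"},
            },
            "required": ["query"],
        },
    },
}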
This pattern is extremely flexible and repeatable. However, to make it work, the prompt must be right. Since prompts are generally not entirely portable across different models, implementing this workflow from scratch every single time is a chore.
For this reason, most LLM services provide a native way to perform function calling, basically abstracting away the fragile prompt engineering component. Moreover, the LLM provider might have fine-tuned their model to a specific function-calling prompt and formatting. Since most LLM providers implement the OpenAI API specification, porting function calling between different providers is much easier this way.
Next time, we will take one step further in our journey to make LLMs as valuable and reliable as possible and explore the exciting world of code generation.