Breaking Down the Cognitive Systems & Tooling of LLMs (Part 3.5): Advanced Concepts of Function Calling
In our previous article, “Breaking Down the Cognitive Systems & Tooling of LLMs (Part 3): Basics of Function Calling,” we introduced the basic concept of function calling in AI systems and explained how it works at a high level.
Now we’re taking things a step further to explore the advanced technical mechanics behind function calling and how it really works under the hood.
This article unravels the deeper layers of function calling: the role of schemas, and techniques such as prompt engineering, fine-tuning, and token-level guidance that ensure the structured outputs function calling depends on.
We’ll also discuss how function calls are orchestrated in real-world implementations.
Overview of Function Calling
Function calling enables AI models like OpenAI’s models to dynamically interact with external systems by generating structured outputs, such as JSON. These outputs provide the necessary parameters for APIs or functions, allowing the system to fetch or process data. The results are then integrated back into the conversation so the model can generate an accurate, in-context response while maintaining conversational flow.
Example: Weather Assistant
Here’s a more detailed breakdown of how function outputs are incorporated into the AI’s conversation flow:
Step 1: User Input
The user asks:
“What’s the weather like in Paris?”
Step 2: Model Generates Structured Parameters
The AI uses a predefined schema to generate only the required parameters for the API:
{
"location": "Paris",
"unit": "celsius"
}
Step 3: API Call
The application takes these parameters and calls the weather API:
import requests

def get_weather(location, unit):
    # Call the external weather API with the structured parameters
    response = requests.get(
        "https://weatherapi.example.com/current",
        params={"location": location, "unit": unit}
    )
    return response.json()
The parameters are passed directly:
api_response = get_weather(location="Paris", unit="celsius")
Step 4: API Response
The API returns the weather data:
{
"temperature": 18,
"condition": "sunny"
}
Step 5: Integrating the Output
The application injects the API response into the conversation by appending it to the message history:
Messages History Update:
[
{"role": "user", "content": "What’s the weather like in Paris?"},
{"role": "tool", "content": "{\"temperature\": 18, \"condition\": \"sunny\"}"}
]
Step 6: Generating the Final Response
The AI takes the updated message context and generates a user-friendly reply: “The weather in Paris is 18°C and sunny.”
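Putting the six steps together, here is a minimal end-to-end sketch, assuming the OpenAI Python SDK and reusing the illustrative get_weather function from Step 3 (the model name and tool wiring are assumptions, not the only way to orchestrate this):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]

# Step 2: the model decides to call the tool and emits structured arguments
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Steps 3-4: the application calls the real weather API with those arguments
api_response = get_weather(**args)  # get_weather as sketched in Step 3

# Step 5: append the assistant's tool call and the tool result to the history
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(api_response),
})

# Step 6: the model turns the tool result into a user-friendly reply
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)  # e.g. "The weather in Paris is 18°C and sunny."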
How Are AI Models Guided to Generate the Structured Outputs Needed?
1. Prompt Engineering for Structured Output
One way is to use prompt engineering to generate structured outputs like JSON or Python dictionaries for function calls.
Frameworks like LangChain and LlamaIndex use this technique, embedding system prompts that explicitly instruct the model to follow a strict format.
These system prompts can be added dynamically to ensure the model adheres to pre-defined schemas.
What Are System Prompts?
System prompts are contextual instructions provided to the LLM to shape its behavior.
For example:
System role: Carries predefined rules that shape behavior, e.g., "You're a weather assistant. Always respond with JSON."
User role: Carries the actual question or request from the user.
Example Prompt
Here’s an example of a prompt used to enforce structure:
[
  {
    "role": "system",
    "content": "You are an assistant. Always respond with a JSON object following this schema: { 'name': 'string' }."
  },
  {
    "role": "user",
    "content": "What is John Doe's information?"
  }
]
In this scenario, the LLM interprets the context to generate responses in the exact JSON format defined.
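Here is a minimal sketch of this pattern in code, assuming the OpenAI Python SDK and a JSON-capable chat model (the model name is illustrative, and the schema is the toy one from the prompt above):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {
        "role": "system",
        "content": (
            "You are an assistant. Always respond with a JSON object "
            "following this schema: { 'name': 'string' }."
        ),
    },
    {"role": "user", "content": "What is John Doe's information?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",                      # illustrative model name
    messages=messages,
    response_format={"type": "json_object"},  # nudges the model toward valid JSON
)

# Parse the structured output the system prompt asked for
data = json.loads(response.choices[0].message.content)
print(data["name"])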
2. Fine-Tuning for Function Calling
Fine-tuning is another layer that makes LLMs better at structured outputs and function calling.
How It Works
Pre-training exposure: During pre-training, LLMs see structured data formats (e.g., JSON, XML).
Fine-tuning examples: Developers train the model with examples of function call scenarios:
Input: "Retrieve weather data for Paris."
Output: {"city": "Paris"}
Steps for Fine-Tuning
Define a Dataset: Create examples of queries paired with their desired structured responses (a minimal sketch follows after this list).
Train the Model: Fine-tune the base model on that dataset.
Test and Validate: Ensure the model adheres to the output structure and knows when to trigger function calls.
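Here is a hedged sketch of what such a dataset might look like, written as chat-style JSONL (the file name, example wording, and exact record shape are illustrative assumptions rather than any specific provider's required format):

import json

# Illustrative training examples: each pairs a natural-language query
# with the structured output we want the fine-tuned model to emit.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Retrieve weather data for Paris."},
            {"role": "assistant", "content": json.dumps({"city": "Paris"})},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Retrieve weather data for Tokyo."},
            {"role": "assistant", "content": json.dumps({"city": "Tokyo"})},
        ]
    },
]

# Write one JSON object per line (JSONL), the format many fine-tuning
# pipelines expect for chat-style training data.
with open("function_call_dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")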
3. Token-Level Guidance
Token-level guidance involves controlling how an LLM predicts its output token by token so that the result conforms to a specific structured format.
How It Works
Token-level guidance is a critical technique for ensuring that the outputs of Large Language Models (LLMs) adhere to predefined structures such as JSON, XML, or other schemas during the generation process. It works through mechanisms like Finite State Automata (FSA) and rejection sampling, applied at inference time.
Here’s a breakdown of how it works and why it’s a game-changer for structured outputs.
1. Finite State Automata (FSA): Controlling Generation at Every Step
At the heart of token-level guidance is the Finite State Automata (FSA), a model of computation that defines valid states and transitions during generation.
Think of an FSA as a flowchart that guides the output:
States: Each valid part of the output corresponds to a state (e.g., {, “key”:, value).
Transitions: Define what the next valid step is based on the current state.
Example:
State 1: { (start of JSON object).
State 2: “key” (a string in quotes).
State 3: : (colon after a key).
State 4: “value” (value matching the schema).
State 5: } (close the JSON object).
By integrating the FSA into the token generation process, the system ensures that only tokens allowed by the schema are considered valid.
How FSA Guides Token Generation
During generation:
The model predicts tokens based on probabilities (e.g., {, “key”, or an invalid string).
The FSA validates each predicted token:
If valid: The token is accepted, and the generation moves to the next state.
If invalid: The token is rejected, and the model tries again (via rejection sampling).
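To make this concrete, here is a toy, hypothetical FSA for a tiny JSON fragment (the states, allowed tokens, and single-string tokens are simplifications; real constrained-decoding systems derive the automaton from a schema and operate on the model's actual tokenizer vocabulary):

# A toy finite state automaton for the fragment {"location": "<string>"}.
# Each state maps every valid next token to the state it transitions to.
FSA = {
    "START":        {"{": "EXPECT_KEY"},
    "EXPECT_KEY":   {'"location"': "EXPECT_COLON"},
    "EXPECT_COLON": {":": "EXPECT_VALUE"},
    "EXPECT_VALUE": {"<string>": "EXPECT_CLOSE"},  # any quoted string is allowed here
    "EXPECT_CLOSE": {"}": "DONE"},
    "DONE":         {},
}

def allowed_tokens(state):
    """Return the set of tokens the schema permits in the current state."""
    return set(FSA[state].keys())

def step(state, token):
    """Advance the automaton if the token is valid; otherwise reject it."""
    if token not in FSA[state]:
        raise ValueError(f"Token {token!r} is not valid in state {state}")
    return FSA[state][token]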
2. Rejection Sampling: Enforcing Valid Tokens
Rejection sampling is a technique where the system:
Proposes a token based on the LLM’s predictions.
Validates the token against the FSA rules.
Rejects invalid tokens and repeats the process until a valid token is produced.
How It Works
Token Proposal: The LLM predicts tokens based on probabilities (e.g., { at 90%, Hello at 5%, and invalid characters at 5%).
Validation:
The FSA checks if the proposed token matches the schema.
For example:
Valid: { at the start of a JSON object.
Invalid: Hello when only { is allowed.
Rejection:
If the token is invalid, the system masks it and forces the LLM to propose another token.
Acceptance:
A valid token is accepted, and the process moves to the next state.
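Continuing the toy FSA sketch above (it reuses FSA, allowed_tokens, and step), here is a hedged sketch of the propose-validate-reject loop; sample_token stands in for the LLM's next-token sampler, and in practice invalid tokens are usually masked out of the distribution rather than re-sampled blindly:

import random

def sample_token(candidates):
    """Stand-in for the LLM's sampler: pick a token according to its probability."""
    tokens, probs = zip(*candidates)
    return random.choices(tokens, weights=probs, k=1)[0]

def generate_with_rejection(sampler_inputs):
    """Generate tokens one by one, rejecting any token the FSA does not allow."""
    state, output = "START", []
    for candidates in sampler_inputs:           # one candidate distribution per step
        while True:
            token = sample_token(candidates)    # 1. propose
            if token in allowed_tokens(state):  # 2. validate against the FSA
                break                           # 4. accept
            # 3. reject and sample again (real systems mask the token instead)
        output.append(token)
        state = step(state, token)
    return "".join(output), state

# Example run: the model strongly prefers valid tokens but sometimes
# proposes "Hello", which the loop rejects at every step.
candidates_per_step = [
    [("{", 0.9), ("Hello", 0.1)],
    [('"location"', 0.8), ("Hello", 0.2)],
    [(":", 0.95), ("Hello", 0.05)],
    [("<string>", 0.9), ("Hello", 0.1)],
    [("}", 0.9), ("Hello", 0.1)],
]
text, final_state = generate_with_rejection(candidates_per_step)
print(text, final_state)  # {"location":<string>} DONE  (toy tokens, not real JSON)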
3. Integrating Token Control Without Model Changes
The beauty of this approach is that it doesn’t require modifying or fine-tuning the underlying LLM. Instead:
The FSA and rejection sampling act as a middleware layer during inference.
The LLM remains unchanged, making this technique compatible with any pre-trained model.
Advantages
Accuracy: Guarantees schema-compliant output without whole-response retries.
Efficiency: Operates entirely during inference, avoiding costly re-training.
Flexibility: Works for various structured formats like JSON, XML, YAML, and even domain-specific formats.
Practical Example: Weather Bot Schema
Let’s explore how this process works with a weather bot:
Schema Definition
{
"type": "function",
"function": {
"name": "get_current_weather",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" },
"unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
},
"required": ["location", "unit"]
}
}
}
Step-by-Step Generation Process
1. Initial State:
The FSA only allows { as the first token.
The LLM predicts {, and it's accepted.
2. Key Validation:
The FSA expects a quoted string (e.g., "type").
The LLM proposes "type", and it's accepted.
3. Colon Validation:
The FSA requires ":" after "type".
The LLM proposes ":", and it's accepted.
4. Value Validation:
The FSA expects "function" as the value for "type".
The LLM predicts "function", and it's accepted.
5. Repeat for Each Key-Value Pair:
The process continues, validating "parameters", "location": "San Francisco", and "unit": "fahrenheit".
Tokens like Hello or an invalid character are rejected at each step.
6. Final Output:
The FSA ensures the closing } completes the JSON object.
{
"type": "function",
"function": {
"name": "get_current_weather",
"parameters": {
"location": "San Francisco",
"unit": "fahrenheit"
}
}
}
Schemas in Function Calling
Schemas define the structure and validation rules for the LLM’s output. JSON Schema is a common format for defining function input parameters.
OpenAPI Schema Example
These schemas, based on JSON Schema (the same format OpenAPI specifications build on), provide detailed definitions of the parameters, including data types, descriptions, and constraints.
Example:
{
"name": "get_weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"unit": {"type": "string", "enum": ["c", "f"]}
},
"required": ["location", "unit"],
"additionalProperties": false
}
}
How Schemas Help
1. Parameter Enforcement: Ensures that the model generates valid parameter values.
2. Response Validation: The system validates the LLM’s output against the schema before calling the function.
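As a hedged illustration of the validation step, assuming the third-party jsonschema Python package (the schema is the weather example above, and the model output shown is made up):

import json
from jsonschema import ValidationError, validate

# Parameter schema from the example above
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["c", "f"]},
    },
    "required": ["location", "unit"],
    "additionalProperties": False,
}

# Hypothetical raw output from the model
model_output = '{"location": "Paris", "unit": "c"}'

try:
    arguments = json.loads(model_output)                  # 1. parse the JSON text
    validate(instance=arguments, schema=weather_schema)   # 2. check it against the schema
except (json.JSONDecodeError, ValidationError) as error:
    # In a real application you might re-prompt the model or surface an error
    raise SystemExit(f"Invalid function arguments: {error}")

# Only now is it safe to call the actual function:
# get_weather(**arguments)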
Conclusion
Function calling is a powerful feature that allows AI models to interact seamlessly with external systems, providing dynamic and context-aware responses. By understanding how structured outputs, schemas, and system logic work together, developers can build more robust and efficient applications.
If you’re curious about how retrieval works — another tool used to extract information from external PDFs and documents — be sure to check out my articles on retrieval mechanisms and knowledge bases linked below!
Additional Resources
Articles on Retrieval:
Read my detailed article on implementing function calling with the OpenAI Assistant API: Create an AI Chatbot: Function Calling with OpenAI Assistant.