Breaking Down the Cognitive Systems & Tooling of LLMs (Part 3.5): Advanced Concepts of Function Calling
In our previous article, “Breaking Down the Cognitive Systems & Tooling of LLMs (Part 3): Basics of Function Calling,” we introduced the basic concept of function calling in AI systems and explained how it works at a high level.
Now we’re taking things a step further to explore the advanced technical mechanics behind function calling and how it really works under the hood.
This article unravels the deeper layers of function calling: the role of schemas, and techniques such as prompt engineering, fine-tuning, and token-level guidance that ensure the structured outputs function calling depends on.
We’ll also discuss how function calls are orchestrated in real-world implementations.
Overview of Function Calling
Function calling enables AI models like OpenAI’s models to dynamically interact with external systems by generating structured outputs, such as JSON. These outputs provide the necessary parameters for APIs or functions, allowing the system to fetch or process data. The results are then integrated back into the conversation so the model can generate an accurate, in-context response while maintaining conversational flow.
Example: Weather Assistant
Here’s a more detailed breakdown of how function outputs are incorporated into the AI’s conversation flow:
Step 1: User Input
The user asks:
“What’s the weather like in Paris?”
Step 2: Model Generates Structured Parameters
The AI uses a predefined schema to generate only the required parameters for the API:
{
"location": "Paris",
"unit": "celsius"
}
Step 3: API Call
The application takes these parameters and calls the weather API:
import requests

def get_weather(location, unit):
    # Call the external weather API with the structured parameters
    response = requests.get(
        "https://weatherapi.example.com/current",
        params={"location": location, "unit": unit}
    )
    return response.json()
The parameters are passed directly:
api_response = get_weather(location="Paris", unit="celsius")
Step 4: API Response
The API returns the weather data:
{
"temperature": 18,
"condition": "sunny"
}
Step 5: Integrating the Output
The application injects the API response into the conversation by appending it to the message history:
Messages History Update:
[
{"role": "user", "content": "What’s the weather like in Paris?"},
{"role": "tool", "content": "{\"temperature\": 18, \"condition\": \"sunny\"}"}
]
Step 6: Generating the Final Response
The AI takes the updated message context and generates a user-friendly reply: “The weather in Paris is 18°C and sunny.”
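Putting the six steps together, here is a minimal end-to-end sketch, assuming the OpenAI Python SDK and reusing the illustrative get_weather function from Step 3 (the model name and tool wiring are assumptions, not the only way to orchestrate this):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]

# Step 2: the model decides to call the tool and emits structured arguments
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Steps 3-4: the application calls the real weather API with those arguments
api_response = get_weather(**args)  # get_weather as sketched in Step 3

# Step 5: append the assistant's tool call and the tool result to the history
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(api_response),
})

# Step 6: the model turns the tool result into a user-friendly reply
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)  # e.g. "The weather in Paris is 18°C and sunny."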
How Are AI Models Guided to Generate the Structured Outputs Needed?
1. Prompt Engineering for Structured Output
One way is to use prompt engineering to generate structured outputs like JSON or Python dictionaries for function calls.
Frameworks like LangChain and LlamaIndex use this technique, embedding system prompts that explicitly instruct the model to follow a strict format.
These system prompts can be added dynamically to ensure the model adheres to pre-defined schemas.
What Are System Prompts?
System prompts are contextual instructions provided to the LLM to shape its behavior.
For example:
System role: Carries predefined rules that shape behavior, e.g., "You're a weather assistant. Always respond with JSON."
User role: Carries the actual question or request from the user.
Example Prompt
Here’s an example of a prompt used to enforce structure:
[
  {
    "role": "system",
    "content": "You are an assistant. Always respond with a JSON object following this schema: { 'name': 'string' }."
  },
  {
    "role": "user",
    "content": "What is John Doe's information?"
  }
]
In this scenario, the LLM interprets the context to generate responses in the exact JSON format defined.
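Here is a minimal sketch of this pattern in code, assuming the OpenAI Python SDK and a JSON-capable chat model (the model name is illustrative, and the schema is the toy one from the prompt above):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {
        "role": "system",
        "content": (
            "You are an assistant. Always respond with a JSON object "
            "following this schema: { 'name': 'string' }."
        ),
    },
    {"role": "user", "content": "What is John Doe's information?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",                      # illustrative model name
    messages=messages,
    response_format={"type": "json_object"},  # nudges the model toward valid JSON
)

# Parse the structured output the system prompt asked for
data = json.loads(response.choices[0].message.content)
print(data["name"])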
2. Fine-Tuning for Function Calling
Fine-tuning is another layer that makes LLMs better at structured outputs and function calling.
How It Works
Pre-training exposure: During pre-training, LLMs see structured data formats (e.g., JSON, XML).
Fine-tuning examples: Developers train the model with examples of function call scenarios:
Input: "Retrieve weather data for Paris."
Output: {"city": "Paris"}
Steps for Fine-Tuning
Define a Dataset: Create examples of queries paired with their desired structured responses (a minimal sketch follows after this list).
Train the Model: Fine-tune the base model on that dataset.
Test and Validate: Ensure the model adheres to the output structure and knows when to trigger function calls.
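Here is a hedged sketch of what such a dataset might look like, written as chat-style JSONL (the file name, example wording, and exact record shape are illustrative assumptions rather than any specific provider's required format):

import json

# Illustrative training examples: each pairs a natural-language query
# with the structured output we want the fine-tuned model to emit.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Retrieve weather data for Paris."},
            {"role": "assistant", "content": json.dumps({"city": "Paris"})},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Retrieve weather data for Tokyo."},
            {"role": "assistant", "content": json.dumps({"city": "Tokyo"})},
        ]
    },
]

# Write one JSON object per line (JSONL), the format many fine-tuning
# pipelines expect for chat-style training data.
with open("function_call_dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")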
3. Token-Level Guidance
Token-level guidance involves controlling how an LLM predicts its output token by token so that the result conforms to a specific structured format.
How It Works
Token-level guidance is a critical technique for ensuring that the outputs of Large Language Models (LLMs) adhere to predefined structures such as JSON, XML, or other schemas during the generation process. It works through mechanisms like Finite State Automata (FSA) and rejection sampling, applied at inference time.
Here’s a breakdown of how it works and why it’s a game-changer for structured outputs.
1. Finite State Automata (FSA): Controlling Generation at Every Step
At the heart of token-level guidance is the Finite State Automata (FSA), a model of computation that defines valid states and transitions during generation.
Think of an FSA as a flowchart that guides the output:
States: Each valid part of the output corresponds to a state (e.g., {, “key”:, value).
Transitions: Define what the next valid step is based on the current state.
Example:
State 1: { (start of JSON object).
State 2: “key” (a string in quotes).
State 3: : (colon after a key).
State 4: “value” (value matching the schema).
State 5: } (close the JSON object).
By integrating the FSA into the token generation process, the system ensures that only tokens allowed by the schema are considered valid.
How FSA Guides Token Generation
During generation:
The model predicts tokens based on probabilities (e.g., {, “key”, or an invalid string).
The FSA validates each predicted token:
If valid: The token is accepted, and the generation moves to the next state.
If invalid: The token is rejected, and the model tries again (via rejection sampling).
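To make this concrete, here is a toy, hypothetical FSA for a tiny JSON fragment (the states, allowed tokens, and single-string tokens are simplifications; real constrained-decoding systems derive the automaton from a schema and operate on the model's actual tokenizer vocabulary):

# A toy finite state automaton for the fragment {"location": "<string>"}.
# Each state maps every valid next token to the state it transitions to.
FSA = {
    "START":        {"{": "EXPECT_KEY"},
    "EXPECT_KEY":   {'"location"': "EXPECT_COLON"},
    "EXPECT_COLON": {":": "EXPECT_VALUE"},
    "EXPECT_VALUE": {"<string>": "EXPECT_CLOSE"},  # any quoted string is allowed here
    "EXPECT_CLOSE": {"}": "DONE"},
    "DONE":         {},
}

def allowed_tokens(state):
    """Return the set of tokens the schema permits in the current state."""
    return set(FSA[state].keys())

def step(state, token):
    """Advance the automaton if the token is valid; otherwise reject it."""
    if token not in FSA[state]:
        raise ValueError(f"Token {token!r} is not valid in state {state}")
    return FSA[state][token]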
2. Rejection Sampling: Enforcing Valid Tokens
Rejection sampling is a technique where the system:
Proposes a token based on the LLM’s predictions.
Validates the token against the FSA rules.
Rejects invalid tokens and repeats the process until a valid token is produced.
How It Works
Token Proposal: The LLM predicts tokens based on probabilities (e.g., { at 90%, Hello at 5%, and invalid characters at 5%).
Validation:
The FSA checks if the proposed token matches the schema.
For example:
Valid: { at the start of a JSON object.
Invalid: Hello when only { is allowed.
Rejection:
If the token is invalid, the system masks it and forces the LLM to propose another token.
Acceptance:
A valid token is accepted, and the process moves to the next state.
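Continuing the toy FSA sketch above (it reuses FSA, allowed_tokens, and step), here is a hedged sketch of the propose-validate-reject loop; sample_token stands in for the LLM's next-token sampler, and in practice invalid tokens are usually masked out of the distribution rather than re-sampled blindly:

import random

def sample_token(candidates):
    """Stand-in for the LLM's sampler: pick a token according to its probability."""
    tokens, probs = zip(*candidates)
    return random.choices(tokens, weights=probs, k=1)[0]

def generate_with_rejection(sampler_inputs):
    """Generate tokens one by one, rejecting any token the FSA does not allow."""
    state, output = "START", []
    for candidates in sampler_inputs:           # one candidate distribution per step
        while True:
            token = sample_token(candidates)    # 1. propose
            if token in allowed_tokens(state):  # 2. validate against the FSA
                break                           # 4. accept
            # 3. reject and sample again (real systems mask the token instead)
        output.append(token)
        state = step(state, token)
    return "".join(output), state

# Example run: the model strongly prefers valid tokens but sometimes
# proposes "Hello", which the loop rejects at every step.
candidates_per_step = [
    [("{", 0.9), ("Hello", 0.1)],
    [('"location"', 0.8), ("Hello", 0.2)],
    [(":", 0.95), ("Hello", 0.05)],
    [("<string>", 0.9), ("Hello", 0.1)],
    [("}", 0.9), ("Hello", 0.1)],
]
text, final_state = generate_with_rejection(candidates_per_step)
print(text, final_state)  # {"location":<string>} DONE  (toy tokens, not real JSON)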
3. Integrating Token Control Without Model Changes
The beauty of this approach is that it doesn’t require modifying or fine-tuning the underlying LLM. Instead:
The FSA and rejection sampling act as a middleware layer during inference.
The LLM remains unchanged, making this technique compatible with any pre-trained model.
Advantages
Accuracy: Guarantees schema-compliant output without whole-response retries.
Efficiency: Operates entirely during inference, avoiding costly re-training.
Flexibility: Works for various structured formats like JSON, XML, YAML, and even domain-specific formats.
Practical Example: Weather Bot Schema
Let’s explore how this process works with a weather bot:
Schema Definition
{
"type": "function",
"function": {
"name": "get_current_weather",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" },
"unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
},
"required": ["location", "unit"]
}
}
}
Step-by-Step Generation Process
1. Initial State:
The FSA only allows { as the first token.
The LLM predicts {, and it's accepted.
2. Key Validation:
The FSA expects a quoted string (e.g., "type").
The LLM proposes "type", and it's accepted.
3. Colon Validation:
The FSA requires ":" after "type".
The LLM proposes ":", and it's accepted.
4. Value Validation:
The FSA expects "function" as the value for "type".
The LLM predicts "function", and it's accepted.
5. Repeat for Each Key-Value Pair:
The process continues, validating "parameters", "location": "San Francisco", and "unit": "fahrenheit".
Tokens like Hello or an invalid character are rejected at each step.
6. Final Output:
The FSA ensures the closing } completes the JSON object.
{
"type": "function",
"function": {
"name": "get_current_weather",
"parameters": {
"location": "San Francisco",
"unit": "fahrenheit"
}
}
}
Schemas in Function Calling
Schemas define the structure and validation rules for the LLM’s output. JSON Schema is a common format for defining function input parameters.
OpenAPI Schema Example
These schemas, based on JSON Schema (the same format OpenAPI specifications build on), provide detailed definitions of the parameters, including data types, descriptions, and constraints.
Example:
{
"name": "get_weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"unit": {"type": "string", "enum": ["c", "f"]}
},
"required": ["location", "unit"],
"additionalProperties": false
}
}
How Schemas Help
1. Parameter Enforcement: Ensures that the model generates valid parameter values.
2. Response Validation: The system validates the LLM’s output against the schema before calling the function.
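As a hedged illustration of the validation step, assuming the third-party jsonschema Python package (the schema is the weather example above, and the model output shown is made up):

import json
from jsonschema import ValidationError, validate

# Parameter schema from the example above
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["c", "f"]},
    },
    "required": ["location", "unit"],
    "additionalProperties": False,
}

# Hypothetical raw output from the model
model_output = '{"location": "Paris", "unit": "c"}'

try:
    arguments = json.loads(model_output)                  # 1. parse the JSON text
    validate(instance=arguments, schema=weather_schema)   # 2. check it against the schema
except (json.JSONDecodeError, ValidationError) as error:
    # In a real application you might re-prompt the model or surface an error
    raise SystemExit(f"Invalid function arguments: {error}")

# Only now is it safe to call the actual function:
# get_weather(**arguments)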
Conclusion
Function calling is a powerful feature that allows AI models to interact seamlessly with external systems, providing dynamic and context-aware responses. By understanding how structured outputs, schemas, and system logic work together, developers can build more robust and efficient applications.
If you’re curious about how retrieval works — another tool used to extract information from external PDFs and documents — be sure to check out my articles on retrieval mechanisms and knowledge bases linked below!
Additional Resources
Articles on Retrieval:
Read my detailed article on implementing function calling with the OpenAI Assistant API: Create an AI Chatbot: Function Calling with OpenAI Assistant.