Introduction to Function Calling with LLMs
Asim Hafeez
Senior Software Engineer | Lead | AI | LLMs | System Design | Blockchain | AWS
As artificial intelligence gets smarter, Large Language Models (LLMs) are changing the way we interact with technology. They're no longer just for answering questions or powering chatbots. Thanks to a feature called Function Calling, these models can now perform specific tasks based on what we ask them. It's like giving them a to-do list, and they check off items as they go.
In this article, we'll show you how this works by pairing GPT-4 with DALL-E to create images from text descriptions. It's fascinating how much more AI can do than just talk.
What is Function Calling with LLMs?
Function calling with LLMs refers to using these models to trigger predefined functions or APIs in response to user prompts. This capability extends the utility of LLMs beyond text generation, allowing them to interact with other software components, fetch data, or even control devices, thereby enabling real-time actions and responses. It represents a significant leap toward more dynamic and useful AI systems.
For example, you can use an LLM to fetch the latest weather information, calculate complex equations, or generate custom images, all through natural language prompts provided by the user.
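Concretely, a function is described to the model as a JSON schema; the model never runs any code itself, it only tells us which function to call and with which arguments. As a minimal sketch, here is what a definition for the calculator example above might look like (the calculate name and its parameter are hypothetical):

// Hypothetical function definition as the model sees it: a JSON schema
// describing a calculator function the model may ask us to run.
const calculateDefinition = {
  name: 'calculate',
  description: 'Evaluate a simple arithmetic expression',
  parameters: {
    type: 'object',
    properties: {
      expression: {
        type: 'string',
        description: 'Expression to evaluate, e.g. "2 * (3 + 4)"',
      },
    },
    required: ['expression'],
  },
}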
How Function Calling Works in Practice
The process of Function Calling with LLMs can be broken down into a few key steps (a compact sketch in code follows the list):
1. User Prompt: The user inputs a query or command, which the LLM recognizes as requiring a specific function.
2. Function Trigger: The LLM parses the input and identifies the need to call a predefined function.
3. Function Execution: The LLM triggers the appropriate function, which executes and returns the result.
4. Response Delivery: The LLM processes the function’s output and delivers it to the user in a comprehensible format.
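To make the flow concrete before we build it out, here is a minimal sketch of that round trip. It assumes the OpenAI client we configure below and a hypothetical getTime function; the full image-generation version is developed step by step in the rest of the article.

import { openai } from './openai.js'

// Hypothetical local function the model may ask us to run.
const functions = {
  async getTime() {
    return new Date().toISOString()
  },
}

// Step 1: the user prompt enters the message history.
const messages = [{ role: 'user', content: 'What time is it?' }]

// Steps 2-3: the model either answers directly or requests a function call.
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  functions: [
    {
      name: 'getTime',
      description: 'Get the current time',
      parameters: { type: 'object', properties: {} },
    },
  ],
})

const choice = completion.choices[0]
if (choice.finish_reason === 'function_call') {
  const { name, arguments: args } = choice.message.function_call
  const result = await functions[name](JSON.parse(args || '{}'))
  // Step 4: feed the result back so the model can phrase the final answer.
  messages.push(choice.message)
  messages.push({ role: 'function', name, content: JSON.stringify(result) })
}

The key idea: the model only tells us which function to run and with what arguments; our code executes it and reports the result back.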
Using GPT-4 for Image Generation with Function Calling
One of the most exciting applications of Function Calling with LLMs is using it to generate images based on textual descriptions. This involves using GPT-4 in combination with DALL-E to turn words into visuals. Here’s how you can set up and run such an application:
Step 1: Setup Your Environment
Before we dive into the code, ensure you have your environment set up properly.
1. Install required packages:
npm install openai dotenv
2. Create a .env file and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
Step 2: Configure OpenAI?API
Next, create the OpenAI client instance in openai.js:
// openai.js
import { configDotenv } from 'dotenv'
configDotenv()

import OpenAI from 'openai'

// The SDK expects an options object rather than a bare API-key string.
export const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
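If you want to confirm the key is being picked up before going further, a quick optional sanity check (this snippet is illustrative and not part of the app):

// check.js - optional: listing models confirms the API key works
import { openai } from './openai.js'

const models = await openai.models.list()
console.log(models.data.slice(0, 3).map((m) => m.id))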
Step 3: Defining the User Input and Initializing GPT-4
We begin by importing the necessary libraries and defining how the model will process user input and generate images using function calling.
import 'dotenv/config'
import { openai } from './openai.js'
const QUESTION = process.argv[2] || 'hi'
// User prompt to interact with the model
const messages = [
  {
    role: 'user',
    content: QUESTION,
  },
]
Here, we load the configuration from .env and prepare the initial message for GPT-4. The QUESTION variable captures the user's input from the command line, defaulting to 'hi'.
Step 4: Define Functions
Now, define the function that will generate the image using DALL-E:
const functions = {
  // Generates an image with DALL-E and returns its URL.
  async createImage({ prompt }) {
    const result = await openai.images.generate({ prompt })
    console.log(result) // log the full API response for debugging
    return result.data[0].url
  },
}
This function accepts a prompt (text description) and generates an image URL using DALL-E.
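You can also exercise the function on its own before wiring it up to GPT-4. A hypothetical one-off test, run in the same file after the definitions above:

// Manual test of the image function (illustrative prompt).
const url = await functions.createImage({
  prompt: 'A watercolor painting of a lighthouse at dawn',
})
console.log(url) // a temporary URL hosted by OpenAI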
Step 5: Setup GPT-4 to Recognize and Call Functions
In this step, we teach GPT-4 to recognize when to call a specific function based on the user’s input. Here’s what each part does:
const getGPTCompletion = async (messages) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    // Describe the functions the model is allowed to call.
    functions: [
      {
        name: 'createImage',
        description: 'Create or generate an image based on description',
        parameters: {
          type: 'object',
          properties: {
            prompt: {
              type: 'string',
              description: 'Description of the image to generate',
            },
          },
          required: ['prompt'],
        },
      },
    ],
    temperature: 0,
  })
  return response
}
1. Model Selection: We specify gpt-4o-mini as the model, but any chat model that supports function calling will work; OpenAI's GPT-3.5-turbo supports it as well.
2. Messages: We send the conversation history (in this case, just the user’s message). This tells the model what the user has asked.
3. Functions Array: We define a function (createImage) inside the functions array that GPT-4 can call when it recognizes the need. The function takes a single parameter, prompt, which will be a description of the image.
4. Temperature: This controls how creative or random the model’s responses are. A lower value like 0 makes responses more deterministic and predictable.
So in Step 5, we're preparing GPT-4 to recognize a certain kind of user input (a request to generate an image) and to respond by calling a specific function (createImage) with the provided parameters.
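When the model does decide to call the function, the finish_reason of the first choice is 'function_call' and the arguments arrive as a JSON string. The relevant part of the response looks roughly like this (the values are illustrative):

// Illustrative shape of response.choices[0] for a function-call turn:
const exampleChoice = {
  finish_reason: 'function_call',
  message: {
    role: 'assistant',
    content: null,
    function_call: {
      name: 'createImage',
      arguments: '{"prompt":"A futuristic city skyline at sunset"}',
    },
  },
}

Note that arguments is a string, which is why the code in the next step parses it with JSON.parse.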
Step 6: Handling GPT-4 Responses and Calling Functions
Now, in Step 6, we handle the actual process of calling the function when GPT-4 decides it needs to. Here's the breakdown:
let response
while (true) {
  response = await getGPTCompletion(messages)

  if (response.choices[0].finish_reason === 'stop') {
    // The model answered with plain text; we're done.
    console.log(response.choices[0].message.content)
    break
  } else if (response.choices[0].finish_reason === 'function_call') {
    // The model wants a function executed.
    const fnName = response.choices[0].message.function_call.name
    const args = response.choices[0].message.function_call.arguments

    // Look up the local function and run it with the parsed arguments.
    const functionToCall = functions[fnName]
    const params = JSON.parse(args)
    const result = await functionToCall(params)

    // Record the model's function call...
    messages.push({
      role: 'assistant',
      content: null,
      function_call: {
        name: fnName,
        arguments: args,
      },
    })
    // ...and the function's result, so the model can use it on its next turn.
    messages.push({
      role: 'function',
      name: fnName,
      content: JSON.stringify({ result: result }),
    })
  }
}
1. Response Loop: The loop keeps requesting completions until the model produces a final text answer.
2. Stopping the Conversation: When finish_reason is 'stop', GPT-4 has answered in plain text; we log the content and break out of the loop.
3. Function Call Handling: If finish_reason is 'function_call', we look up the requested function by name, parse its JSON-encoded arguments, execute it, and push both the assistant's function call and the function's result onto the message history.
This process lets GPT-4 act dynamically, calling the right functions based on user input, executing those functions, and integrating the results back into the conversation.
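For a sense of what the model sees on its next turn, here is an illustrative message history after one round trip (the image URL is elided):

// Illustrative message history after one function-call round trip:
const exampleMessages = [
  { role: 'user', content: 'Generate an image of a cat' },
  {
    role: 'assistant',
    content: null,
    function_call: { name: 'createImage', arguments: '{"prompt":"A cat"}' },
  },
  {
    role: 'function',
    name: 'createImage',
    content: '{"result":"https://..."}',
  },
]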
Step 7: Running the Application and Demo
Now that everything is set up, it’s time to run the code and see the function calling in action.
Run the script:
node function-calling.js "Generate an image of a futuristic city skyline at sunset"
Demo
You can check the generated image by opening the URL printed to the console.
Conclusion
Congratulations! You’ve now successfully implemented Function Calling with GPT-4 to generate images using DALL-E. This project showcases how to extend the capabilities of LLMs beyond text-based responses by integrating external functions.
Through this approach, we've demonstrated:
- How GPT-4 recognizes when a user prompt requires a function call
- How to describe callable functions to the model as JSON schemas
- How to execute a function (DALL-E image generation) and feed its result back into the conversation
Next: You can extend this example by adding more functions to interact with other APIs or perform complex tasks, such as sending emails, fetching data from external sources, or even controlling IoT devices. This opens up a wide range of possibilities for creating intelligent, interactive applications using LLMs.
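As a sketch of that kind of extension, here is how a hypothetical getWeather function could sit next to createImage. The function name, its implementation, and the endpoint URL are all assumptions for illustration:

import { openai } from './openai.js'

// Local registry: createImage as before, plus a hypothetical getWeather.
const functions = {
  async createImage({ prompt }) {
    const result = await openai.images.generate({ prompt })
    return result.data[0].url
  },
  async getWeather({ city }) {
    // Placeholder endpoint: substitute a real weather API here.
    const res = await fetch(
      `https://example.com/weather?city=${encodeURIComponent(city)}`,
    )
    return res.json()
  },
}

// The matching schema goes into the `functions` array passed to
// openai.chat.completions.create in getGPTCompletion:
const getWeatherSchema = {
  name: 'getWeather',
  description: 'Fetch the current weather for a city',
  parameters: {
    type: 'object',
    properties: {
      city: { type: 'string', description: 'City name' },
    },
    required: ['city'],
  },
}

Each new function follows the same pattern: one entry in the local functions map, and one schema describing it to the model.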
If you found the article helpful, don't forget to share the knowledge with more people!