Using AI to automatically transcribe WhatsApp audio.
Igor Campos Vilas Boas
Certified Bubble Developer | No-Code | Bubble.io | Automation Task | N8N
Context
This article shows, step by step, how to configure artificial intelligence to automatically transcribe audio messages on your WhatsApp. Transcription happens for both sent and received audio.
Technology Stack
Choosing and configuring the WhatsApp API
Step zero consists of installing the Evolution API on a machine so that API calls can be made. As the focus of this article is to teach how to configure OpenAI's Whisper to perform audio transcription, I suggest the ZDG Community video, which shows step by step how to install it.
Video link: https://www.youtube.com/watch?v=h1jwldMlpVQ
If you prefer to use another API service and avoid installing the Evolution API yourself, I recommend the API Brasil services (https://apibrasil.com.br/).
Connecting your WhatsApp to the Evolution API
To make the connection, it is necessary to create a dedicated session for the number that will be connected through the following cURL call:
# "number" is the number to be connected (country code + number).
# JSON does not allow comments, so the body lists only the enabled events.
# Other available events, not enabled here: APPLICATION_STARTUP, MESSAGES_UPDATE,
# CONTACTS_SET, CONTACTS_UPSERT, CONTACTS_UPDATE, PRESENCE_UPDATE, CHATS_SET,
# CHATS_UPSERT, CHATS_UPDATE, CHATS_DELETE, GROUPS_UPSERT, GROUP_UPDATE,
# GROUP_PARTICIPANTS_UPDATE, CALL, NEW_JWT_TOKEN, TYPEBOT_START, TYPEBOT_CHANGE_STATUS
curl --location 'https://api01.your-domain/instance/create' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
"instanceName": "choose-a-name",
"token": "choose-a-token",
"qrcode": true,
"number": "ddi-number",
"webhook": "https://your-sub.domain.com/webhook-test",
"webhook_by_events": true,
"events": [
"QRCODE_UPDATED",
"MESSAGES_SET",
"MESSAGES_UPSERT",
"MESSAGES_DELETE",
"SEND_MESSAGE",
"CONNECTION_UPDATE"
]
}'
You can configure a webhook to redirect the events of your choice. By enabling the "MESSAGES_SET" and "MESSAGES_UPSERT" events, it will be possible to retrieve the events of incoming and outgoing messages.
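Because JSON has no comment syntax, the request body should contain only the events that are actually enabled. The following is a minimal Python sketch of building that body, assuming the field names from the cURL call above. The function name build_instance_payload and the sample phone number are hypothetical and are not part of the Evolution API.

```python
import json

# Events enabled in the instance/create call shown above.
ENABLED_EVENTS = [
    "QRCODE_UPDATED",
    "MESSAGES_SET",
    "MESSAGES_UPSERT",
    "MESSAGES_DELETE",
    "SEND_MESSAGE",
    "CONNECTION_UPDATE",
]


def build_instance_payload(instance_name, token, number, webhook_url, events=ENABLED_EVENTS):
    """Return the JSON body for the instance/create call as a string (hypothetical helper)."""
    payload = {
        "instanceName": instance_name,
        "token": token,
        "qrcode": True,
        "number": number,  # country code + number, digits only
        "webhook": webhook_url,  # URL of the N8N Webhook node
        "webhook_by_events": True,
        "events": list(events),
    }
    return json.dumps(payload, indent=2)


# Placeholder values; replace them with your own.
body = build_instance_payload(
    "choose-a-name",
    "choose-a-token",
    "5511999999999",
    "https://your-sub.domain.com/webhook-test",
)
print(body)
```

To enable another event, append its name to the list instead of uncommenting a line inside the JSON.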
To link this to N8N, a Webhook node was created, and its URL is exactly the one passed in the webhook variable of the previous cURL call.
If you want audio transcription to also be done for groups, you will need to enable the “GROUP” events.
When you make the call, a QR Code image in base64 format will be generated. Just open your WhatsApp and scan the QR Code.
Connecting OpenAI and N8N
Access your OpenAI account at https://platform.openai.com/api-keys and create a new API key. Then open N8N, go to the “Credentials” tab, and click the “Add credential” button.
A new popup will open. Select the “OpenAI” option in the dropdown.
Paste the OpenAI API key you created.
That's it, your N8N already has a registered OpenAI credential.
Creating the transcription flow
1- Check if the event is an audio message
To check whether the content of the event is audio, it is necessary to evaluate the body.data.messageType parameter. If it is an audio event, this variable will have a value equal to “audioMessage”.
To keep only audio events, N8N offers two options: the IF node and the Filter node. We chose the IF node because later applications may want to handle non-audio messages in some other way.
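The condition evaluated by the IF node can be sketched in Python as follows. This is only an illustration of the check, assuming the webhook JSON has the body.data.messageType path described above. The function name is_audio_event is hypothetical, and "conversation" is used here only as an illustrative non-audio value.

```python
def is_audio_event(event):
    """Return True when the webhook event carries an audio message."""
    # Missing keys are treated as "not audio" instead of raising an error.
    data = event.get("body", {}).get("data", {})
    return data.get("messageType") == "audioMessage"


audio_event = {"body": {"data": {"messageType": "audioMessage"}}}
other_event = {"body": {"data": {"messageType": "conversation"}}}

print(is_audio_event(audio_event))
print(is_audio_event(other_event))
```

The true branch continues to the transcription steps, and the false branch is free for any other handling.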
2- Convert the received audio to base64
The audio file is encrypted by WhatsApp, so it cannot be recovered directly from the event received via the webhook.
To access the audio file, use the “chat/getBase64FromMediaMessage” endpoint of the Evolution API. You need to pass the id of the audio message received, which can be read from the variable body.data.key.id in the JSON received from the webhook. The endpoint returns the audio file in base64 format.
Just copy the cURL call below:
curl --location '{{baseUrl}}/chat/getBase64FromMediaMessage/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey:your-api-key' \
--data '
{
"message": {
"key": {
"id": "3EB0F4A1F841F02958FB74"
}
},
"convertToMp4": false
}'
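The body of this call can be derived from the webhook event instead of being typed by hand. Below is a minimal Python sketch, assuming the event has the body.data.key.id path mentioned above. The function name build_media_request is hypothetical, and the sample id is the one from the cURL example.

```python
import json


def build_media_request(event):
    """Build the getBase64FromMediaMessage body from a webhook event."""
    message_id = event["body"]["data"]["key"]["id"]
    return {
        "message": {"key": {"id": message_id}},
        "convertToMp4": False,
    }


# Sample event containing only the field this step needs.
sample_event = {"body": {"data": {"key": {"id": "3EB0F4A1F841F02958FB74"}}}}
request_body = build_media_request(sample_event)
print(json.dumps(request_body, indent=2))
```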
3- Convert the base64 file to binary
The file must be converted from base64 to binary because the OpenAI API call expects a binary file as input.
To perform this action, the N8N “Convert to/from binary data” node is used, configured as shown in the following image.
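Outside N8N, the same conversion is a single standard-library call. The sketch below is only an illustration of what this step does, not the node's actual implementation. The function name base64_to_bytes and the four sample bytes are hypothetical stand-ins for a real audio file.

```python
import base64


def base64_to_bytes(b64_audio):
    """Decode a base64 string into the raw bytes of the audio file."""
    return base64.b64decode(b64_audio)


# Round trip with placeholder bytes standing in for real audio data.
original = b"OggS"
encoded = base64.b64encode(original).decode("ascii")
decoded = base64_to_bytes(encoded)
print(encoded, decoded == original)
```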
4- Make a call to the OpenAI API to transcribe the audio
With the whole environment configured, transcribing the audio is simple: make the cURL call below, and the API returns the transcription in its response.
curl --request POST \
--url https://api.openai.com/v1/audio/transcriptions \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--header 'Content-Type: multipart/form-data' \
--form file=@/path/to/file/audio.mp3 \
--form model=whisper-1
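For readers who want to see what the --form flags send, here is a minimal Python sketch that builds the same multipart/form-data request using only the standard library. The helper names build_multipart and transcribe are hypothetical, and the default filename audio.ogg assumes the decoded WhatsApp audio is an OGG file, so adjust it if your file has another format. The sketch reads the "text" field of the JSON response.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one part named 'file' as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_bytes, api_key, filename="audio.ogg"):
    """POST the audio to the transcription endpoint and return the text."""
    body, content_type = build_multipart({"model": "whisper-1"}, filename, audio_bytes)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Calling transcribe(audio_bytes, api_key) with the bytes from the previous step and your own key performs the request. Nothing is sent until that call is made.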
5- Return the transcription on WhatsApp
With the audio already transcribed by artificial intelligence, simply use the following cURL to send the transcription to the same conversation.
curl --location '{{baseUrl}}/message/sendText/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
"number": "number-to-return",
"options": {
"delay": 1200,
"presence": "composing",
"linkPreview": false
},
"textMessage": {
"text": "your-text-here"
}
}'
The “text” variable receives the transcription from the previous step. The “number” variable is the number that will receive the message, so it takes the value body.data.key.remoteJid.replace('@s.whatsapp.net','') from the webhook's JSON. The replace call strips the suffix so that only the country code followed by the number remains.
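The same mapping can be sketched in Python. This is a minimal illustration, assuming the webhook event has the body.data.key.remoteJid path described above. The function name build_text_payload and the sample number are hypothetical.

```python
def build_text_payload(event, transcription):
    """Build the sendText body that returns the transcription to the sender."""
    remote_jid = event["body"]["data"]["key"]["remoteJid"]
    # Strip the WhatsApp suffix so only country code + number remains.
    number = remote_jid.replace("@s.whatsapp.net", "")
    return {
        "number": number,
        "options": {"delay": 1200, "presence": "composing", "linkPreview": False},
        "textMessage": {"text": transcription},
    }


sample_event = {"body": {"data": {"key": {"remoteJid": "5511999999999@s.whatsapp.net"}}}}
payload = build_text_payload(sample_event, "your-text-here")
print(payload["number"])
```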
This is the complete flow created on N8N
Conclusion
In this article you learned step by step how to connect your number to the Evolution API so that, using OpenAI's artificial intelligence, you are able to automatically transcribe all audio on WhatsApp and return it to the corresponding conversation. The backend was operated through N8N.
Would you like to test?
If you want to test the finished functionality on your own WhatsApp, I developed a very simple system. Just create your account, connect your device, and start having your audio messages transcribed.
Click on the link below and discover Sussuro, our audio transcription robot integrated into WhatsApp.