Using AI to automatically transcribe WhatsApp audio.

Context

This article will show you step by step how to configure artificial intelligence to automatically transcribe audio on your WhatsApp. Transcription will occur for both sent and received audio.

Technology Stack

  • Evolution API: Unofficial WhatsApp API. It will be used to record received audio events.
  • N8N: Used as a back-end to process events, redirect them to the AI and return them to the WhatsApp conversation.
  • AI: OpenAI's Whisper, responsible for transcribing audio into text (speech-to-text).

Choosing and configuring the WhatsApp API

Step zero is to install the Evolution API on a machine so that API calls can be made. Since the focus of this article is configuring OpenAI's Whisper to transcribe audio, I suggest the ZDG Community video, which shows step by step how to install it.

Video link: https://www.youtube.com/watch?v=h1jwldMlpVQ

If you prefer to use another API service, to avoid the need to install the Evolution API, I recommend API Brasil services (https://apibrasil.com.br/ ).

Connecting your WhatsApp to the Evolution API

To make the connection, it is necessary to create a dedicated session for the number that will be connected through the following cURL call:

# "number" is the number to be connected (country code followed by the number).
# JSON does not allow comments, so only the enabled events are listed below.
# Other events that can be added to the list: APPLICATION_STARTUP, MESSAGES_UPDATE,
# CONTACTS_SET, CONTACTS_UPSERT, CONTACTS_UPDATE, PRESENCE_UPDATE, CHATS_SET,
# CHATS_UPSERT, CHATS_UPDATE, CHATS_DELETE, GROUPS_UPSERT, GROUP_UPDATE,
# GROUP_PARTICIPANTS_UPDATE, CALL, NEW_JWT_TOKEN, TYPEBOT_START, TYPEBOT_CHANGE_STATUS.
curl --location 'https://api01.your-domain/instance/create' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
    "instanceName": "choose-a-name",
    "token": "choose-a-token",
    "qrcode": true,
    "number": "ddi-number",
    "webhook": "https://your-sub.domain.com/webhook-test",
    "webhook_by_events": true,
    "events": [
        "QRCODE_UPDATED",
        "MESSAGES_SET",
        "MESSAGES_UPSERT",
        "MESSAGES_DELETE",
        "SEND_MESSAGE",
        "CONNECTION_UPDATE"
    ]
}'
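
Because a JSON body cannot carry comments, one way to keep the event list easy to toggle is to build the payload in a short script. The following is a minimal Python sketch, not part of the original flow: the function name build_instance_payload and the constant ENABLED_EVENTS are hypothetical, while the field names and event names are copied from the cURL call above.

```python
import json

# Events to forward to the webhook. Comment a line out to disable an event;
# Python allows comments here, the resulting JSON stays valid.
ENABLED_EVENTS = [
    "QRCODE_UPDATED",
    "MESSAGES_SET",
    "MESSAGES_UPSERT",
    # "MESSAGES_UPDATE",
    "MESSAGES_DELETE",
    "SEND_MESSAGE",
    "CONNECTION_UPDATE",
]


def build_instance_payload(instance_name, token, number, webhook_url,
                           events=ENABLED_EVENTS):
    """Return the JSON body for the instance/create call as a string."""
    payload = {
        "instanceName": instance_name,
        "token": token,
        "qrcode": True,
        "number": number,  # country code followed by the number
        "webhook": webhook_url,
        "webhook_by_events": True,
        "events": list(events),
    }
    return json.dumps(payload, indent=4)


# Placeholder values, same as in the cURL call.
print(build_instance_payload(
    "choose-a-name",
    "choose-a-token",
    "ddi-number",
    "https://your-sub.domain.com/webhook-test",
))
```

The printed string can be passed to the --data option of the cURL call or sent with any HTTP client.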

You can configure a webhook to redirect the events of your choice. By enabling the "MESSAGES_SET" and "MESSAGES_UPSERT" events, it will be possible to retrieve the events of incoming and outgoing messages.

To link the Evolution API to N8N, a Webhook node was created in N8N. Its URL is exactly the value used in the webhook field of the previous cURL call (webhook_by_events is only a boolean flag, not a URL).

webhook node from N8N

If you want audio transcription to also be done for groups, you will need to enable the “GROUP” events.

When making the call, a QR Code image in base64 format will be generated. Just open your WhatsApp and scan the QR Code.

Connecting OpenAI and N8N

Access your OpenAI account via the link https://platform.openai.com/api-keys and create a new API key. Then open N8N, access the “Credentials” tab and click the “Add credential” button.

A new popup will open. Select the “OpenAI” option in the dropdown.

Add the OpenAI API key you created.

That's it, your N8N already has a registered OpenAI credential.

Creating the transcription flow

The flow is divided into 5 steps:

  1. Check if the event is an audio message
  2. Convert the received audio to base64
  3. Convert the base64 file to binary
  4. Make a call to OpenAI's Whisper to transcribe the audio
  5. Return the transcription on WhatsApp


1- Check if the event is an audio message

To check whether the content of the event is audio, it is necessary to evaluate the body.data.messageType parameter. If it is an audio event, this variable will have a value equal to “audioMessage”.

To keep only audio events, two options are available in N8N: the IF node and the Filter node. We chose the IF node because, in later applications, it may be useful to handle non-audio messages in some other way.
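
For readers who want to see the check outside N8N, here is a minimal Python sketch of the same condition. The function name is_audio_event is hypothetical, and the sample events are trimmed-down illustrations rather than full webhook payloads; only the path body.data.messageType and the value “audioMessage” come from the text above.

```python
def is_audio_event(event: dict) -> bool:
    """Return True when the webhook event carries an audio message."""
    data = event.get("body", {}).get("data", {})
    return data.get("messageType") == "audioMessage"


# Hypothetical, trimmed-down events for illustration only.
audio_event = {"body": {"data": {"messageType": "audioMessage"}}}
other_event = {"body": {"data": {"messageType": "conversation"}}}

print(is_audio_event(audio_event))  # True
print(is_audio_event(other_event))  # False
```

Like the IF node, the function returns a boolean, so the non-audio branch remains available for other handling.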

2- Convert the received audio to base64

The audio file is encrypted by WhatsApp so it is not possible to recover the file directly in the event received via the webhook.

To be able to access the audio file, it is necessary to use the “chat/getBase64FromMediaMessage” endpoint of the Evolution API. It will be necessary to pass the id of the audio message received (it can be retrieved through the variable body.data.key.id, in the JSON received from the webhook). The endpoint will return the audio file in base64 format.

Just copy the cURL call below and replace the id with the one received from the webhook:

curl --location '{{baseUrl}}/chat/getBase64FromMediaMessage/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '
{
    "message": {
        "key": {
            "id": "3EB0F4A1F841F02958FB74"
        }
    },
    "convertToMp4": false
}'        
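
As a rough illustration of how the message id flows from the webhook event into this call, the sketch below builds the URL and JSON body in Python. The function name build_media_request and the sample event are hypothetical; the endpoint path, the body.data.key.id location and the convertToMp4 field are taken from the text and the cURL call above.

```python
import json


def build_media_request(event: dict, base_url: str, instance: str) -> tuple[str, str]:
    """Return (url, json_body) for the getBase64FromMediaMessage call."""
    message_id = event["body"]["data"]["key"]["id"]
    url = f"{base_url}/chat/getBase64FromMediaMessage/{instance}"
    body = json.dumps({
        "message": {"key": {"id": message_id}},
        "convertToMp4": False,
    })
    return url, body


# Hypothetical event reduced to the single field this step needs.
event = {"body": {"data": {"key": {"id": "3EB0F4A1F841F02958FB74"}}}}
url, body = build_media_request(event, "https://api01.your-domain", "choose-a-name")
print(url)
print(body)
```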

3- Convert the base64 file to binary

It is important to convert the file from base64 to binary because the OpenAI transcription call requires a binary file as input.

To perform this action, the N8N “Convert to/from binary data” node is used, configured as shown in the following image.
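
Outside N8N, the same conversion is a single base64 decode. The Python sketch below is only an illustration: the function name base64_to_audio_bytes is hypothetical, and the sample string is a tiny stand-in created on the spot, not real audio returned by the Evolution API.

```python
import base64


def base64_to_audio_bytes(encoded: str) -> bytes:
    """Decode a base64 string into the raw bytes of the audio file."""
    return base64.b64decode(encoded)


# Stand-in data: encode a few bytes, then decode them again.
sample = base64.b64encode(b"OggS").decode("ascii")
audio_bytes = base64_to_audio_bytes(sample)
print(audio_bytes == b"OggS")  # True
```

The resulting bytes can be written to a file or attached directly to the upload in the next step.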

4- Make a call to OpenAI's Whisper to transcribe the audio

With the environment configured, transcribing the audio is simple: make the cURL call below, pointing the file field at the audio produced in the previous step. In response, the API returns the audio transcription.

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/audio.mp3 \
  --form model=whisper-1
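
The same request can be assembled with the Python standard library alone. The sketch below only builds the multipart request and does not send it. The function name build_transcription_request and the default filename audio.ogg are assumptions (WhatsApp voice notes are usually Ogg files, but check what your instance returns); the URL, the Authorization header and the model value come from the cURL call above.

```python
import urllib.request
import uuid

TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, api_key: str,
                                filename: str = "audio.ogg") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with the audio file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        "whisper-1\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        TRANSCRIPTION_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_transcription_request(b"fake-audio-bytes", "your-api-key")
print(request.get_method(), request.full_url)

# To actually send it (needs a valid key and network access):
# import json
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["text"])
```

The JSON response carries the transcription in its text field, which is the value passed on to the next step.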
        

5- Return the transcription on WhatsApp

With the audio already transcribed by artificial intelligence, simply use the following cURL to send the transcription to the same conversation.

curl --location '{{baseUrl}}/message/sendText/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
    "number": "number-to-return",
    "options": {
        "delay": 1200,
        "presence": "composing",
        "linkPreview": false
    },
    "textMessage": {
        "text": "your-text-here"
    }
}'        

The “text” field receives the transcription from the previous step. The “number” field indicates the number that will receive the message, so it takes the value body.data.key.remoteJid.replace('@s.whatsapp.net','') from the webhook's JSON. The replace call strips the WhatsApp suffix so that only the country code (DDI) followed by the number remains.
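
To make the mapping concrete, here is a small Python sketch that builds the sendText body from a webhook event and a transcription. The function name build_send_text_payload, the sample number and the sample text are hypothetical; the field names and option values are copied from the cURL call above.

```python
import json


def build_send_text_payload(event: dict, transcription: str) -> str:
    """Return the JSON body for message/sendText as a string."""
    remote_jid = event["body"]["data"]["key"]["remoteJid"]
    # Strip the WhatsApp suffix so only country code + number remains.
    number = remote_jid.replace("@s.whatsapp.net", "")
    payload = {
        "number": number,
        "options": {"delay": 1200, "presence": "composing", "linkPreview": False},
        "textMessage": {"text": transcription},
    }
    return json.dumps(payload, ensure_ascii=False)


# Hypothetical event reduced to the single field this step needs.
event = {"body": {"data": {"key": {"remoteJid": "5511999999999@s.whatsapp.net"}}}}
print(build_send_text_payload(event, "Hello, this is the transcribed text."))
```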

This is the complete flow created on N8N

Conclusion

In this article you learned step by step how to connect your number to the Evolution API so that, using OpenAI's artificial intelligence, you are able to automatically transcribe all audio on WhatsApp and return it to the corresponding conversation. The backend was operated through N8N.

Would you like to test?

If you want to test the finished functionality on your own WhatsApp, I developed a system that is very simple to use. Just create your account, scan the QR Code with your device and start having your audio messages transcribed.

Click on the link below and discover Sussuro, our audio transcription robot integrated into WhatsApp:

https://geniosdossistemas.com/transcritor-de-audio/
