Using AI to automatically transcribe WhatsApp audio.

Igor Campos Vilas Boas

Certified Bubble Developer | No-Code | Bubble.io | Automation Task | N8N

Published: April 22, 2024

Context

This article will show you step by step how to configure artificial intelligence to automatically transcribe audio on your WhatsApp. Transcription will occur for both sent and received audio.

Technology Stack

Evolution API: Unofficial WhatsApp API. It will be used to record received audio events.
N8N: Used as a back-end to process events, redirect them to the AI and return them to the WhatsApp conversation.
AI: OpenAI's Whisper, responsible for transcribing audio into text (speech-to-text).

Choosing and configuring the WhatsApp API

Step zero consists of installing the Evolution API on a machine so that API calls can be made. Since the focus of this article is configuring OpenAI's Whisper to perform audio transcription, I suggest the ZDG Community video, which shows step by step how to install it.

Video link: https://www.youtube.com/watch?v=h1jwldMlpVQ

If you prefer to use another API service, to avoid the need to install the Evolution API, I recommend API Brasil services (https://apibrasil.com.br/).

Connecting your WhatsApp to the Evolution API

To make the connection, it is necessary to create a dedicated session for the number that will be connected through the following cURL call:

# "number" is the number to be connected, as country code (DDI) + number.
# Events left disabled in this example; add any of them to "events" if needed:
# APPLICATION_STARTUP, MESSAGES_UPDATE, CONTACTS_SET, CONTACTS_UPSERT, CONTACTS_UPDATE,
# PRESENCE_UPDATE, CHATS_SET, CHATS_UPSERT, CHATS_UPDATE, CHATS_DELETE, GROUPS_UPSERT,
# GROUP_UPDATE, GROUP_PARTICIPANTS_UPDATE, CALL, NEW_JWT_TOKEN, TYPEBOT_START,
# TYPEBOT_CHANGE_STATUS
curl --location 'https://api01.your-domain/instance/create' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
    "instanceName": "choose-a-name",
    "token": "choose-a-token",
    "qrcode": true,
    "number": "ddi-number",
    "webhook": "https://your-sub.domain.com/webhook-test",
    "webhook_by_events": true,
    "events": [
        "QRCODE_UPDATED",
        "MESSAGES_SET",
        "MESSAGES_UPSERT",
        "MESSAGES_DELETE",
        "SEND_MESSAGE",
        "CONNECTION_UPDATE"
    ]
}'

You can configure a webhook to redirect the events of your choice. By enabling the "MESSAGES_SET" and "MESSAGES_UPSERT" events, it will be possible to retrieve the events of incoming and outgoing messages.

To link this with N8N, a Webhook node was created, and its URL is exactly the one used in the "webhook" field of the previous cURL call.
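As a rough illustration of the request body described above, the sketch below assembles it in Python and serializes it as strict JSON. The function name `build_instance_payload` and the sample phone number are hypothetical placeholders; the field names and event list are the ones from the cURL call.

```python
import json

# Events enabled in the article's cURL call.
DEFAULT_EVENTS = [
    "QRCODE_UPDATED",
    "MESSAGES_SET",
    "MESSAGES_UPSERT",
    "MESSAGES_DELETE",
    "SEND_MESSAGE",
    "CONNECTION_UPDATE",
]


def build_instance_payload(instance_name, token, number, webhook_url, events=None):
    """Return the body of the instance/create call as a Python dict."""
    return {
        "instanceName": instance_name,
        "token": token,
        "qrcode": True,
        "number": number,
        "webhook": webhook_url,  # URL of the N8N Webhook node
        "webhook_by_events": True,
        "events": list(events or DEFAULT_EVENTS),
    }


payload = build_instance_payload(
    "choose-a-name",
    "choose-a-token",
    "5511999999999",  # made-up number: country code + number
    "https://your-sub.domain.com/webhook-test",
)
body = json.dumps(payload, indent=4)  # strict JSON, safe to pass to --data
print(body)
```

Building the body this way makes it easy to switch events on or off without hand-editing the JSON string.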

If you want audio transcription to also work in groups, you will need to enable the “GROUP” events as well.

When the call is made, a QR Code image in base64 format is generated. Just open WhatsApp on your phone and scan the QR Code.
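If you want to view the QR Code outside of an API client, the base64 string can be decoded into image bytes. This is a minimal sketch: the helper name `qr_base64_to_png_bytes` is hypothetical, and it assumes the string may or may not carry a data-URI prefix such as `data:image/png;base64,`. The sample value is a stand-in, not a real QR Code.

```python
import base64


def qr_base64_to_png_bytes(qr_base64: str) -> bytes:
    """Decode a base64 image string, with or without a data-URI prefix."""
    # Drop a prefix such as "data:image/png;base64," if present.
    if qr_base64.startswith("data:") and "," in qr_base64:
        qr_base64 = qr_base64.split(",", 1)[1]
    return base64.b64decode(qr_base64)


# Stand-in value for illustration: the 8-byte PNG signature, base64-encoded.
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
png_bytes = qr_base64_to_png_bytes(sample)

# Writing the bytes to a file lets you open the image and scan it:
# with open("qrcode.png", "wb") as fh:
#     fh.write(png_bytes)
```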

Connecting ChatGPT and N8N

Access your OpenAI account at https://platform.openai.com/api-keys and create a new API key. Then open N8N, go to the “Credentials” tab and click the “Add credential” button.

A new popup will open. Select the “OpenAI” option in the dropdown.

Paste the OpenAI API key you created.

That's it: your N8N now has a registered OpenAI credential.

Creating the transcription flow

The flow is divided into 5 steps:
1- Check if the event is an audio message
2- Convert the received audio to base64
3- Convert the base64 file to binary
4- Make a call to ChatGPT to transcribe the audio
5- Return the transcription on WhatsApp
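The five steps can be pictured as one small pipeline. The sketch below is not the N8N workflow itself, only an illustration of the data flow: `handle_event` is a hypothetical name, and the four callables stand in for the N8N nodes and API calls, here replaced by stubs so the example runs without any network access.

```python
import base64
from typing import Callable, Optional


def handle_event(
    event: dict,
    fetch_base64: Callable[[str], str],
    to_binary: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    send_text: Callable[[str, str], None],
) -> Optional[str]:
    """Run the five steps for one webhook event; return the text or None."""
    data = event["body"]["data"]
    # Step 1: ignore anything that is not an audio message.
    if data.get("messageType") != "audioMessage":
        return None
    # Step 2: ask the WhatsApp API for the audio as base64, by message id.
    audio_b64 = fetch_base64(data["key"]["id"])
    # Step 3: turn the base64 string into raw bytes.
    audio = to_binary(audio_b64)
    # Step 4: send the bytes to the speech-to-text model.
    text = transcribe(audio)
    # Step 5: post the transcription back to the same conversation.
    send_text(data["key"]["remoteJid"], text)
    return text


# Stubbed run: every external call is replaced by a local stand-in.
sent = []
event = {"body": {"data": {
    "messageType": "audioMessage",
    "key": {"id": "ABC", "remoteJid": "5511999999999@s.whatsapp.net"},
}}}
result = handle_event(
    event,
    fetch_base64=lambda _id: base64.b64encode(b"audio").decode("ascii"),
    to_binary=base64.b64decode,
    transcribe=lambda audio: f"{len(audio)} bytes",
    send_text=lambda jid, text: sent.append((jid, text)),
)
```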

1- Check if the event is an audio message

To check whether the content of the event is audio, it is necessary to evaluate the body.data.messageType parameter. If it is an audio event, this variable will have a value equal to “audioMessage”.

To filter only audio events, N8N offers two options: the IF node and the Filter node. We chose the IF node because, in later applications, it may be useful to handle non-audio messages in some way.
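The same check can be expressed in a few lines of Python. This is a sketch of the condition, not of the N8N node: the function name `is_audio_event` is hypothetical, and the "conversation" value is only an assumed example of a non-audio message type.

```python
def is_audio_event(event: dict) -> bool:
    """True when the webhook event carries an audio message."""
    data = event.get("body", {}).get("data", {})
    return data.get("messageType") == "audioMessage"


events = [
    {"body": {"data": {"messageType": "audioMessage"}}},
    {"body": {"data": {"messageType": "conversation"}}},  # assumed non-audio type
    {"body": {}},  # event without message data
]

# Like the IF node, keep both branches so non-audio events can be handled later.
audio_events = [e for e in events if is_audio_event(e)]
other_events = [e for e in events if not is_audio_event(e)]
```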

2- Convert the received audio to base64

The audio file is encrypted by WhatsApp, so it is not possible to recover the file directly from the event received via the webhook.

To access the audio file, use the “chat/getBase64FromMediaMessage” endpoint of the Evolution API. You need to pass the id of the audio message received (it can be retrieved from the variable body.data.key.id in the JSON received from the webhook). The endpoint returns the audio file in base64 format.

Just copy the cURL call below:

curl --location '{{baseUrl}}/chat/getBase64FromMediaMessage/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey:your-api-key' \
--data '
{
    "message": {
        "key": {
            "id": "3EB0F4A1F841F02958FB74"
        }
    },
    "convertToMp4": false
}'
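The request body above can also be built directly from the webhook event, so the message id never has to be copied by hand. A minimal sketch, assuming the event has the body.data.key.id structure described in the text; `build_media_request` is a hypothetical helper name.

```python
import json


def build_media_request(event: dict) -> str:
    """Build the JSON body for getBase64FromMediaMessage from a webhook event."""
    message_id = event["body"]["data"]["key"]["id"]
    return json.dumps({
        "message": {"key": {"id": message_id}},
        "convertToMp4": False,
    })


event = {"body": {"data": {"key": {"id": "3EB0F4A1F841F02958FB74"}}}}
request_body = build_media_request(event)
print(request_body)
```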

3- Convert the base64 file to binary

It is important to convert the file from base64 to binary because the OpenAI transcription call requires a binary file as input.

To perform this action, the N8N “Convert to/from binary data” node is used, configured as shown in the following image.
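Outside of N8N, the same conversion is a single standard-library call. The sketch below assumes you already hold the base64 string returned by the previous step; the helper name `audio_from_base64` and the sample bytes are placeholders.

```python
import base64
import binascii


def audio_from_base64(b64_text: str) -> bytes:
    """Decode base64 audio into raw bytes, rejecting malformed input."""
    try:
        return base64.b64decode(b64_text, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not valid base64 audio") from exc


# Placeholder bytes standing in for a real audio file.
encoded = base64.b64encode(b"fake audio bytes").decode("ascii")
audio_bytes = audio_from_base64(encoded)
```

Using `validate=True` makes a corrupted or truncated payload fail early instead of silently producing unusable audio.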

4- Make a call to ChatGPT to transcribe the audio

With the entire environment configured, transcribing the audio is simple: just make the cURL call below. In response, the API will return the audio transcription.

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/audio.mp3 \
  --form model=whisper-1
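For readers who prefer a script to cURL, the same request can be sketched with Python's standard library only. The function names `build_multipart` and `transcribe` are hypothetical; the endpoint, the `file` and `model` form fields and the `whisper-1` model come from the cURL call above. The filename you pass should carry the extension of the real audio format. Nothing here is called at import time, so no request is sent until you invoke `transcribe` yourself with a valid key.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(audio: bytes, filename: str, model: str = "whisper-1"):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio + closing
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(audio: bytes, api_key: str, filename: str) -> str:
    """POST the audio to the transcription endpoint and return the text."""
    body, content_type = build_multipart(audio, filename)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The default JSON response carries the transcription under "text".
        return json.loads(response.read())["text"]
```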

5- Return the transcription on WhatsApp

With the audio already transcribed by artificial intelligence, simply use the following cURL call to send the transcription to the same conversation.

curl --location '{{baseUrl}}/message/sendText/{{instance}}' \
--header 'Content-Type: application/json' \
--header 'apikey: your-api-key' \
--data '{
    "number": "number-to-return",
    "options": {
        "delay": 1200,
        "presence": "composing",
        "linkPreview": false
    },
    "textMessage": {
        "text": "your-text-here"
    }
}'

The “text” variable receives the transcription from the previous step. The “number” variable indicates the number that will receive the message, so it receives the value body.data.key.remoteJid.replace('@s.whatsapp.net','') taken from the webhook's JSON. The replace function formats it correctly: country code (DDI) followed by the number.
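To make the mapping concrete, here is a small Python sketch of the same transformation. The helper names `jid_to_number` and `build_send_text_payload` and the sample number are hypothetical; the payload fields are the ones from the cURL call.

```python
import json


def jid_to_number(remote_jid: str) -> str:
    """Strip the WhatsApp suffix, leaving country code + number."""
    return remote_jid.replace("@s.whatsapp.net", "")


def build_send_text_payload(event: dict, transcription: str) -> dict:
    """Build the sendText body that replies in the originating conversation."""
    remote_jid = event["body"]["data"]["key"]["remoteJid"]
    return {
        "number": jid_to_number(remote_jid),
        "options": {"delay": 1200, "presence": "composing", "linkPreview": False},
        "textMessage": {"text": transcription},
    }


# Made-up webhook event for illustration.
event = {"body": {"data": {"key": {"remoteJid": "5511999999999@s.whatsapp.net"}}}}
payload = build_send_text_payload(event, "Hello, this is the transcription.")
print(json.dumps(payload, indent=4))
```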

This is the complete flow created on N8N

Conclusion

In this article you learned step by step how to connect your number to the Evolution API so that, using OpenAI's artificial intelligence, you are able to automatically transcribe all audio on WhatsApp and return it to the corresponding conversation. The backend was operated through N8N.

Would you like to test?

If you want to try the finished functionality on your own WhatsApp, I developed a very simple-to-use system. Just create your account, connect your device and start having your audios transcribed.

Click on the link below and discover Sussuro, our audio transcription robot integrated into WhatsApp:

https://geniosdossistemas.com/transcritor-de-audio/
