Bridging the Gap: Introducing ZapGPT

ZapGPT emerges as our solution, bridging the gap between the familiar world of WhatsApp and the functionalities of ChatGPT. But how did this idea spark? Let's delve into the problem that ignited our journey.

The Problem

Brazilians have a sincere passion for smartphones. We literally have more smartphones than people in Brazil. We use smartphones for a lot of things (including calling). And among the thousands of mobile applications available, one has a special place in the hearts of Brazilians: WhatsApp.

When we talk about WhatsApp we immediately remember messages and groups, and when we remember groups a figure that is already part of Brazilian pop culture comes to mind: the 'Tia do Zap' (Zap's Aunt)! Yes, she (or he) herself! Our dear relative who is always ready to send a good morning message, a prayer, a chain letter, a scam alert or news of dubious origin in the family group.

Although Zap's Aunt is highly versed in the features of WhatsApp, they often find it difficult to use other applications or services available on the web.

And that's where we started questioning ourselves. Is Zap's Aunt using the new generative models and ChatGPT?

ChatGPT's onboarding, for example, is not the easiest: there is some friction, and even understanding how to use the chat can be a barrier for some people. Our hypothesis is that this friction makes it difficult for people less versed in the digital world to adopt these technologies, keeping Zap's Aunt away from the world of generative models and ChatGPT.

Well, there we have our problem: How to connect Zap's Aunt to LLMs?

Our Solution

ZapGPT was one of the projects developed for the Novatics Hackathon. It aims to solve the problem described above. The idea is to create an application that allows the user to communicate with generative models using WhatsApp.

The objective is to have a very straightforward and simple user flow, where the user only needs to add the ZapGPT number on WhatsApp to start using the possibilities offered by generative models such as ChatGPT.

To do this, we built an application that integrates Twilio with the OpenAI API, allowing users to interact with generative models through WhatsApp – in this case, ChatGPT and DALL·E.

The project was part of a hackathon, so our solution had to be lean. We decided to focus on two features:

  1. generate answers to questions (the ChatGPT default);
  2. generate 'good morning' images – anyone who knows the Zap's Aunt persona knows that generating 'good morning' images is a very important feature, which is why we decided to prioritize it!

How It Works

To implement the two prioritized features, we decided to create a structure of extensions that adds functionality to the generative models and allows the user to obtain the expected results from simple text messages, like the ones we normally send on WhatsApp.

Our goal is to avoid as much as possible the need for structured messages that require the user to see a set of options and respond with the option they want to execute, for example, "Reply with the desired option number: (1) generate image, (2) obtain information, etc…". We think an experience like this is close to a standard web interaction (a dropdown, for example) and our goal was to test how to make ZapGPT work using a completely free text-based interaction.

We defined four extensions: (1) Image generator, (2) GPT default, (3) Fake news check, and (4) Travel itinerary. And for the first MVP we implemented the first two extensions, which meet the prioritized functionalities.

Basically ZapGPT works like this:

  1. User sends the message via WhatsApp;
  2. ZapGPT core checks the message and selects the extension that should be used – part of the “magic” happens here, as the extension selection process also uses LLM models;
  3. The message is forwarded to the extension;
  4. The extension generates the response and sends it to the core;
  5. ZapGPT sends the response to WhatsApp.
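The core-and-extensions flow above can be sketched as a small dispatch table. This is an illustrative sketch, not the actual project code: the extension names and handler signatures are assumptions, and the LLM-based selection step (step 2) is stubbed out so the routing itself is visible.

```typescript
// Each extension exposes a name and an async handler that turns the
// user's WhatsApp message into a reply.
type Extension = {
  name: string;
  handle: (message: string) => Promise<string>;
};

// Hypothetical registry for the two MVP extensions. In the real project
// these handlers would call DALL·E and ChatGPT respectively.
const extensions: Record<string, Extension> = {
  "image-generator": {
    name: "image-generator",
    handle: async (msg) => `[image generated for: ${msg}]`,
  },
  "gpt-default": {
    name: "gpt-default",
    handle: async (msg) => `[chat answer for: ${msg}]`,
  },
};

// In production, `selected` would come from an LLM classification of the
// message; here it is passed in directly. Unknown selections fall back to
// the default GPT extension.
async function dispatch(message: string, selected: string): Promise<string> {
  const ext = extensions[selected] ?? extensions["gpt-default"];
  return ext.handle(message);
}
```

The fallback to the default extension matters: if the classifier returns something unexpected, the user still gets a plain ChatGPT-style answer instead of an error.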

In the next section we will explain better how we developed this solution and how we deployed it.

How We Developed It

To develop the solution, three points had to be built: (1) an integration with WhatsApp, (2) a platform to run our API, and (3) an integration with the OpenAI API.

Communication with WhatsApp can be done directly through the official WhatsApp Business API; however, that requires setting up a business account, which takes some time. To reduce this development friction, we chose Twilio, which already supports integration with WhatsApp and provides a free trial account.

To develop our API we used a simple stack: NodeJS, ExpressJS and TypeScript. Since we had to manipulate images, we also used the Jimp library.

We ended up with the following architecture:

It works as follows:

  1. User sends a WhatsApp message to a number created by Twilio;
  2. Twilio calls ZapGPT webhook forwarding the message that was sent by the user;
  3. ZapGPT API handles the message and uses the OpenAI API to generate the response;
  4. The generated response is sent to Twilio, which in turn sends it to WhatsApp;
  5. User receives the response in the WhatsApp thread.
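Step 4 of this flow hinges on the webhook's reply format: Twilio delivers the inbound WhatsApp message as form fields (e.g. `Body`, `From`) and accepts a TwiML XML document as the response. A minimal sketch of building that document by hand is below; the real project presumably uses Twilio's helper library, so treat this as an illustration of what goes over the wire, not the project's code.

```typescript
// Escape the characters that would break the TwiML XML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the generated reply in the TwiML <Response><Message> envelope that
// Twilio expects back from a messaging webhook.
function buildTwimlReply(body: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Message>${escapeXml(body)}</Message></Response>`
  );
}
```

The webhook handler would send this string back with a `Content-Type: text/xml` header, and Twilio relays the message text to the user's WhatsApp thread.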

We deployed the ZapGPT API to an AWS Lambda function. For this we used the serverless-http lib, which adapts an ExpressJS server to Lambda's serverless format. For a production application with a high volume of requests this solution may not be the most recommended; for the hackathon, however, it was good enough.

The gpt-3.5-turbo-1106 model is used to identify which extension should be activated to generate the response for the user.
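One way to frame this selection call is to ask the model to answer with exactly one extension name and then normalize its reply. The prompt wording and extension names below are assumptions for illustration; the article does not publish the actual prompt.

```typescript
// Hypothetical list of extension identifiers the classifier can pick from.
const EXTENSIONS = ["image-generator", "gpt-default"] as const;

// Build the chat messages for a gpt-3.5-turbo-1106 classification request:
// a system prompt constraining the output, plus the raw user message.
function buildSelectionMessages(userMessage: string) {
  return [
    {
      role: "system",
      content:
        "You are a router for a WhatsApp assistant. Given the user's " +
        `message, answer with exactly one of: ${EXTENSIONS.join(", ")}.`,
    },
    { role: "user", content: userMessage },
  ];
}

// LLM replies can carry stray whitespace, casing, or punctuation, so the
// output is normalized before matching; anything unrecognized falls back
// to the default GPT extension.
function parseSelection(modelReply: string): string {
  const cleaned = modelReply.trim().toLowerCase();
  return EXTENSIONS.find((name) => cleaned.includes(name)) ?? "gpt-default";
}
```

Normalizing and falling back like this keeps the router robust to the model occasionally answering with a full sentence instead of a bare label.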

To respond to messages from users who demand image generation, we also used the OpenAI API, more specifically the dall-e-3 model. The use of the OpenAI API in this case also follows the MVP approach, that is, we had little time to achieve a result, so we chose a quick solution with a good cost-benefit ratio.
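For reference, a request to the OpenAI Images endpoint (`POST /v1/images/generations`) with the dall-e-3 model carries a body like the one built below. The prompt-decoration step is an assumption: something along these lines is needed to turn a terse WhatsApp message into a usable image prompt, but the exact wording is not from the project.

```typescript
// Build the request body for the OpenAI Images API using dall-e-3.
// The 'good morning card' framing added to the prompt is illustrative.
function buildImageRequest(userMessage: string) {
  return {
    model: "dall-e-3",
    prompt: `A cheerful 'good morning' greeting card image: ${userMessage}`,
    n: 1,              // dall-e-3 generates one image per request
    size: "1024x1024", // one of the sizes supported by dall-e-3
  };
}
```

This body would be POSTed with the API key in the `Authorization` header; the response contains a URL (or base64 data) for the generated image, which is then relayed back through Twilio.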

MVP results

We tested the two features that were prioritized for the MVP: Image generation and question answering (the ChatGPT standard).

  • Image generation:

User message: “good morning image with a prayer”.
User message: “blessed good morning image for family group”.

  • Question answering:

User question: “who won the 94 world cup?”. ZapGPT answer: “Brazil won the 1994 World Cup, held in the United States. The Brazilian team beat Italy in the final, in a penalty shootout”.
User question: “characters from Trapalhões”. ZapGPT answer: “The Trapalhões were a Brazilian comedy group…”.

In general, the results were good, but some difficulties were encountered when generating images, mainly:

  • Generation of images that do not match the user's message, especially when the user's message requested an image containing text;
  • Problems with selecting the correct extension when the message is too long. It's a corner case, but it was still an identified issue that needs to be addressed in a future iteration of the product.

Next Steps

The next step is to test the ZapGPT MVP with a broader group of users to get feedback and define a roadmap of features to be included.

Some other potential technical improvements we can make are:

  • Experiment with other LLMs to do the extension selection part;
  • Improve image generation prompts;
  • Include other extensions.
