From a Whisper to a Chat(GPT)
Hey there, it's a great day for tech enthusiasts! OpenAI just dropped two new APIs that are set to change the game when it comes to natural language processing (NLP). The ChatGPT and Whisper APIs are the latest developments from OpenAI, and they're bound to make integrating NLP into software a breeze.
First off, let's talk about the ChatGPT API. This language model is designed to engage in realistic conversations with humans. With its in-depth understanding of human language, ChatGPT can be integrated into chatbots, virtual assistants, and other conversational interfaces. What's exciting about this API is that it's trained on vast amounts of human-generated text, making it highly accurate and adaptable. It's expected to help developers create chatbots that can offer excellent customer support, answer user queries, and even crack jokes!
Ok. As it turns out, the above two paragraphs were generated by ChatGPT. They were taken verbatim, no revisions were made:
When you read the start of this newsletter you may or may not have thought it sounded like me. It does, mostly. ChatGPT took some of my prompts that I've written over the last two months, and some of the text I had it rewrite, and it got a "feel" for what I sound like.
Now this is not really anything new if you have been keeping an eye on ChatGPT and all that is going around about it. Pros and cons abound on this stuff. But no matter what you think about it, you know in your heart of hearts this is cool stuff. Scary cool or just super cool. Fascinating stuff.
ChatGPT API? GPT-3.5 API?
So what is the big deal about an API coming out for ChatGPT if we have the tool at our fingertips? Well, an API is an Application Programming Interface, a way to connect to a program without screens. You write code in your app that connects to the API to get data, to push data to a database, etc.
"But I thought there was already an API for this? What is the big deal?" Great questions! What we devs had was the GPT-3.5 API. This is the text completion engine that sits inside the ChatGPT application.
So ChatGPT is a chat application built on GPT-3.5. Get it?
Chat...GPT engine...ChatGPT.
GPT-3.5 is version 3.5 of the GPT engine. Blah blah blah...what you need to know is that the API for it is not chat-based. GPT is a text completion engine. You give it something, it completes the thought. You say "Why is the sky blue?" and it will give you an answer, thus completing the thought.
Guru Example
Look at it like taking your question up the mountain to the guru meditating there. You ask your question, you get an answer. "How many licks does it take to get to the center of a Tootsie Pop?" You are old if you know not only the question, but the answer.
You don't get to ask more than one question. So you go back down the mountain. If you want him/her to reword their answer, you have to go back up and ask it again with more specifics in the question about what you want for an answer.
You can't say "Hi guru, it is me again. Can you reword your answer?" They will look at you and go "Who are you and what answer are you talking about?"
It is very Q&A oriented. Microsoft Research built ChatGPT to be a conversational web app. It remembers the questions you ask. It knows what you've talked about previously. Thus, the concept of a "chat". You speak, I listen. Then I reply, you absorb and ask another question. That sort of thing.
领英推荐
You can ask the guru a question, get an answer and then say only "Can you rephrase that?" and it knows what you are talking about. If you go down the mountain and come back 2 days later and ask the guru to rephrase the answer, they will know who you are, what you asked, what you talked about, etc.
Until March 1, 2023, apps we created were using the one-off method. Every question stands alone.
Now, using the ChatGPT API we can basically have ChatGPT running inside our apps. Instead of needing to have ALL the context in every prompt, you can just add the new part like you would talking to the guru. (Geeks reading this - I know this is a broad brush explanation that isn't 100% accurate. I do it for ease of understanding by non-geeks. Bear with me.)
This is huge for devs who can now program real conversational apps using AI, without having to reword the whole question somehow using the old single-use Q&A methods.
Now we can do real AI chat. And generate lead magnets, and then tell it to regenerate just page 7 (and it will know what you are talking about). Everything will be in context as the conversation goes on, so the answers will be smarter, better, faster and cheaper. As compared to GPT-3.5, ChatGPT API calls are 1/10 of the cost!
Whisper
So what is Whisper? Whisper is an AI-based speech-to-text engine made by OpenAI. Check this out - you can upload a 25MB audio file, and it will do a transcription or a translation of the audio for you. Sure, speech-to-text has been around a while, but you've all seen the fun our phones and softwares have had understanding us. It is far from perfect.
But here is where OpenAI does its magic of scouring tons of real data to use in its AI models to be more accurate than anything that has gone before. Here is what they say, far better than what I ever could:
680 THOUSAND hours of data collected from the web. With all our ums, ahhs, y'alls, ay yuhs...and that is just English. This is a cool API that will let us upload files and get a transcription on the fly for it.
Ok...now for some meat for you to chew on. This newsletter IS about innovation, right? These APIs are tools, that's it. What YOU build is the innovation.
1) You have an app that takes in your voice, asking for something. Sort of like Alexa or hey Google or whatever. But instead of it being some big deal like those, it is YOUR app. Yours. You wrote it. It takes in the speech, calls to the Whisper API and gets an immediate transcription of it. That transcription is put inside a ChatGPT prompt that you then use in a call to the ChatGPT API. The result you get back is then shown on the screen. The user then uses their voice and says "Now make it, you know, more like...um...more like Snoop Dogg would say it." And that goes over to Whisper, then to ChatGPT who returns back the text just like Snoop would say it, fo shizzle!
2) You are in a foreign country and do not understand what someone is saying. You hold up your app, get their voice in their own language, send it to Whisper which then gives you back a transcription in English.
3) There are apps that take in several pictures of you from different angles and with different facial expressions to generate a 3D realistic avatar of you. Now put on a VR headset, and speak some question. It goes to Whisper, then to ChatGPT and then the result is put into speech synthesis software and is spoken back to you by your avatar of you. With near perfect deep fake movements. ChatGPT becomes the answer bot. And then you say "Can you rephrase that, you handsome devil?" and it will.
These are the beginnings of what these APIs can do for us. Now I know, you have concerns and big scary monsters under the bed. Welcome to change and innovation of historic proportions! Don't let fear stop the GOOD this can do in the world.
Tackle the scary things as they come up, not sooner, and know that people much smarter than I will be working on those things.
For now, dream. This is the whisper...the beginnings. Fo shizzle.
???? Online Marketing | ?? Digital Marketing. ?? Affiliate Marketing | I'm a Sales And Marketing Geek And I Can Help You Grow And Monetize Your Brand Using Social Media
2 年Loved your article and explanation... I love seeing you lead the conversation around ChatGpt and it's practical uses... Just one thing... About that first section... I think ChatGpt is funnier than you ??...
General Contractor & Fatherpreneur?? ??
2 年Awesome Greg Howe! Love the way you came out of the gate. Fooled me in beginning! I need to learn more about this API integration and how to prompt it correctly.