Video Call with ChatGPT

Video Call with ChatGPT

OpenAI's ChatGPT is recognized for its advanced text-based interactions among the expanding language learning models (LLMs) family. However, numerous UI clients for LLMs, including ChatGPT, are currently limited to text-based interactions.

We strive to enhance this experience by incorporating an avatar to humanize LLMs and enabling voice interactions. Our goal is to infuse the LLM with more personality and elevate conversations to a more enjoyable level, even though it already possesses an impressive personality!

What’s New?

  • Voice Interactions: Now, you can speak to ChatGPT, and it will respond in a human-like voice.
  • Avatar Integration: The avatar adds personality to your interactions, making conversations more natural and engaging.

How It Works

  • To make this possible, we integrated several technologies:
  • Speech-to-Text: Converts your spoken words into text for LLM to process.
  • Text-to-Speech: Transforms LLM’s text responses into audio to respond to you.
  • 3D Animation: Synchronizes the avatar’s facial movements with the speech (audio output) for a realistic effect.

This is the outcome!

Diagram

The diagram below shows the structure of the application:

Tech Stack

In this Proof of Concept (PoC), we used a combination of SaSS models, APIs, and open-source tools:

  • ChatGPT: As the core Language Model.
  • Azure Speech-to-Text: For converting user audio into text.
  • Azure Text-to-Speech: For transforming text responses into audio.
  • ThreeJS: For animating the 3D model.

Looking Ahead

We're excited about future developments. We can enhance its capabilities with tools like LangChain, AutoGPT, or BabyAGI.

  • Enhanced English Teaching: Adding speech analysis for pronunciation feedback.
  • Broader Knowledge Integration: Incorporating Internet and YouTube search capabilities for tailored lesson recommendations.

Explore the full blog article for an in-depth look at our journey, challenges, solutions, and future plans.

Live Demo

Experience it firsthand! Try our live demo here and enjoy interacting with the next generation of conversational AI.

Stay tuned for more updates!

要查看或添加评论,请登录

CodeLink的更多文章

社区洞察

其他会员也浏览了