My Journey to Building a Chinese-English Translator Web App
In May, I embarked on a project to simplify the translation process for non-English speakers. Inspired by my mother's struggles with standard translation apps, I created a web app that transcribes and translates audio from English to Chinese. With the help of the GPT-4 model, Deepgram's API, and Google Cloud Translate's API, I built a fully functional web app that takes an audio recording straight from the browser (desktop or mobile), then transcribes and translates it.
Initial Setup and Requirements:
The following are tools and resources that I needed in order to get started:
The Role of ChatGPT:
ChatGPT was a great help in the development process. If you haven't heard of ChatGPT, it's a natural language processing tool that can answer your questions, analyze your code, and write code based on the information you give it. It can also explain material from the developer resources for Deepgram and Google Cloud. Its suggestions and recommendations are what make it such a powerful tool.
While ChatGPT is a fantastic resource, it has its limits. For one, you can't just dump code in there without context. You also need to be thorough in explaining what you're looking to build. There's a limit on input length as well. Let's say you have about 400 lines of code in your JavaScript file plus another 100 lines in your server-side JavaScript files; chances are, you'll run into the token limit. While there's no official documentation on the actual limit on OpenAI's site, you can find estimates on the web to get a good idea of what that limit is. You'll also run into a message cap. At the time, I was capped at 25 messages per 4 hours, so you definitely want to spend your GPT-4 messages wisely when asking it to analyze your code for mistakes or to help with a problem.
Overall, ChatGPT was a great help in the development process, and I would definitely recommend it to others who are looking for a tool to help them with their JavaScript development.
Building the Transcription Feature:
Trying to implement the transcription feature was a real headache. The process starts with Netlify sending recorded audio files to Deepgram. These files then get converted into an audio blob in string form. After that, Deepgram sends the transcription results back to the DOM. One thing I love about Deepgram is how versatile it is. It accepts a broad range of audio formats, and I've opted to use the WAV format for this project.
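To make that flow a bit more concrete, here's a minimal sketch of what the server-side piece could look like. It is not my exact code. It assumes the browser sends the WAV recording as a base64 string in a JSON body, that the key lives in an environment variable called DEEPGRAM_API_KEY, and that the function runs on Node 18+ where fetch is built in. The function names are made up for illustration.

```javascript
// Deepgram's endpoint for pre-recorded audio.
const DEEPGRAM_URL = "https://api.deepgram.com/v1/listen";

// Build the fetch options for Deepgram.
// Assumption: the audio arrives as a base64-encoded WAV string.
function buildDeepgramRequest(base64Audio, apiKey) {
  return {
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "audio/wav",
    },
    // Decode the string back into raw audio bytes.
    body: Buffer.from(base64Audio, "base64"),
  };
}

// Pull the first transcript out of Deepgram's JSON response ("" if missing).
function extractTranscript(deepgramJson) {
  return (
    deepgramJson?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? ""
  );
}

// Hypothetical serverless handler; the event shape ({ body: string })
// is an assumption about how the function gets called.
async function handler(event) {
  const { audio } = JSON.parse(event.body);
  const response = await fetch(
    DEEPGRAM_URL,
    buildDeepgramRequest(audio, process.env.DEEPGRAM_API_KEY)
  );
  if (!response.ok) {
    return {
      statusCode: response.status,
      body: JSON.stringify({ error: "Transcription failed" }),
    };
  }
  const transcript = extractTranscript(await response.json());
  return { statusCode: 200, body: JSON.stringify({ transcript }) };
}
```

Keeping the Deepgram call on the server side means the API key never has to ship to the browser. The handler would still need to be exported in whatever way your hosting platform expects.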
I initially chose to use the WebM format, but soon discovered a thorny issue: compatibility. Apple devices, particularly Safari and iOS, don't play well with the MediaRecorder API's WebM output. I thought it would work seamlessly across all platforms, but boy, was I wrong.
After what felt like endless debugging, countless hours poring over relevant documentation, and feeding ChatGPT additional information to better understand these compatibility issues, I eventually found a workaround. The RecordRTC API with WAV audio format proved to be the golden ticket for resolving the compatibility issue. But let me tell you, the whole ordeal drove me nuts.
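For anyone curious what the workaround looks like, here's a rough sketch of recording WAV with RecordRTC. It assumes the RecordRTC library is already loaded and gets passed in as a parameter, and the helper names are just for illustration. Treat it as a starting point rather than the exact code in my app.

```javascript
// Create a recorder that produces WAV audio.
// Assumption: RecordRTC is the library's constructor, loaded via a script
// tag or a bundler, and stream comes from getUserMedia.
function createWavRecorder(RecordRTC, stream) {
  return new RecordRTC(stream, {
    type: "audio",
    mimeType: "audio/wav",
    // StereoAudioRecorder captures PCM and encodes WAV in JavaScript,
    // so it doesn't rely on the browser's MediaRecorder formats.
    recorderType: RecordRTC.StereoAudioRecorder,
    numberOfAudioChannels: 1, // mono keeps the recording smaller
  });
}

// Stop the recorder and resolve with the finished WAV blob.
function stopAndGetBlob(recorder) {
  return new Promise((resolve) => {
    recorder.stopRecording(() => resolve(recorder.getBlob()));
  });
}

// Browser usage (hypothetical wiring):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = createWavRecorder(RecordRTC, stream);
//   recorder.startRecording();
//   ...later, when the user stops recording...
//   const wavBlob = await stopAndGetBlob(recorder);
```

Passing RecordRTC in as an argument is only there to keep the sketch easy to try outside a browser. In a real page you would probably reference the global directly.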
Building the Translation Feature:
Implementing the translation feature proved to be less of an uphill battle compared to transcription. It was fairly straightforward: basically, take the transcription results from Deepgram, which are in English text format, and send them off to the Google Cloud Translate API.
Then, in a matter of seconds, the results come back and show up on the DOM. A function automatically translates the text from English to Chinese. If the transcription is in Chinese, it switches the results to English. This saves someone like my mom time and effort from pressing buttons to change the text language.
While this wasn't as taxing as getting the transcription feature to work, it was still a challenge in its own right. But I was satisfied with how the JavaScript function was able to seamlessly switch between languages. It added that extra layer of finesse to the entire process.
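Here's one way the language switch could be sketched. I'm assuming a simple check for Chinese characters decides the direction, and that the request goes to the Google Cloud Translation v2 REST endpoint with an API key. The function names and the choice of "zh-CN" as the Chinese code are illustrative, and your own detection logic may differ.

```javascript
// Google Cloud Translation (Basic, v2) REST endpoint.
const TRANSLATE_URL =
  "https://translation.googleapis.com/language/translate/v2";

// True when the text contains at least one Han (Chinese) character.
function containsChinese(text) {
  return /\p{Script=Han}/u.test(text);
}

// Decide the direction: Chinese input goes to English, anything else
// is treated as English and goes to Chinese.
function pickLanguages(text) {
  return containsChinese(text)
    ? { source: "zh-CN", target: "en" }
    : { source: "en", target: "zh-CN" };
}

// Build the fetch options for a translate call.
function buildTranslateRequest(text) {
  const { source, target } = pickLanguages(text);
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ q: text, source, target, format: "text" }),
  };
}

// Send the request and return the translated string.
// Assumption: apiKey is a Google Cloud API key with the Translation API enabled.
async function translate(text, apiKey) {
  const url = `${TRANSLATE_URL}?key=${encodeURIComponent(apiKey)}`;
  const response = await fetch(url, buildTranslateRequest(text));
  const json = await response.json();
  return json.data.translations[0].translatedText;
}
```

With this approach the user never has to pick a direction: whatever language the transcription comes back in, the request is built for the opposite one.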
Final Product and User Experience:
The final user interface of my web app is pretty straightforward. You'll find a 'record' button that morphs into a 'translate' button, an 'edit' button that turns into an 'update' button, and a 'copy' button to snag the transcribed or translated text. The final button is the 'settings' button, which allows you to toggle the UI language between English and Chinese, with English as the default setting.
My mom, the main user for whom this app was built, isn't big on intricate navigation. So, I needed to keep it simple and accessible.
At the top of the page, there's a switch. Even though the transcribed text gets rendered, the translated text is displayed by default. The handy 'edit' button lets someone tweak the transcribed texts—perfect for those times when background noises mess up the transcription or some words get lost in translation. A quick edit, and the translated text gets updated accordingly.
Learnings and Conclusions:
My project is still very much a work in progress. The current limitation on the recording duration is still a challenge, courtesy of restrictions on buffer size and the number of allowable API calls at any given time. If I had known about the potential of utilizing websockets for sending audio earlier, I might have taken that route, despite the exposure of the API key. If there's anything I've gleaned from this project, it's that every path has its trade-offs; the choice ultimately depends on the needs of the product.
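For reference, here's a rough sketch of what the websocket route might look like with Deepgram's streaming endpoint. I haven't built this into the app. It assumes the browser connects directly and authenticates by passing the key as a websocket subprotocol, which is exactly the key-exposure trade-off mentioned above. The function name and the injected WebSocketImpl parameter are made up for illustration.

```javascript
// Open a streaming connection to Deepgram and report transcripts as they arrive.
// Assumptions: WebSocketImpl is the browser's WebSocket class (injected so the
// sketch can be tried elsewhere), and apiKey is visible to the client.
function openDeepgramSocket(WebSocketImpl, apiKey, onTranscript) {
  const socket = new WebSocketImpl("wss://api.deepgram.com/v1/listen", [
    "token",
    apiKey,
  ]);

  socket.onmessage = (message) => {
    const data = JSON.parse(message.data);
    const transcript = data.channel?.alternatives?.[0]?.transcript;
    // Skip empty results and non-transcript messages.
    if (transcript) {
      onTranscript(transcript, Boolean(data.is_final));
    }
  };

  return socket;
}

// Browser usage (hypothetical wiring):
//   const socket = openDeepgramSocket(WebSocket, apiKey, (text, isFinal) => {
//     if (isFinal) console.log(text);
//   });
//   // then send each recorded audio chunk as it becomes available:
//   //   socket.send(audioChunk);
```

Because audio goes out in small chunks instead of one big upload, this route would likely sidestep the buffer size limit on recording duration, at the cost of having the key in client-side code.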
As for my experience with ChatGPT, it has helped me tremendously. However, I remain uncertain about how practical it is for someone building an entire product, given ChatGPT's limitations. It does provide immense support, but it doesn't remove the need to learn, understand, and build independently, as it won't always be able to lend a helping hand.
You can see the working project here: https://teal-kitsune-e89718.netlify.app/