Can AI Help with Your Accent?

After a presentation, someone from the audience asked the speaker, “Can AI help with your accent?” The speaker answered with an apology, “There might be a day when with a push of a button, AI could help change my accent into that of a native English speaker. Today, not yet.”

For ESL (English as a second language) speakers, some sounds often feel more unnatural than others. A coworker once shared that for his highly educated father, v and w required an effort. Often, particularly when rushed, he’d say “I am doing wery vell.” For me, it was “whirled” vs. “world.” Slowing down definitely helps.

Accent is one of the factors in communication that can affect how content is understood, and at times it can magnify or distort the original intent. Listen to a German, Greek, Russian, French, Portuguese, or Indian speaker (or take your own sample) say “Shut the high front door!” I understand there are many dialects and accents within each, but you get the point. Accents do not differ only across languages; they differ within them too. A Parisian told me he often sweats, nods and smiles when he converses in French with a Quebecois. Surely the evolution of lexicon also has an effect. Many say certain British and Australian accents are very hard to follow, even for some native speakers of American English. If you are interested in when and how Americans lost their British accents, see the referenced articles[1].

I noticed one young man speaking so gently and patiently to everyone at a volunteer event for immigrants that I asked him what prompted him to do so. He said, “Growing up, we had a lot of immigrants move into our neighborhood. My grandfather told me, ‘Respect anyone who has an accent. That person speaks one more language than you.’”

Someone who did respect me once told me, “If you just improved your accent, you’d be a much better speaker.” AI and ML were not part of our everyday lives back then. I didn’t want to treat my accent like a sickness that needed therapy, but I did agree I could be better, so I decided to try an expert.

Out of curiosity, and also to see whether this was an effort I’d like to undertake, I contacted a dialect coach, aka a movie accent expert. Yes, these kinds of jobs do exist. When your favorite Canadian actor speaks Scottish English, a South African actress speaks Southern, or a Texan sounds like a New Yorker, more often than not these experts are behind the scenes helping the actors train.

Whether this busy, highly sought-after expert would respond to my inquiry was a test in itself, but to my surprise, he actually did reply. He asked me why I wanted to work on my accent, then after hearing my rationale he said, “I work with professionals who must and want to sound authentic to deliver the best performance for the short time being, word by word. Yes, you can work on making pronunciation clearer, pay more attention to and practice how you physically use your tongue, vocal cords, mouth, jaw, etc. to make a sound, then improve by seeing and hearing yourself. However, remember – your accent is also who you are, and I can understand you fine.” He did give me some tips and homework and was open to continuing if I wished. That was my first and last lesson. I walked away so much richer than before I met him.

Think about your favorite Avenger speaking in multiple languages. Dubbing is already a growing area for AI. More on that some other time. Needless to say, your favorite Avenger sounds different across various languages. Try it today, if you have not already. Now imagine the hero speaking in various dialects.

For the phonetically challenged, AI can provide a lot of help. It can speak for them when they type or when they express themselves through winks and pupil movements. For others, these technologies can turn articles into audio. Whatever your aim or project may be, there are already many ongoing efforts that apply AI to speech (natural language processing, automatic speech recognition, text-to-speech, AI voice for speech synthesis, etc.), resulting in dangerous deepfakes but also legitimate solutions. Here is a sample list of players: uberduck.ai, resemble.ai, typecast.ai, lovo.ai, play.ht, which boasts hundreds of voices and accents, and speechki, an audiobook recording platform that uses synthetic voices. Tech titans also offer solutions, and countless more are coming to market. All offer a combination of services through AI. AI could also analyze your speech for signs of Alzheimer’s disease from a short voice sample[2]. See Canary Speech’s collaboration with Syntiant, and Winterlight Labs, as examples. Beyond memory, speech and the retina are also among the things that Alzheimer’s disease begins to affect.

On accent, there is already research proposing an interesting joint voice and accent conversion approach, which can convert an arbitrary source speaker’s voice to that of a target speaker with a non-native accent. For starters, see the papers “Accent and Speaker Disentanglement in Many-to-many Voice Conversion”[3] and “Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning”[4], and similar papers.

In the not-so-distant future, during a live conversation or presentation, perhaps a message like this might appear: “This presentation is delayed by 3 seconds and is being brought to you by your model AI 2022. The setting selected is #0909: baseline Ryan Reynolds, speed 1.25X, amplitude max 1.3, option Brooklyn NY with a 5% flavor from Seoul, Korea. We recommend this setting #0909, since it has the highest favorable experience, 91%, with this type of audience mix. To see or change the audience mix and your audio setting, click button X in the bottom right corner. Selections are the sole responsibility of XYZ Enterprises and the presenter or listener.” Hyun Bin, Morgan Freeman, Pedro Pascal, Meryl Streep, Arnold Schwarzenegger and Sigourney Weaver might have even more requests to curate and approve than they already do, deciding whether or not to be the voiceover or baseline voice for documentaries, business presentations, party DJs, Zoom dates and beyond.

If and when that day comes – after safety, reliability, humanity have been vetted – I am thinking of having a custom accent setting to say things properly. “In the human-AI whirled, I plan to do wery vell.”

What other uses would you like to see or have seen with speech AI? Whose voice would you want as baseline or flavor setting? Post your comment here @LinkedIn or send me your thoughts to [email protected]

[1] https://www.bbc.com/culture/article/20180207-how-americans-preserved-british-english

https://www.smithsonianmag.com/smithsonian-institution/when-did-Americans-Lose-British-accents-ask-smithsonian-180955291/

[2] https://spectrum.ieee.org/amp/ai-to-detect-alzheimers-2656468203

[3] https://arxiv.org/pdf/2011.08609.pdf

[4] https://assets.amazon.science/b7/2e/fe56ca414c4c99f1e63cfa4da7d0/best-of-both-worlds-robust-accented-speech-recognition-with-adversarial-transfer-learning.pdf


Shelly Swanback

CEO | Board Member | Digital Leader

2 years ago

Great read Jinsook

Jinsook, thanks for sharing. This was a very interesting read. For one, I love comment about patience with accents for people that speak at least one more language that you. As someone with only basic to intermediate fluency in other languages besides English, it's great advice. I have to admit I hadn't previously considered application of AI for speech and the basic idea of getting us to understand each other better, but the thought of using AI to understand pitch, tone, inflection and other components of what makes up human speech is incredible. I look forward to seeing what else emerges in this space in the months/years to come.

Jean-Luc Chatelain

Founder & Managing Partner @ Verax Capital Advisors | AI Transformation

2 years ago

Excellent article my friend, I hope all is well with you and your family (BTW I have no accent when I am typing) :-)
