How I used Speech Recognition to overcome challenges due to my deafness
Guilhaume LEROY-MELINE
IBM Distinguished Engineer, Transforming Businesses with AI, Quantum and Data, IBM Consulting France
During this European Week for the Employment of People with Disabilities (#seeph2022), I wanted to share my personal experience as a profoundly, bilaterally Deaf person with Artificial Intelligence, more specifically with Speech Recognition. I strongly believe that technology is a tremendous enabler of better accessibility, even more so when combined with creative experiences.
I will cover 35 years of innovation in speech recognition for the deaf, and share some ideas on a future enabled by Quantum and Augmented Reality.
I have been fortunate since my early years to be surrounded by forward-thinking people: my parents and family, who always gave me access to IT (I started at 8 years old on an i486-based PC with a Sound Blaster card...), my speech therapists, who were very open to non-traditional approaches, my school teachers, who accepted that I brought exotic tools into class, and IBM, which supported solutions to help me in my job as a Consultant.
In the 1980s, even if I was very young and my memories are not so clear, I remember my speech therapist making sound tangible through vibrations in sand on a table, and later with an oscilloscope. It helped me understand how I could modulate my voice (pitch, vowels, consonants). She used piano, drums and guitar, and I had to reproduce a similar waveform with my voice.
In the 1990s I changed speech therapists after relocating to another region of France. I discovered SpeechViewer III, playing games with the computer on a reward-based system when I was able to pronounce phonemes correctly, at the right pitch (I noticed it was an IBM product 20 years later...). My sessions with the speech therapist were so much fun that I could not wait for the next one.
Three years later, in 1994, my geeky speech therapist used IBM VoiceType to keep using speech recognition as a personal trainer, based on the idea that if the machine could not understand me, I was not pronouncing correctly. Each week I had a list of single words to work on, and I came back to the next session with my results. As a child, I felt honored to have a computer in my bedroom to practice on!
Then we changed speech recognition generations (hello, Hidden Markov Models!), moving to IBM ViaVoice and then Dragon NaturallySpeaking from Nuance, where I no longer had to pause after every word. At that point I also used such tools to start learning to speak English (I am a native French speaker), because I could not find an English-speaking speech therapist where I lived.
When you are congenitally deaf, if you don't practice, you slowly lose your speech accuracy: hearing your own voice is difficult, and when you do hear it, it's quite distorted.
I then joined IBM as a Consultant in 2006. The most difficult part for me was not being client-facing, as I rely heavily on lip reading, but actively participating in conference calls. IBM financed real-time captioning for me with Velotype, because at that time speech recognition was not accurate enough with noisy backgrounds, telephony audio and multiple speakers. Until 2014, I did not see a strong enough improvement in speech recognition to enable new use cases, especially in French.
In 2014 a new generation of speech recognition based on deep learning started to change the field, breaking the 10% word error rate (WER) barrier we had been facing and leading to the current generation of models that excel in many situations.
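To make that 10% figure concrete: WER is the word-level edit distance (substitutions, insertions, deletions) between the recognized text and a reference transcript, divided by the reference length. Here is a minimal sketch in Python; the function and the sample sentences are illustrative, not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a classic Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimal edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One error ("weather" -> "whether") in a five-word reference: WER = 20%.
print(word_error_rate("what is the weather today",
                      "what is the whether today"))  # 0.2
```

At 10% WER, one word in ten is wrong, which is roughly the threshold where a transcript stops being exhausting to reconstruct and starts being readable.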
This new generation led to the amazing application RogerVoice, which I used for my phone calls; it transcribed them for me live, in French and in English. Of course there were still a lot of unrecognized words, but I was able to follow discussions and react live. This application was a real disruption in my work, adding a new dimension of accessibility in a global world.
Today, it's a pleasure to see automatic speech recognition (sometimes enhanced by human post-processing) implemented everywhere, with good accuracy, in multiple languages: IBM makes it mandatory for all internal videos; Cisco Webex, Slack and MS Teams caption live, as do YouTube and streaming platforms. How accessible digital is for me now!
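For developers curious about what sits behind such captions, here is roughly what transcribing a recording looks like with the ibm-watson Python SDK. This is a sketch, not a production recipe: the API key, the region-specific service URL and the `meeting.wav` file name are all placeholders you would replace with your own.

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and URL: use your own IBM Cloud values.
authenticator = IAMAuthenticator("YOUR_API_KEY")
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("https://api.eu-de.speech-to-text.watson.cloud.ibm.com")

# Transcribe a French recording (hypothetical file name).
with open("meeting.wav", "rb") as audio:
    result = stt.recognize(
        audio=audio,
        content_type="audio/wav",
        model="fr-FR_BroadbandModel",
    ).get_result()

# Print one caption line per recognized chunk.
for chunk in result["results"]:
    print(chunk["alternatives"][0]["transcript"])
```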
What's next?
Quantum Computing is a domain I am currently working in: delivering Quantum Exploration phases to clients, conducting experiments on Natural Language Processing, and teaching the basics of Quantum Machine Learning in French schools. I am pretty sure that Quantum will also impact Speech Recognition, perhaps breaking new barriers in noisy environments, accented speech, multilingual recognition... The scientific community and IBM Research are working on it, proposing new architectures that leverage Quantum. I can't wait to see the industrialized results of such research in the coming years.
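To give a flavor of the kind of building block those Quantum Machine Learning basics cover, here is a minimal Qiskit sketch of one QML primitive: encoding a classical feature vector into a quantum state with a feature map. The two "acoustic features" are made up for illustration, and this is in no way the research mentioned above; it assumes a recent Qiskit installation.

```python
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

# Two illustrative classical features (hypothetical acoustic measurements).
features = [0.8, 0.3]

# Encode them into a 2-qubit circuit via a standard ZZ feature map.
feature_map = ZZFeatureMap(feature_dimension=2, reps=1)
circuit = feature_map.assign_parameters(features)

# The resulting state lives in a 2^2-dimensional Hilbert space; quantum
# kernel methods compare such states to classify inputs.
state = Statevector.from_instruction(circuit)
print(state)
```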
I also believe the speech recognition user experience for the Deaf will be enhanced by the democratization of Augmented Reality. Some experiments were made with Google Glass, then with HoloLens devices. The journey to an optimal user experience will involve speaker localization, advanced diarization, multi-dimensional sound capture, complex noise reduction, and finally augmented reality device miniaturization and optimization. But there is a tremendous future here, where I will finally be able to follow group discussions in restaurants, at events... This is the last situation where I still feel fully Deaf.
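For the technically curious, the simplest of those building blocks to sketch is noise reduction. Below is a toy spectral-subtraction example in Python/NumPy; it assumes the opening frames of the recording contain only background noise, an assumption real denoisers cannot make, and all names and parameters are illustrative.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=512, noise_frames=10):
    """Toy denoiser: estimate the noise magnitude spectrum from the first
    `noise_frames` frames, subtract it from every frame, resynthesize."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    # Assume the opening frames are speech-free background noise.
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract the noise magnitude, clamp at zero, keep the original phase.
    clean_mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    clean = clean_mag * np.exp(1j * np.angle(spectra))
    return np.fft.irfft(clean, n=frame_len, axis=1).ravel()

# Example: a 440 Hz "voice" tone buried in noise, preceded by noise only.
rng = np.random.default_rng(0)
sr = 16000
noisy = 0.5 * rng.standard_normal(2 * sr)              # 2 s of noise
noisy[sr:] += np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # tone in 2nd second
denoised = spectral_subtraction(noisy)
```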
Guilhaume, thanks for sharing!
A true inspiration to us all.
Hello Guilhaume LEROY-MELINE, as committed as ever to the SEEPH. When will you sit on the Handitech Trophy jury again? We miss you. Monday at bpifrance was the Trophy award ceremony, and it was a vintage year for innovation. Take a look!!! See you soon, I hope.
Thank you for sharing your personal story!
Hello and thank you, Guillaume, for sharing this. The speech therapy experience with sand and musical instruments is remarkable; I will mention it to speech therapists. I fully agree about new technologies serving accessibility, in particular AI, which offers deaf and hard-of-hearing people a great opportunity to access exchanges in real time. As a user of transcription solutions depending on the situation (background noise, mask wearing), I find there is still quite a lot of improvement needed to guarantee stable transcriptions, and above all the ability to caption without an internet connection or with low bandwidth, wherever you are. The ideal would be a perfect transcription without calling on a human corrector, which only maintains dependence to some extent. I am convinced we will see this revolution quite soon, and I also believe augmented reality will transform the user experience both professionally and privately; I am thinking in particular of full access to culture, and not only that.