How I used Speech Recognition to overcome challenges due to my deafness
Guilhaume LEROY-MELINE
IBM Distinguished Engineer, Transforming Businesses with AI, Quantum and Data, IBM Consulting France
During this European Week for the Employment of People with Disabilities (#seeph2022), I wanted to share my personal experience as a profoundly, bilaterally Deaf person with Artificial Intelligence, more specifically with Speech Recognition. I strongly believe that technology is a tremendous enabler of better accessibility, even more so when combined with creative experiences.
I will cover 35 years of innovation in speech recognition for the deaf, and share some ideas on a future enabled by Quantum and Augmented Reality.
I have been fortunate since my early years to be surrounded by forward-thinking people: my parents and family, who always gave me access to IT (I started at 8 years old on an i486-based PC with a Sound Blaster card...), my speech therapists, who were very open to non-traditional approaches, my school teachers, who accepted that I brought exotic tools into class, and IBM, which supported solutions to help me in my job as a Consultant.
In the 1980s, even if I was very young and my memories are not so clear, I remember my speech therapist making sound tangible through vibrations in sand on a table, and later with an oscilloscope. It helped me understand how I could modulate my voice (pitch, vowels, consonants). She used piano, drums and guitar, and I had to reproduce a similar waveform with my voice.
In the 1990s I changed speech therapists after relocating to another region of France. I discovered SpeechViewer III, playing games with the computer on a reward-based system when I was able to pronounce phonemes correctly, at the right pitch (I noticed it was an IBM product 20 years later...). My sessions with the speech therapist were so much fun that I could not wait for the next one.
Three years later, in 1994, my geeky speech therapist used IBM VoiceType to keep using speech recognition as a personal trainer, based on the idea that if the machine could not understand me, I was not pronouncing correctly. Each week I had a list of single words to work on, and I came back to the next session with my results. As a child, I felt honored to have a computer in my bedroom to practice on!
Then we changed speech recognition generations (hello, Hidden Markov Models!), moving to IBM ViaVoice and then Dragon NaturallySpeaking from Nuance, where I no longer had to pause after every word. At that point I also used such tools to start learning to speak English (I am a native French speaker), because I could not find an English-speaking speech therapist where I lived.
When you are congenitally deaf, if you don't practice, you slowly lose your speech accuracy: hearing your own voice is difficult, and when you do hear it, it's quite distorted.
I then joined IBM as a Consultant in 2006. The most difficult part for me was not being client-facing, as I rely heavily on lip reading, but actively participating in conference calls. IBM financed real-time captioning for me with Velotype, because at that time speech recognition was not accurate enough with noisy backgrounds, telephony audio and multiple speakers. Until 2014, I did not see a strong enough improvement in speech recognition to enable new use cases, especially in French.
In 2014 a new generation of speech recognition based on deep learning started to change the field, breaking the 10% word error rate (WER) barrier we had been facing and leading to the current generation of models that excel in many situations.
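To make that 10% figure concrete: WER is the word-level edit distance (substitutions, insertions, deletions) between the recognized text and a reference transcript, divided by the reference length. Here is a minimal sketch in Python; the function and the sample sentences are illustrative, not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a classic Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimal edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One error ("weather" -> "whether") in a five-word reference: WER = 20%.
print(word_error_rate("what is the weather today",
                      "what is the whether today"))  # 0.2
```

At 10% WER, one word in ten is wrong, which is roughly the threshold where a transcript stops being exhausting to reconstruct and starts being readable.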
This new generation led to the amazing application RogerVoice, which I used for my phone calls; it transcribed them for me live, in French and in English. Of course there were still a lot of unrecognized words, but I was able to follow discussions and react live. This application was a real disruption in my work, adding a new dimension of accessibility in a global world.
Today, it's a pleasure to see automatic speech recognition (sometimes enhanced by human post-processing) implemented everywhere, with good accuracy, in multiple languages: IBM makes it mandatory for all internal videos; Cisco Webex, Slack and MS Teams caption live, as do YouTube and streaming platforms. How accessible digital is for me now!
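For developers curious about what sits behind such captions, here is roughly what transcribing a recording looks like with the ibm-watson Python SDK. This is a sketch, not a production recipe: the API key, the region-specific service URL and the `meeting.wav` file name are all placeholders you would replace with your own.

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and URL: use your own IBM Cloud values.
authenticator = IAMAuthenticator("YOUR_API_KEY")
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("https://api.eu-de.speech-to-text.watson.cloud.ibm.com")

# Transcribe a French recording (hypothetical file name).
with open("meeting.wav", "rb") as audio:
    result = stt.recognize(
        audio=audio,
        content_type="audio/wav",
        model="fr-FR_BroadbandModel",
    ).get_result()

# Print one caption line per recognized chunk.
for chunk in result["results"]:
    print(chunk["alternatives"][0]["transcript"])
```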
What's next?
Quantum Computing is a domain I am currently working in: delivering Quantum Exploration phases to clients, conducting experiments on Natural Language Processing, and teaching the basics of Quantum Machine Learning in French schools. I am pretty sure that Quantum will also impact Speech Recognition, perhaps breaking new barriers in noisy environments, accented speech, multilingual recognition... The scientific community and IBM Research are working on it, proposing new architectures that leverage Quantum. I can't wait to see the industrialized results of such research in the coming years.
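To give a flavor of the kind of building block those Quantum Machine Learning basics cover, here is a minimal Qiskit sketch of one QML primitive: encoding a classical feature vector into a quantum state with a feature map. The two "acoustic features" are made up for illustration, and this is in no way the research mentioned above; it assumes a recent Qiskit installation.

```python
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

# Two illustrative classical features (hypothetical acoustic measurements).
features = [0.8, 0.3]

# Encode them into a 2-qubit circuit via a standard ZZ feature map.
feature_map = ZZFeatureMap(feature_dimension=2, reps=1)
circuit = feature_map.assign_parameters(features)

# The resulting state lives in a 2^2-dimensional Hilbert space; quantum
# kernel methods compare such states to classify inputs.
state = Statevector.from_instruction(circuit)
print(state)
```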
I also believe the speech recognition user experience for the Deaf will be enhanced by the democratization of Augmented Reality. Some experiments were made with Google Glass, then with HoloLens devices. The journey to an optimal user experience will involve speaker localization, advanced diarization, multi-dimensional sound capture, complex noise reduction, and finally augmented reality device miniaturization and optimization. But there is a tremendous future here, where I will finally be able to follow group discussions in restaurants, at events... This is the last situation where I still feel fully Deaf.
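For the technically curious, the simplest of those building blocks to sketch is noise reduction. Below is a toy spectral-subtraction example in Python/NumPy; it assumes the opening frames of the recording contain only background noise, an assumption real denoisers cannot make, and all names and parameters are illustrative.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=512, noise_frames=10):
    """Toy denoiser: estimate the noise magnitude spectrum from the first
    `noise_frames` frames, subtract it from every frame, resynthesize."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    # Assume the opening frames are speech-free background noise.
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract the noise magnitude, clamp at zero, keep the original phase.
    clean_mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    clean = clean_mag * np.exp(1j * np.angle(spectra))
    return np.fft.irfft(clean, n=frame_len, axis=1).ravel()

# Example: a 440 Hz "voice" tone buried in noise, preceded by noise only.
rng = np.random.default_rng(0)
sr = 16000
noisy = 0.5 * rng.standard_normal(2 * sr)              # 2 s of noise
noisy[sr:] += np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # tone in 2nd second
denoised = spectral_subtraction(noisy)
```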
Guilhaume, thanks for sharing!
A true inspiration to us all.
Hello Guilhaume LEROY-MELINE, as committed as ever to the SEEPH. When will you sit on the Handitech Trophy jury again? We miss you. Monday at bpifrance was the Trophy award ceremony, and it was a vintage year for innovation. Take a look!!! See you soon, I hope.
Thank you for sharing your personal story!
Hello and thank you, Guillaume, for sharing this. The speech therapy experience with sand and musical instruments is remarkable; I will mention it to speech therapists. I fully agree about new technologies serving accessibility, in particular AI, which offers deaf and hard-of-hearing people a great opportunity to access exchanges in real time. As a user of transcription solutions depending on the situation (background noise, mask wearing), I find there is still quite a lot of improvement needed to guarantee stable transcriptions, and above all the ability to caption without an internet connection or with low bandwidth, wherever you are. The ideal would be a perfect transcription without calling on a human corrector, which only maintains dependence to some extent. I am convinced we will see this revolution quite soon, and I also believe augmented reality will transform the user experience both professionally and privately; I am thinking in particular of full access to culture, and not only that.