The Future of Human Assistant Support Experience: My Experience with AI Lip-Reading Technology


What started off as nothing more than a thought while watching a basketball game, “Oh, I wonder what they are saying?”, has ended up here: stuck in an ethical dilemma to which I have no answer. So I write and ponder.

How would you feel if someone, anyone, could read your lips without your knowledge or consent? That sounds like outlandish science fiction, but pause for even a moment and you can see it entering the realm of reality. I’m not going to sit here and say Generative AI is the reason all of this is happening. It does play a part, but no.

So, with that, I started down the path the way a lazy programmer always does: by checking whether there were any blogs, tutorials, libraries or GitHub repositories to start from. I should probably clarify at this stage that I’m not an AI researcher and I’m not formally trained; I’m just a data nerd with a curious mind. This is also not a post that walks you through how to build said model; more on that later. In hindsight, jumping into this problem as if it were nothing more than a technical challenge was foolish. It’s a lot more than that, touching on AI assistant agents, privacy, security, and ethical boundaries. My research led me to learn from, and be inspired by, advancements like the University of Oxford’s LipNet, which showcased remarkable accuracy in lip-reading, surpassing human proficiency, and that’s what blew my mind. Surpassing human proficiency.

Excited as I was, every improvement in the model’s accuracy meant treading further into a zone rife with privacy implications. A more accurate model brought with it the risk of encroaching upon personal spaces and capturing conversations without consent.

This is where we begin: with a dilemma, one that grapples with the balance between technological advancement and human values.

A few meat-and-potatoes

Okay, the technology is pretty damn cool. The proficiency of the AI lip-reading model, which I will call ALRM, inspired by breakthroughs like the University of Oxford’s LipNet¹ and Google’s Lip Reading Sentences in the Wild, presents significant benefits and challenges. While ALRM was quite accurate in one-on-one conversations, its accuracy started to waver when it had to take other conversations into account. This led me to first put the lip-reading aspect aside and work on the vision side. Leveraging Cognitive Vision, I started with tracking faces, and then slowly trained it towards more features such as eyes, ears, mouth, etc. After isolating the conversations and running each one through the model independently, the accuracy was back to baseline.

With that, the risk of privacy invasion became increasingly apparent. The decision to scale this technology would mean introducing a tool capable of real-time lip-reading across any camera. While the benefits for enhancing communication were clear, the implications for individual privacy were, well, scary. The potential misuse of such a tool by government agencies or other entities for surveillance or nefarious purposes isn’t too hard to imagine.

The way I tested this model was to record conversations/interviews and remove the voice completely. The model was initially trained with voice-to-text, then trained with just lips-to-guess-text, and then the two were matched up and the training was run over and over. This gave it a very thorough understanding of lip movements and the corresponding speech.
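The matching step, comparing the lips-to-text guess against the voice-to-text transcript of the same clip, needs some kind of score. The post does not say which metric was used. A common choice for transcription tasks is word error rate (WER), and the block below is a minimal sketch of it under that assumption. The function name `word_error_rate` and the sample sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference:  transcript from the audio track (assumed to be ground truth)
    hypothesis: text guessed from lip movements alone
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from guess
            insertion = curr[j - 1] + 1  # extra word in the guess
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    audio_transcript = "pass me the ball"   # hypothetical voice-to-text output
    lip_guess = "pass me a ball"            # hypothetical lips-to-text output
    print(word_error_rate(audio_transcript, lip_guess))
```

A score of 0 means the lip-read guess matches the audio transcript word for word, and higher values mean more words were substituted, dropped, or invented. Tracking a score like this per isolated speaker is one way to check whether accuracy really returns to baseline once conversations are separated.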

Counterarguments and Alternative Perspectives:

Time to argue with myself, using some examples that I thought we could all relate to.

Benefits for Various Sectors: Beyond its primary purpose – trying to know the smack talk between athletes – AI lip-reading can contribute significantly to security, education, and entertainment. For example, it could help people who are deaf or hard of hearing to access subtitles on TV and to communicate in noisy surroundings³. It could also assist speech recognition in challenging environments, such as low-quality audio or heavy accents. It could even enhance the experience of watching silent movies or documentaries by adding realistic dialogue.

But the same capability cuts the other way. For instance, it could violate the privacy and consent of people who are unaware of, or unwilling to have, their lips read by a machine⁶. It could also be used for malicious purposes, such as blackmail, impersonation, or stealing sensitive information. It could even create social and ethical dilemmas, such as whether it is acceptable to lip-read someone without their permission, or whether it is fair to use lip-reading as evidence in court.

These benefits could potentially outweigh the risks, provided there are robust safeguards and regulations in place to prevent abuse and ensure accountability. The question becomes: who regulates, and who decides what those safeguards are? Governments? Microsoft, Amazon or others? It’s really not clear, but one thing is: we are a lot closer to the universal translators from Star Trek.

Mitigating Risks:

Addressing concerns involves not just recognizing potential misuse but actively finding solutions, like establishing transparent guidelines and obtaining explicit consent. For example, AI lip-reading could be used only with the permission of the speaker, or with clear indicators that the tool is in use. Moreover, the data collected by the tool could be encrypted and anonymized to protect the privacy of the users.
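As a rough illustration of the consent and anonymization ideas above, the sketch below keeps transcripts only for speakers who have opted in and replaces their identifiers with keyed hashes. Everything here is an assumption for illustration: the `Utterance` type, the function names, and the idea that consent is tracked as a set of speaker IDs are all hypothetical, not a description of any tool mentioned in this post. Keyed hashing is strictly pseudonymization, not full anonymization, and encryption at rest is left out because it would need a third-party library.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker_id: str  # hypothetical identifier assigned by the face tracker
    text: str        # lip-read transcript for this speaker


def pseudonymize(speaker_id: str, secret_key: bytes) -> str:
    """Replace a speaker ID with a keyed hash (HMAC-SHA256, truncated)."""
    digest = hmac.new(secret_key, speaker_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def filter_and_pseudonymize(utterances, consenting_ids, secret_key):
    """Drop utterances from non-consenting speakers, mask the rest."""
    kept = []
    for utterance in utterances:
        if utterance.speaker_id not in consenting_ids:
            continue  # no explicit consent: the transcript is never stored
        masked_id = pseudonymize(utterance.speaker_id, secret_key)
        kept.append(Utterance(masked_id, utterance.text))
    return kept


if __name__ == "__main__":
    key = b"replace-with-a-real-secret"  # assumption: managed outside the code
    captured = [
        Utterance("player-23", "nice shot"),
        Utterance("fan-row-4", "did you see that"),
    ]
    stored = filter_and_pseudonymize(captured, {"player-23"}, key)
    for utterance in stored:
        print(utterance.speaker_id, utterance.text)
```

The design choice worth noting is that the consent check happens before anything is stored, so a transcript from someone who never agreed is discarded and never written anywhere, masked or not.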

One of the existing frameworks that could help to regulate this technology is the EU’s General Data Protection Regulation (GDPR), which sets strict rules for the processing of personal data, including biometric data, and grants data subjects various rights, such as the right to access, rectify, erase, and object to their data. Another framework that could provide some ethical principles for the development and use of AI lip-reading is the IEEE’s Ethically Aligned Design (EAD), which aims to ensure that AI systems are aligned with human values and well-being, and respect human autonomy, privacy, and dignity.

Conclusion:

As a data nerd with a curious mind, I enjoyed playing with this technology and its possibilities, but I also realized that it is not a joke or a game. It is a tool that can have profound impacts on people’s lives, and therefore it should be used with caution and responsibility. A clear position still eludes me. There are, however, a few things that are foundational and should not be encroached upon:

  • Individual privacy: just as you can’t record someone without their knowledge, the same applies here.
  • Shared responsibility is the path forward. It can’t just be the ones building the technology who create the guard-rails. We need everyone at the table.
  • Ethical boundaries exist; they are just murky at best.

Cautious optimism and constant self-reflection are how I’m going to approach the current world of AI. As it evolves and matures – hopefully – we will uncover new problems, but the foundation laid today is critical.

As Uncle Ben would say, “With great power comes great responsibility.” The problem is, do we really understand AI's power?


PS: If you’ve made it this far and are curious, I was watching the NBA game between the Lakers and Houston.
