AI for Sign Language [SLI]
I truly believe that the field of Artificial Intelligence [AI] can improve people's lives and offer low-cost, inclusive solutions. This belief has been so strong that it got me thinking about how I could use AI to build something meaningful for society. The first idea that came to me was using AI to interpret sign language, and also to perform other computing functions, like moving a mouse, typing text, or even turning devices on/off with hand movements, in a way that would let me train the computer myself or share that training with others (e.g., download an already trained model for American Sign Language [ASL] or Língua Brasileira de Sinais [Libras]). Wouldn't that be nice?
I am happy to announce that I have made significant progress: I have designed a possible architecture and started developing the code, which is published on my GitHub as Signal Language Interpreter. So far, I have implemented hand movement recognition and the differentiation of position changes, which is key to providing the hand landmarks (XYZ coordinates) as features to my Convolutional Neural Network (CNN), still under development. The overall architecture is shown in Figure 2 below:
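To give an idea of how landmarks become features, here is a minimal sketch of the preprocessing step, assuming MediaPipe's 21-landmark hand model. The function name and the normalization choices (wrist-centered, scale-normalized) are my illustration, not necessarily the exact code in the repository:

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Turn 21 (x, y, z) hand landmarks into a position-invariant
    feature vector: translate so the wrist (landmark 0) sits at the
    origin, then scale by the largest absolute coordinate."""
    pts = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    pts = pts - pts[0]                       # wrist-relative: removes hand position
    scale = float(np.abs(pts).max()) or 1.0  # guard against an all-zero hand
    return (pts / scale).flatten()           # 63-value feature vector for the CNN
```

Because the vector is relative to the wrist, the same sign produces the same features no matter where the hand appears in the camera frame.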
Basically, the algorithm works as follows:
It is important to note that besides the dense layers, two dropout layers were introduced to randomly set a defined percentage of inputs to zero during each training step, which helps prevent overfitting. For the last layer, the Softmax activation function (Brownlee, 2020) was used, as it is suited to classification problems.
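The mechanics of those two pieces can be sketched in a few lines of NumPy. This is illustrative only (Keras provides both as built-in layers); the 0.5 rate below is just an example value:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """During training, zero out roughly `rate` of the inputs at random
    and rescale the survivors so the expected activation is unchanged
    (the 'inverted dropout' formulation)."""
    if not training:
        return x                      # dropout is a no-op at inference time
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()
```

Randomly silencing units forces the network not to rely on any single feature, which is why it combats overfitting.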
As the optimizer, the Adam algorithm was used since, according to Kingma and Ba (2017), it is "computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters". This fit the tenets defined for the application ideally.
For the loss function, Sparse Categorical Cross-Entropy was selected since, according to Koech (2020), this loss function is recommended for classification problems with two or more label classes to be predicted.
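The "sparse" part means the labels are plain integer class IDs rather than one-hot vectors, which is convenient when the number of sign classes keeps growing. A NumPy sketch of what the loss computes (for illustration; Keras supplies this as a built-in):

```python
import numpy as np

def sparse_categorical_cross_entropy(y_true, probs):
    """Mean negative log-probability assigned to the true class.
    `y_true` holds integer class IDs (no one-hot encoding needed);
    `probs` is an (n_samples, n_classes) array out of a softmax."""
    probs = np.clip(probs, 1e-12, 1.0)   # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
```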
For the training run, a batch size of 128 and 1000 epochs were defined. These settings proved sufficient to minimize the loss while yielding high accuracy, generally greater than 90%.
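Putting the pieces above together, a Keras model along these lines would match the description. The layer sizes and dropout rates here are my assumptions for illustration; see the repository for the actual architecture:

```python
import tensorflow as tf

NUM_CLASSES = 13  # labels in the confusion matrix below; grows as signs are added

def build_model(n_features=63):
    """Dense classifier over the 63 landmark features, with two Dropout
    layers against overfitting and a final softmax over sign classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",                       # Adam (Kingma and Ba, 2017)
                  loss="sparse_categorical_crossentropy",  # integer labels
                  metrics=["accuracy"])
    return model

# Training as described in the text:
#   model.fit(X_train, y_train, batch_size=128, epochs=1000)
```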
Results
The neural network training performed extremely well. Although I am still in the process of adding more labels and classifications, accuracy has reached 96% and the inference tests have shown a response time under 1 millisecond, which is a remarkable result.
Figure 4 shows the confusion matrix for 13 labels:
Conclusion
The designed methodology and application have performed very well, in fact much better than I personally expected. This methodology can be leveraged for other use cases, such as defect identification at arbitrary real-world coordinates using computer vision, body language identification, touchless commands, and many others.
References:
Agarwal, A.; Archbold, S.; Au, A. 2021. World report on hearing. World Health Organization. Available at: https://www.who.int/teams/noncommunicable-diseases/sensory-functions-disability-and-rehabilitation/highlighting-priorities-for-ear-and-hearing-care. Accessed on: Feb 01, 2023.
Herman, R.; Roy, P.; Kyle, F. 2017. Reading and dyslexia in deaf children. Nuffield Foundation, London, England, United Kingdom. Available at: https://www.city.ac.uk/__data/assets/pdf_file/0005/564170/Reading-and-Dyslexia-in-Deaf-Children-Herman-Roy-Kyle-2017-FINAL.pdf. Accessed on: Feb 01, 2023.
Parton, S.B. 2005. Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence. Oxford Academic, Oxford, England, United Kingdom. Available at: https://academic.oup.com/jdsde/article/11/1/94/410770?login=false. Accessed on: Feb 01, 2023.
Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.; Yong, M.G.; Lee, J.; Chang, W.; Hua, W.; Georg, M.; Grundmann, M. 2019. MediaPipe: A framework for building perception pipelines. arXiv. Available at: https://arxiv.org/abs/1906.08172. Accessed on: Feb 01, 2023.
Brownlee, J. 2020. Softmax activation function with Python. Machine Learning Mastery. Available at: https://machinelearningmastery.com/softmax-activation-function-with-python/. Accessed on: Feb 01, 2023.
Kingma, D.P.; Ba, J. 2017. Adam: A method for stochastic optimization. arXiv. Available at: https://arxiv.org/abs/1412.6980. Accessed on: Feb 01, 2023.
Koech, K.E. 2020. Cross-entropy loss function. Towards Data Science. Available at: https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e. Accessed on: Feb 01, 2023.
Thanks for reading :)
Luis Urso
Very soon I will update this paper. I was finally able to train the 26 ASL letters + 10 numbers, reaching 96% accuracy, which is amazing, with an inference response time lower than 1 millisecond using TensorFlow Lite. I will now present this paper, and as soon as it is approved I will publish it at arXiv. The code is open source and already available on my GitHub (Luis-Urso/USP-ESALQ).
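For readers curious how TensorFlow Lite achieves that inference speed, here is a minimal sketch of the convert-and-invoke cycle. The tiny stand-in model and the 26+10 class count mirror the description above; the real app would convert the trained classifier the same way:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the trained classifier (26 letters + 10 digits = 36 classes).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(63,)),
    tf.keras.layers.Dense(36, activation="softmax"),
])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# The TFLite interpreter is small and fast enough for real-time hand tracking.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

features = np.zeros((1, 63), dtype=np.float32)  # one landmark feature vector
interpreter.set_tensor(inp["index"], features)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])    # one probability per sign class
```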
I finally reached version 10 of my AI, and it is working perfectly. The many hours dedicated to this paper were priceless; the amount of learning and the ideas that came out of it were amazing and powerful. I would recommend that anyone trying to dive into the AI world pick a use case from real life and try to solve the problem!
See the latest revisions I made to the article. I have included some of the learnings and solutions I found.