AI for Sign Language [SLI]
I truly believe that the field of Artificial Intelligence [AI] can improve people's lives and offer low-cost, inclusive solutions. This belief has been so strong that it got me thinking about how I could use AI to build something meaningful for society. The first idea that came to me was using AI to interpret sign language, and also to perform other computing functions, like moving a mouse, typing text, or even turning devices on/off with hand movements, in a way that would let me train the computer myself or share that training with others (e.g., download an already trained model for American Sign Language [ASL] or Língua Brasileira de Sinais [Libras]). Wouldn't that be nice?
I am happy to announce that I have made significant progress: I have designed a possible architecture and started developing the code, which is published on my GitHub as Signal Language Interpreter. So far, I have implemented hand movement recognition and the differentiation of position changes, which is key to providing the hand landmarks (XYZ coordinates) as features to my Convolutional Neural Network (CNN), still under development. The overall architecture is shown in Figure 2 below:
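To give an idea of how landmarks become features, here is a minimal sketch of the preprocessing step, assuming MediaPipe's 21-landmark hand model. The function name and the normalization choices (wrist-centered, scale-normalized) are my illustration, not necessarily the exact code in the repository:

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Turn 21 (x, y, z) hand landmarks into a position-invariant
    feature vector: translate so the wrist (landmark 0) sits at the
    origin, then scale by the largest absolute coordinate."""
    pts = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    pts = pts - pts[0]                       # wrist-relative: removes hand position
    scale = float(np.abs(pts).max()) or 1.0  # guard against an all-zero hand
    return (pts / scale).flatten()           # 63-value feature vector for the CNN
```

Because the vector is relative to the wrist, the same sign produces the same features no matter where the hand appears in the camera frame.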
Basically, the algorithm works as follows:
It is important to note that besides the dense layers, two dropout layers were introduced to randomly set a defined percentage of inputs to zero during each training step, which helps prevent overfitting. For the last layer, the Softmax activation function (Brownlee, 2020) was used, as it is suited to classification problems.
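The mechanics of those two pieces can be sketched in a few lines of NumPy. This is illustrative only (Keras provides both as built-in layers); the 0.5 rate below is just an example value:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """During training, zero out roughly `rate` of the inputs at random
    and rescale the survivors so the expected activation is unchanged
    (the 'inverted dropout' formulation)."""
    if not training:
        return x                      # dropout is a no-op at inference time
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()
```

Randomly silencing units forces the network not to rely on any single feature, which is why it combats overfitting.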
As the optimizer, the Adam algorithm was used since, according to Kingma and Ba (2017), it is "computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters". This fit the tenets defined for the application ideally.
For the loss function, Sparse Categorical Cross-Entropy was selected since, according to Koech (2020), this loss function is recommended for classification problems with two or more label classes to be predicted.
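The "sparse" part means the labels are plain integer class IDs rather than one-hot vectors, which is convenient when the number of sign classes keeps growing. A NumPy sketch of what the loss computes (for illustration; Keras supplies this as a built-in):

```python
import numpy as np

def sparse_categorical_cross_entropy(y_true, probs):
    """Mean negative log-probability assigned to the true class.
    `y_true` holds integer class IDs (no one-hot encoding needed);
    `probs` is an (n_samples, n_classes) array out of a softmax."""
    probs = np.clip(probs, 1e-12, 1.0)   # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
```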
For the training run, a batch size of 128 and 1000 epochs were defined. These settings proved sufficient to minimize the loss while yielding high accuracy, generally greater than 90%.
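Putting the pieces above together, a Keras model along these lines would match the description. The layer sizes and dropout rates here are my assumptions for illustration; see the repository for the actual architecture:

```python
import tensorflow as tf

NUM_CLASSES = 13  # labels in the confusion matrix below; grows as signs are added

def build_model(n_features=63):
    """Dense classifier over the 63 landmark features, with two Dropout
    layers against overfitting and a final softmax over sign classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",                       # Adam (Kingma and Ba, 2017)
                  loss="sparse_categorical_crossentropy",  # integer labels
                  metrics=["accuracy"])
    return model

# Training as described in the text:
#   model.fit(X_train, y_train, batch_size=128, epochs=1000)
```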
Results
The neural network training performed extremely well. Although I am still in the process of adding more labels and classifications, accuracy has reached 96% and the inference tests have shown a response time under 1 millisecond, which is a remarkable result.
Figure 4 shows the confusion matrix for 13 labels:
Conclusion
The designed methodology and application have performed very well, in fact much better than I personally expected. This methodology can be leveraged for other use cases, such as defect identification at arbitrary real-world coordinates using computer vision, body language identification, touchless commands, and many others.
References:
Agarwal, A.; Archbold, S.; Au, A. 2021. World report on hearing. World Health Organization. Available at: https://www.who.int/teams/noncommunicable-diseases/sensory-functions-disability-and-rehabilitation/highlighting-priorities-for-ear-and-hearing-care. Accessed on: Feb 01, 2023.
Herman, R.; Roy, P.; Kyle, F. 2017. Reading and dyslexia in deaf children. Nuffield Foundation, London, England, United Kingdom. Available at: https://www.city.ac.uk/__data/assets/pdf_file/0005/564170/Reading-and-Dyslexia-in-Deaf-Children-Herman-Roy-Kyle-2017-FINAL.pdf. Accessed on: Feb 01, 2023.
Parton, S.B. 2005. Sign language recognition and translation: A multidisciplined approach from the field of artificial intelligence. Oxford Academic, Oxford, England, United Kingdom. Available at: https://academic.oup.com/jdsde/article/11/1/94/410770?login=false. Accessed on: Feb 01, 2023.
Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.; Yong, M.G.; Lee, J.; Chang, W.; Hua, W.; Georg, M.; Grundmann, M. 2019. MediaPipe: A framework for building perception pipelines. arXiv. Available at: https://arxiv.org/abs/1906.08172. Accessed on: Feb 01, 2023.
Brownlee, J. 2020. Softmax activation function with Python. Machine Learning Mastery. Available at: https://machinelearningmastery.com/softmax-activation-function-with-python/. Accessed on: Feb 01, 2023.
Kingma, D.P.; Ba, J. 2017. Adam: A method for stochastic optimization. arXiv. Available at: https://arxiv.org/abs/1412.6980. Accessed on: Feb 01, 2023.
Koech, K.E. 2020. Cross-entropy loss function. Towards Data Science. Available at: https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e. Accessed on: Feb 01, 2023.
Thanks for reading :)
Luis Urso
Very soon I will update this paper. I was finally able to train the 26 ASL letters + 10 numbers, reaching 96% accuracy, which is amazing, with an inference response time lower than 1 millisecond using TensorFlow Lite. I will now present this paper, and as soon as it is approved I will publish it at arXiv. The code is open source and already available on my GitHub (Luis-Urso/USP-ESALQ).
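For readers curious how TensorFlow Lite achieves that inference speed, here is a minimal sketch of the convert-and-invoke cycle. The tiny stand-in model and the 26+10 class count mirror the description above; the real app would convert the trained classifier the same way:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the trained classifier (26 letters + 10 digits = 36 classes).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(63,)),
    tf.keras.layers.Dense(36, activation="softmax"),
])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# The TFLite interpreter is small and fast enough for real-time hand tracking.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

features = np.zeros((1, 63), dtype=np.float32)  # one landmark feature vector
interpreter.set_tensor(inp["index"], features)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])    # one probability per sign class
```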
I finally reached version 10 of my AI, and it is working perfectly. The many hours dedicated to this paper were priceless; the amount of learning and the ideas that came out of it were amazing and powerful. I would recommend that anyone trying to dive into the AI world pick a use case from real life and try to solve the problem!
See the latest revisions I made to the article. I have included some of the learnings and solutions I found.