AI Series: Deep into Deep Learning (Light version)

If you already went through my previous article, then you can skip this one. For those who didn't, or who didn't make it all the way through (but survived the experience), here is a much shorter walk into the magic of Deep Learning and neural networks.

Imagine being part of the biggest selfie ever, with millions of people in it, and being able to identify a specific face... in less than 5 seconds. The hard part would be fitting millions of faces into one picture, not recognizing a single face among millions in a few seconds.

This ability is already a reality thanks to Deep Learning, a machine learning technique built on Artificial Neural Networks: many highly connected artificial nodes arranged in layers, loosely analogous to the biological neurons in the human brain and to how those neurons connect, exchange information, and learn.

Among the most popular types of neural networks today are Convolutional Neural Networks (CNNs), which specialize in object recognition and take additional inspiration from the human visual system.

How Convolutional Neural Networks work

CNNs stack several layers, called hidden layers because they sit between the input and output layers. These layers progressively process the pixels of the input picture, identifying discriminating patterns and building ever higher-level generalizations of the input data, in order to train a model capable of detecting specific objects in input images.

To isolate and identify specific patterns (features) within the original picture, the network layers apply different operations, including convolution, pooling, and various activation functions. These operations progressively simplify the incoming information, making the detected features more robust and less sensitive to position, so that a feature can shift within the original image and still be recognized.
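The three operations above can be sketched in a few lines of NumPy. The tiny 6×6 "image", the hand-picked edge-detecting kernel, and the 2×2 pooling window here are invented purely for illustration; in a real CNN the kernels are learned during training:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation function: keep positive responses, zero out the rest."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Downsample: keep only the strongest response in each size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 "image" with a vertical edge, and a kernel that responds to
# left-to-right increases in brightness (a simple vertical-edge detector).
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)

feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (2, 2) -- a smaller, position-tolerant summary
```

Notice how pooling shrinks the 5×5 convolution output down to 2×2 while preserving the strong response where the edge was: this is exactly the "less sensitive to position" behaviour described above.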

Thanks to the weight associated with each connection, which mimics our biological synapses, features that strongly characterize the object in the original image are carried along, layer after layer, while elements that do not help determine the object lose weight and eventually fade out along the way.

At the end of this process, convolutional networks add one or more layers called Fully Connected Layers, where the values of all identified features are flattened into one long array. Each value becomes a vote that determines how strongly that specific pattern predicts the presence of the object we are looking for in the original image.
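The "flatten and vote" step can be sketched as follows. The feature-map values, the number of classes, and the randomly initialized weight matrix are all invented for illustration; in a trained network these weights would have been learned:

```python
import numpy as np

# Two hypothetical 2x2 pooled feature maps (one per filter).
feature_maps = np.array([[[0.0, 2.0], [0.0, 2.0]],   # response of filter 1
                         [[1.0, 0.0], [1.0, 0.0]]])  # response of filter 2

# Flatten everything into one long array of 8 feature values.
votes_in = feature_maps.reshape(-1)

# Fully connected layer: every feature casts a weighted vote for every
# one of 3 candidate classes. Weights are random here, learned in reality.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
b = np.zeros(3)

scores = votes_in @ W + b                        # weighted votes per class
probs = np.exp(scores) / np.exp(scores).sum()    # softmax: votes -> probabilities
print(probs.sum())  # sums to (approximately) 1
```

The class with the highest probability is the network's prediction: the pattern of strong features "voted" it into first place.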

To make solid predictions, Deep Learning and the underlying neural network require a long training process, during which the system learns to identify an object autonomously. This requires a huge dataset of training images containing the object, from which the network learns, training cycle after training cycle, all the features that discriminate that object. A highly discriminating set of features then yields a robust, generalized model for reliable predictions on images it has never seen before. Similarly to what happens in our biological brain, the learning process strengthens the connections that carry the discriminating features by adjusting the weight associated with each one.

Most Deep Learning use cases implement an approach known as Supervised Learning, whose goal is to find a function that best maps a set of labelled inputs to their correct outputs. A classification task is an example: the input is an image of a letter, and the correct output is the name of the letter. It is called supervised learning because the algorithm's learning from the training dataset can be thought of as a teacher supervising the process: we know the correct answers, and the algorithm iteratively makes predictions on the training data and is corrected by the teacher.

Once you start feeding the neural network with the training images, the initial predictions about what each image represents will be very poor, but they improve over time: the output of the network is compared with what the correct answer should have been, and the difference, or error, is used to adjust the weight values, slightly up or slightly down, across the entire network.

Training stops when the error between prediction and correct answer reaches its smallest value, so that no further significant adjustments to the network are needed.
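This error-driven adjustment loop can be demonstrated on a deliberately tiny example. Here the "network" is a single weight w, the "teacher" provides correct answers via an invented rule y = 3x, and the weight is nudged slightly up or down each cycle until the adjustments become insignificant:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100)   # 100 training inputs
y = 3.0 * x                # the "correct answers" supplied by the teacher

w = 0.0                    # initial weight: predictions start out very poor
lr = 0.1                   # learning rate: how big each slight adjustment is

for cycle in range(1000):
    pred = w * x                        # the network's current predictions
    error = pred - y                    # difference from the correct answers
    grad = 2 * np.mean(error * x)       # direction that reduces squared error
    w -= lr * grad                      # adjust the weight slightly up or down
    if abs(lr * grad) < 1e-8:           # no further significant adjustment
        break

print(round(w, 4))  # converges to ~3.0
```

Real networks repeat this same idea, via backpropagation, across millions of weights at once, but the principle is identical: compare, measure the error, nudge the weights, stop when the nudges no longer matter.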

The network is then tested on new images of the same object, to validate whether the learned model is solid enough to correctly classify objects in images it has never seen before... including all the faces in your millions-people selfie!


More articles by Michele Vaccaro