Understanding Deep Learning for Computer Vision - a simple draft version
Debi Prasad Rath
@AmazeDataAI- Technical Architect | Machine Learning | Deep Learning | NLP | Gen AI | Azure | AWS | Databricks
Dedicated to all budding **data scientists**
Deep learning has been significantly advancing the predictive modeling paradigm for quite a while, and computer vision is one of its major areas of innovation. In this article we will understand how deep learning has revolutionized computer vision. Along the way we will dig into TensorFlow, a popular deep learning library, and Keras, a popular API for specifying deep learning models.
So far, our machine learning experience has dealt with tabular data handled in Python with pandas. Now we move on to get our hands on image data. We will start with an overview of how images are represented for deep learning.
Let us assume that I have a handwritten digit 2. In an image, this digit is made up of pixels: some black, some white, and some in between, i.e. grey. The pixels are arranged in rows and columns, so we can represent the image as a matrix of numbers, where each number indicates how dark that pixel is.

Color images have one extra dimension. Alongside rows and columns there is a channel dimension, often ordered "BGR", recording how blue, green, and red each pixel is. Put simply, a color image is a stack of three matrices. A tensor generalizes this idea: it is an array that can hold any number of dimensions, hence the name TensorFlow.

In today's deep learning we apply convolutions to tensors of this kind. A convolution is a small tensor that is multiplied element-wise with a section of the image. It acts as a simple filter that is applied to the image over and over; depending on the values in the filter, it picks out a specific pattern from the image.
An example of a handwritten digit.
An example of a color image (a dog), with its "BGR" channels stacked as a third dimension.
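To make the two representations concrete, here is a small NumPy sketch of the shapes involved; the 28x28 size is an illustrative assumption (a common digit-image size), not something fixed by the text.

```python
import numpy as np

# A grayscale digit image is a 2-D matrix of pixel intensities
# (e.g. 0 = white, 255 = black for an 8-bit image).
gray = np.zeros((28, 28), dtype=np.uint8)
print(gray.shape)   # (28, 28)

# A color image adds a channel dimension: three stacked matrices,
# one each for the blue, green, and red components.
color = np.zeros((28, 28, 3), dtype=np.uint8)
print(color.shape)  # (28, 28, 3)

# A single channel on its own is again an ordinary matrix.
print(color[..., 0].shape)  # (28, 28)
```

Both arrays are tensors in the sense used above: the grayscale image has two dimensions, the color image three.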
Let us assume that we have this convolution tensor:
2.5 2.5
-2.5 -2.5
I propose that the convolution we see above is a horizontal line detector. If we multiply it with a part of the image that contains a horizontal line, we get a large value; if we multiply it with a part that has no horizontal line, we get a value at or near zero.
The data (fig 2)
200 200 - - -
200 200 - - -
Let us work this out. In fig 2 we have a part of the image where every pixel has the same intensity. Performing the element-wise multiplication and summing, we get 0:
2.5*200+2.5*200-2.5*200-2.5*200 = 0
The data (fig 3)
0 0 - - -
0 0 - - -
We can test the other way around with a part of the image that is entirely white. In fig 3 every pixel again has the same intensity (zero this time), so the element-wise multiplication again sums to 0:
2.5*0+2.5*0-2.5*0-2.5*0 = 0
Now let us use the same convolution on data that contains both dark and light pixels, arranged as the proposed horizontal line. The data would look like this:
The data (fig 4)
200 200 - - -
0 0 - - -
Now if we perform the same element-wise multiplication between the tensor and the data, the sum is very large:
2.5*200+2.5*200-2.5*0-2.5*0 = 1000
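The three worked examples above can be checked with a few lines of NumPy; the variable names are mine, but the kernel and patch values come straight from figs 2-4.

```python
import numpy as np

# The proposed horizontal-line detector.
kernel = np.array([[ 2.5,  2.5],
                   [-2.5, -2.5]])

# 2x2 image patches from the three figures.
uniform_dark  = np.array([[200, 200],
                          [200, 200]])  # fig 2: all pixels dark
uniform_white = np.array([[0, 0],
                          [0, 0]])      # fig 3: all pixels white
horizontal    = np.array([[200, 200],
                          [  0,   0]])  # fig 4: dark over light

# Element-wise multiply and sum -- one step of a convolution.
for patch in (uniform_dark, uniform_white, horizontal):
    print(np.sum(kernel * patch))
# prints 0.0, 0.0, 1000.0
```

Only the patch with the dark-over-light edge produces a large response, exactly as the arithmetic above shows.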
Amazing, this time we got a really large number, which means the convolution did indeed detect a horizontal line, as proposed. The other key point is that it is the numeric values inside the convolution that let us detect the pattern in the data. In this example a single convolution helped us find one pattern in the data; hence we can conclude that by stacking multiple layers of convolutions we will be able to build powerful models.
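As a closing sketch, stacking convolution layers is exactly what Keras makes easy. The model below is a minimal illustration (assuming TensorFlow is installed); the filter counts, kernel sizes, and the 10-class output are illustrative assumptions, not fixed choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Each Conv2D layer learns many small filters like our 2x2 detector,
# and later layers combine earlier patterns into richer ones.
model = tf.keras.Sequential([
    layers.Conv2D(16, kernel_size=3, activation="relu",
                  input_shape=(28, 28, 1)),   # 28x28 grayscale input
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # e.g. digits 0-9
])
model.summary()
```

Here the filters are not hand-crafted as in our example; their values are learned from data during training.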