Deep Learning: Teach Your Network How to Draw!

Teaching your kids! It is common for parents to teach their kids how to draw by showing them some images and asking them to draw those images. Usually, kids do not copy pixel by pixel; they build a more compressed internal representation of the input image and use it to reconstruct the original. (Some do copy pixel by pixel, but that is not our case :) )

Neural Networks: In our case, the input is just the pixels of an image, and the desired output is that same image. Training a network means adjusting its parameters to achieve a certain objective, and our objective is clear: ask the network to regenerate the input image from its internal/compressed representation.
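To make this objective concrete: a common choice is the mean squared error between the input pixels and the reconstructed pixels. A minimal sketch in Python (the function name here is my own illustration; the article does not specify a loss):

import numpy as np

def reconstruction_loss(x, x_hat):
    # Mean squared error between the input and its reconstruction.
    return np.mean((x - x_hat) ** 2)

x = np.random.rand(4096)            # a flattened 64*64 grayscale image
print(reconstruction_loss(x, x))    # a perfect reconstruction gives 0.0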

We will train neural networks to do the same. For example, consider the following fully connected network (a code sketch follows the list):

  • Input: grayscale 64*64 image, vector length = 4096
  • Internal/compressed representation length = 1024 < 4096
  • Output: the same grayscale 64*64 image, vector length = 4096
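As a rough sketch of the architecture above, assuming Keras (the article does not name a framework), the whole model is just two dense layers; the activations and optimizer are my assumptions, not choices stated in the article:

from tensorflow import keras
from tensorflow.keras import layers

# Fully connected autoencoder: 4096 -> 1024 -> 4096
inputs = keras.Input(shape=(4096,))                       # flattened 64*64 grayscale image
code = layers.Dense(1024, activation="relu")(inputs)      # internal/compressed representation
outputs = layers.Dense(4096, activation="sigmoid")(code)  # reconstruction, pixel values in [0, 1]

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)  # note: the target is the input itself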

 

Here, after training the network, I gave it some new images and asked it to reconstruct them (the first row shows the input images; the second row shows the reconstructed images). The dataset used in the following examples is here.

Internal layer size = 1024 (looks very nice!)

 

Internal layer size = 512 (looks good)

 

Internal layer size = 128

 

Internal layer size = only 4 (looks like the average of all faces)

Wait a minute! This looks like a lossy data compression method, where we need only a 1024-dimensional vector (for example) to represent a 4096-dimensional one. In other words, the 1024 vector holds the important features needed to reconstruct the original image. This is not generic compression; it depends on the training process. Moreover, we can control the compression ratio and the quality of the reconstructed images by changing the internal vector length: with 1024 out of 4096, the code is only 25% of the original size. Increasing the internal length reconstructs better-looking images but increases the number of parameters in the model.

Convolutional Networks

Since our inputs are images, it makes sense to use ConvNets. For example (a code sketch follows the list):

  • Input: grayscale 64*64 image, vector length = 4096
  • Convolution, size 8
  • Max pooling
  • Convolution, size 8
  • Max pooling
  • Output: the same grayscale 64*64 image, vector length = 4096
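Here is a hedged Keras sketch of such a convolutional autoencoder. I read "size 8" as 8 filters; the 3*3 kernels, the upsampling, and the decoder layers are my assumptions (the article only lists the encoder side, but a mirrored decoder is implied by the output):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 1))                            # grayscale image
x = layers.Conv2D(8, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)                                      # down to 32*32
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)                                # 16*16 compressed feature maps

x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)                                      # back to 32*32
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)                                      # back to 64*64
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

conv_autoencoder = keras.Model(inputs, outputs)
conv_autoencoder.compile(optimizer="adam", loss="mse")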

 

As expected, we get better results even with a smaller number of features.

 

Colored images

Grayscale images are stored as 2-dimensional arrays, where each point holds the gray level of the corresponding pixel (from 0 to 1, or from 0 to 255). Colored images, on the other hand, are stored as 3-dimensional arrays, with three values per pixel representing the red, green, and blue (R, G, B) components.

For neural networks, colored images are very similar to gray images: both are just vectors (see the sketch below).
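A quick illustration of both representations, and of the fact that each flattens to a single vector (shapes follow the article's 64*64 grayscale images and CIFAR-10's 32*32 color images):

import numpy as np

gray = np.zeros((64, 64))       # 2D array: one gray level per pixel
color = np.zeros((32, 32, 3))   # 3D array: R, G, B values per pixel (CIFAR-10 size)

print(gray.reshape(-1).shape)   # (4096,) -> one vector
print(color.reshape(-1).shape)  # (3072,) -> also just one (longer) vector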

Here I used the CIFAR-10 dataset and ConvNets. 
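Loading CIFAR-10 takes one line in Keras (again, my assumed framework); for a ConvNet like the sketch above, the input shape becomes (32, 32, 3), and pixels are scaled to [0, 1] before training:

from tensorflow.keras.datasets import cifar10

(x_train, _), (x_test, _) = cifar10.load_data()  # labels are unused: the targets are the images
x_train = x_train.astype("float32") / 255.0      # scale pixels to [0, 1]
x_test = x_test.astype("float32") / 255.0
# conv_autoencoder.fit(x_train, x_train, epochs=5, validation_data=(x_test, x_test))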

(The first row shows the original input images; the second row shows the reconstructed images.)

After 1 epoch (looks good)

 

After 5 epochs (looks better)

 

 

Autoencoders

What we have implemented above is known as an “autoencoder”. Training deep neural networks used to be very difficult, and autoencoders were used for greedy layer-wise pre-training of deep convolutional neural networks. Nowadays we usually apply better random weight initialization schemes, ReLU activations, and batch normalization, which enable training even deeper networks directly.

Tip: Sometimes the decoder and encoder share weights as a regularization strategy.
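One way to implement this tying, as a minimal sketch (TiedDense is a custom layer of my own, not a stock Keras feature): the decoder reuses the transpose of the encoder's weight matrix and only learns its own bias.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class TiedDense(layers.Layer):
    # Dense layer whose kernel is the transpose of another Dense layer's kernel.
    def __init__(self, tied_to, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = keras.activations.get(activation)

    def build(self, input_shape):
        # Only a bias is learned here; the kernel is shared with the encoder layer.
        units = self.tied_to.kernel.shape[0]
        self.bias = self.add_weight(shape=(units,), initializer="zeros", name="bias")

    def call(self, inputs):
        out = tf.matmul(inputs, self.tied_to.kernel, transpose_b=True) + self.bias
        return self.activation(out)

encoder_layer = layers.Dense(1024, activation="relu")
inputs = keras.Input(shape=(4096,))
code = encoder_layer(inputs)                                    # builds the encoder kernel
outputs = TiedDense(encoder_layer, activation="sigmoid")(code)  # reuses it, transposed
tied_autoencoder = keras.Model(inputs, outputs)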

Autoencoders are usually used for dimensionality reduction. With appropriate settings, autoencoders can learn data projections that are more interesting than PCA.
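To use a trained autoencoder this way, keep only the encoder half and compare it with PCA at the same dimensionality. A sketch reusing inputs and code from the fully connected model above (x_flat stands for some flattened image matrix and is hypothetical):

from sklearn.decomposition import PCA

encoder = keras.Model(inputs, code)      # just the encoder half of the autoencoder
ae_features = encoder.predict(x_flat)    # nonlinear projection, shape (n_samples, 1024)

pca_features = PCA(n_components=1024).fit_transform(x_flat)  # linear baseline

It is worth noting that a single-hidden-layer autoencoder with linear activations learns essentially the same subspace as PCA; the nonlinear activations are what allow it to learn more interesting projections.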

Autoencoders Summary 

  • Try to reconstruct the input
  • Used to learn features or summarization of the data
  • Features can be used for supervised tasks (a sketch follows the list)
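As a sketch of the last point: train a simple classifier on top of the encoder's output (encoder comes from the sketch above; x_flat and y_train are hypothetical data and labels):

from sklearn.linear_model import LogisticRegression

features = encoder.predict(x_flat)                              # learned, compressed features
clf = LogisticRegression(max_iter=1000).fit(features, y_train)  # supervised model on top
print(clf.score(features, y_train))                             # training accuracy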

 

Regards 
