Neural Style Transfer: Online Image Optimization (Flexible but Slow)

In this article, we demonstrate the power of Deep Learning and Convolutional Neural Networks (CNNs) in creating artistic images via a process called Neural Style Transfer (NST). NST is currently a well-known and trending topic in both the academic literature and industrial applications. Broadly speaking, NST can be divided into two main paradigms:

  1. Online image optimization (discussed in this article)
  2. Offline Network optimization

In this article, we focus on the first paradigm, surveying its main papers.

Online image optimization: Overview

The main idea is to iteratively optimize a random image, not a network, repeatedly changing the image in the direction that minimizes some loss. This iterative optimization is gradient descent performed in the image space.
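To make the idea concrete, here is a minimal, illustrative sketch of gradient descent in image space. This is not any paper's implementation: a fixed random linear map stands in for the frozen CNN feature extractor, and the "image" is a small array optimized so that its features match those of a reference image.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))          # stand-in "feature extractor" (frozen)

def features(img):
    return W @ img.ravel()

reference = rng.random((16, 16))            # reference image
target = features(reference)

x = rng.random((16, 16))                    # start from a random image
initial_loss = float(np.sum((features(x) - target) ** 2))

lr = 5e-4
for _ in range(300):
    diff = features(x) - target             # feature mismatch
    grad = 2 * (W.T @ diff).reshape(16, 16) # gradient of the squared loss w.r.t. pixels
    x -= lr * grad                          # update the image, not the network

final_loss = float(np.sum((features(x) - target) ** 2))
```

In a real NST setup, `features` would be activations of a pretrained CNN and the gradient would come from automatic differentiation, but the optimization loop has exactly this shape.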

In the paper "Understanding Deep Image Representations by Inverting Them", the loss is defined as the Euclidean distance between the network's activations for the input image and the corresponding activations for a reference image, plus a regularizer such as Total Variation.
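As an illustration, one common form of the Total Variation regularizer can be sketched as the sum of squared differences between neighboring pixels (variants exist; this is a minimal assumed form, not the paper's exact code):

```python
import numpy as np

def total_variation(img):
    """Sum of squared differences between neighboring pixels."""
    dx = np.diff(img, axis=1)               # horizontal neighbor differences
    dy = np.diff(img, axis=0)               # vertical neighbor differences
    return float(np.sum(dx ** 2) + np.sum(dy ** 2))

flat = np.full((8, 8), 0.5)                 # constant image: zero variation
noisy = np.indices((8, 8)).sum(axis=0) % 2  # checkerboard: high variation
```

Adding this term to the feature-matching loss penalizes high-frequency noise and pushes the reconstruction toward smoother, more natural images.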

The figure above shows five possible reconstructions of the reference image obtained from the 1,000-dimensional code (vector) extracted from a VGG network trained on ImageNet.

All five generated images produce almost the same 1,000-dimensional vector as the original image. In other words, from the model's viewpoint, all these images are almost equivalent.

Example 1: Reconstruction of Images based on Content and Style

In the well-known work "Image Style Transfer Using Convolutional Neural Networks", a new image is constructed through an iterative optimization process in the image space, using a loss that balances two components: one for the content and one for the style.

As discussed here, the content is usually captured by the activations of higher layers, while one way to capture the style is via the correlations of feature maps at different layers. In this setup, the goal is to generate an image that minimizes a weighted sum of the content loss and the style loss.
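The channel correlations are usually summarized in a Gram matrix. The following is a hedged sketch of that representation and of a Gram-based style loss for a single layer (normalization conventions vary between implementations; this one divides by the number of spatial positions):

```python
import numpy as np

def gram_matrix(feats):
    """feats: (C, H, W) feature maps -> (C, C) channel-correlation matrix."""
    c, h, w = feats.shape
    flat = feats.reshape(c, h * w)
    return flat @ flat.T / (h * w)          # normalize by number of positions

def style_loss(gen_feats, style_feats):
    """Squared distance between the two Gram matrices of one layer."""
    d = gram_matrix(gen_feats) - gram_matrix(style_feats)
    return float(np.sum(d ** 2))

rng = np.random.default_rng(0)
layer = rng.standard_normal((4, 8, 8))      # toy feature maps: 4 channels of 8x8
G = gram_matrix(layer)
```

In the full method this loss is summed over several layers and added, with a weighting factor, to the content loss before taking gradients with respect to the image.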

Example 2: Reconstruction of Images using different statistical style representation

Another statistical style representation is proposed in the paper "Demystifying Neural Style Transfer", where it is shown that matching the Gram matrices (used in the previous example) is equivalent to minimizing a specific Maximum Mean Discrepancy (MMD). The style information is thus intrinsically represented by the distributions of activations in a CNN, and style transfer can be achieved by distribution alignment. Moreover, the authors evaluated several other distribution alignment methods and found that they all yield promising transfer results.
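The equivalence can be checked numerically. The sketch below (illustrative, not the paper's code) treats each spatial position's feature vector as one sample and verifies that the unnormalized Gram-matching loss equals the (biased) squared MMD with the quadratic kernel k(a, b) = (aᵀb)²:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 20, 5                        # N spatial positions, C channels
X = rng.standard_normal((N, C))     # per-position features of the generated image
Y = rng.standard_normal((N, C))     # per-position features of the style image

Gx, Gy = X.T @ X, Y.T @ Y           # unnormalized Gram matrices
gram_loss = float(np.sum((Gx - Gy) ** 2))

K = lambda A, B: (A @ B.T) ** 2     # quadratic (second-order polynomial) kernel
mmd2 = float(K(X, X).sum() + K(Y, Y).sum() - 2 * K(X, Y).sum())
```

Because the two quantities coincide, swapping in other kernels (linear, Gaussian, etc.) gives the alternative distribution-alignment style losses explored in the paper.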

The figure above shows style reconstructions for different methods at five layers. Each row corresponds to one method, and the reconstructions are obtained using only the style loss. In each column, the style representations are reconstructed using a different subset of layers of the VGG network.

Example 3: Reconstruction of Images while preserving the Coherence

CNN features unavoidably lose some of the low-level information contained in the image, which makes the generated images distorted and irregular-looking. To preserve coherent structures, "Laplacian-Steered Neural Style Transfer" proposes adding constraints on low-level features in pixel space. The Laplacian filter computes the second-order derivatives of the pixels in an image and is widely used for edge detection. In that work, a Laplacian loss is added, defined as the squared Euclidean distance between the Laplacian filter responses of the content image and the stylized result.

As shown in the figure above, minimizing the Laplacian loss drives the stylized image to have detail structures similar to those of the content image while still being rendered in the new style.
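A minimal sketch of this idea, assuming the standard 4-neighbour discrete Laplacian kernel (the paper may additionally pool or smooth; this is only an assumed, simplified form):

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian, evaluated at interior pixels."""
    return (img[:-2, 1:-1] + img[2:, 1:-1]
            + img[1:-1, :-2] + img[1:-1, 2:]
            - 4 * img[1:-1, 1:-1])

def laplacian_loss(content, stylized):
    """Squared Euclidean distance between the two Laplacian responses."""
    d = laplacian(content) - laplacian(stylized)
    return float(np.sum(d ** 2))

rng = np.random.default_rng(0)
content = rng.random((10, 10))
distorted = content + rng.normal(0.0, 0.1, content.shape)  # noisy stylization
```

Because the Laplacian responds to edges and fine detail, this term penalizes exactly the kind of structural distortion the style loss alone tends to introduce.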

Finally, Deep Dreaming can be seen as another online image-optimization method for image generation, driven by the input image and by what the underlying network was trained on.

Final Note:

The online image optimization discussed here is based on an iterative optimization process, gradient descent applied in the image space. Accordingly, the process is time-consuming, especially when the desired reconstructed image is large or when a large number of images must be generated. In the next article, a much faster approach, offline network optimization, is discussed.

Regards
