Artsy Machines - Why Prisma is smarter than Instagram Filters?

Eleven months ago, a group of researchers[1] at the University of Tübingen, Germany, developed an algorithm that could morph an image to resemble a painting in the style of the great masters. A month ago, Prisma, an app developed by Alexey Moiseenkov and his team, took the world by storm with over 7.5 million downloads in one month. To answer the question in the heading, why is Prisma smarter than an Instagram filter? Well, for starters, it is not a filter at all: every image that Prisma spits out is actually recreated by the machine in the style you chose.

This has been made possible by Deep Learning or, more precisely, by learning algorithms such as the Convolutional Neural Network. The applications of Deep Learning are being perfected with every passing day, which means that machines can not only read addresses in a post office or handwritten cheques in banks, but also paint pictures like a Picasso or read aloud to help the visually challenged use Facebook. This is fascinating, all the more so because functions like vision and pattern recognition, whose complexity was hitherto considered beyond the purview of artificial intelligence, are being peeled open, literally layer by layer, to demonstrate that machines can master them. The very notion that differentiates the inanimate from the living, our sensory perception of stimuli, is being challenged. Having said that, we are still far from creating a machine that can emulate the eye or the brain, but the progress being made is stupendous.

So, how exactly do machines read handwritten digits, or recognise a style of painting and apply it to a given picture? Our best shot at achieving any of this lies in understanding how humans would do it, although not all humans are gifted enough to imitate a Picasso or a Da Vinci.

To provide a very basic level of understanding (chiefly because of my own limited grasp of the mathematics behind the process), I will use the analogy from a tutorial at www.neuralnetworksanddeeplearning.com. Simulating, mathematically or logically through programming, what the 140 million neurons of the primary visual cortex do, together with millions of ganglion cells and the tens of billions of connections between the neurons, is no mean feat. That, however, is not our endeavour here, because even hand-coding a program to recognise a set of handwritten digits poses problems such as separating adjacent digits, understanding the strokes, and then predicting the number, all of which complicate the process beyond what a human can reasonably code by hand. Our best bet, hence, is to mimic our own thinking process and help the machine learn on its own: provide it with a huge training set of tens of thousands of samples, and let it get closer to reading the numbers simply by minimising an error function. To put this in perspective, let us look at two types of artificial neurons: the perceptron and the sigmoid neuron.

To oversimplify, let us assume that we have to decide whether or not to go and watch a movie this Saturday, which is a binary YES/NO response. The factors we are willing to consider are: the proximity of the theatre to our home, the weather, whether our partner is willing to accompany us, and the show timing. Let us now assume that each condition is itself binary: if the theatre is less than 10 km away, I will go, else I won't; if my partner accompanies me, I will go, else I won't; if the show is between 4 and 8 PM, I will go, else I won't; if it rains, I won't go, else I will. Now attach a separate weight to each input to reflect how much it matters. For instance, a large weight on distance would mean that if the theatre is less than 10 km away you are willing to go no matter what the other conditions are; a large weight on timing would mean that if the show is not between 4 and 8 PM you most likely won't go, even if all the other conditions are satisfied. The output is a binary function: if the weighted sum of the inputs (each input multiplied by its weight, then added up) is above a certain threshold, you will go, else you won't. This is how a simple perceptron works.
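To make the movie decision concrete, here is a minimal sketch of such a perceptron in Python. The weights and the threshold are numbers I have made up purely for illustration (timing is given the dominant weight); they are not taken from the tutorial.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 (go) if the weighted sum of the inputs exceeds the threshold, else 0 (stay home)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Input order: [theatre is near, partner comes along, show is between 4 and 8 PM, it is not raining]
# Illustrative assumption: timing matters more than everything else put together.
WEIGHTS = [2, 2, 6, 1]
THRESHOLD = 5

# Only the timing suits us: 6 > 5, so we go.
go_when_only_timing_fits = perceptron([0, 0, 1, 0], WEIGHTS, THRESHOLD)

# Everything suits us except the timing: 2 + 2 + 1 = 5 is not above 5, so we stay home.
go_when_timing_is_wrong = perceptron([1, 1, 0, 1], WEIGHTS, THRESHOLD)
```

With these assumed numbers, the timing alone can tip the decision, which is exactly what a heavy weight is meant to express.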

Now twist this scenario a bit, so that each condition can take a continuous value between 0 and 1 instead. For example, if the distance is 1 km, the likelihood of your going is 0.9; it is 0.8 if the distance is 2 km, and it keeps falling until it is 0.1 at 20 km. This becomes a continuous variable, which is, loosely, the idea behind the sigmoid neuron. The sigmoid function is a smooth curve, so a small change in an input or a weight results in a small change in the output. The advantage is that we can nudge the weights little by little until the output is as close as possible to the desired one.

As the graph of the sigmoid function shows, a sigmoid neuron behaves like a perceptron when the weighted input is a large positive number (the output tends to 1) or a large negative number (the output tends to 0), and hence becomes effectively binary. There can be several layers of such neurons between the input and the output, which let the network build its final decision out of intermediate ones. I have not introduced the concept of bias here, and would suggest reading the Neural Networks tutorial for a deeper appreciation of this method.
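A small sketch of a sigmoid neuron may help here. It uses the standard sigmoid formula, 1 / (1 + e^(-z)); the function names are my own, and bias is left out, as in the text above.

```python
import math


def sigmoid(z):
    """Standard logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights):
    """Weighted sum of the inputs passed through the sigmoid (no bias term in this sketch)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


# Far from zero the neuron is almost binary, like a perceptron.
almost_one = sigmoid(10.0)
almost_zero = sigmoid(-10.0)

# Near zero a small change in the input gives only a small change in the output.
small_step = sigmoid(0.01) - sigmoid(0.0)
```

It is this smoothness around the middle of the curve that lets us adjust the weights gradually instead of flipping the output from 0 to 1 in one jump.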

Next, we define an error function, which is the squared difference between the output of the neural net and the actual output given in the training set. The objective is then to minimise this error function to increase accuracy. In general, the larger the training set, the lower the error and the better the accuracy.
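As a rough sketch, the error function can be written in a few lines. Averaging over the samples is my own assumption (conventions differ, and some texts also divide by two); the sample numbers are invented.

```python
def squared_error(predictions, targets):
    """Average of the squared differences between the network outputs and the training labels."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up outputs of a network versus the labels in the training set.
error_of_rough_guess = squared_error([0.9, 0.2, 0.8], [1, 0, 1])
error_of_perfect_guess = squared_error([1, 0, 1], [1, 0, 1])
```

The closer the outputs get to the labels, the closer this number gets to zero, which is what training tries to achieve.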

In most such problems, where we have a training set that specifies the desired output, the machine learns by systematically applying gradient descent. In other words, gradient descent is a way to minimise an objective function $J(\theta)$, parameterised by a model's parameters $\theta \in \mathbb{R}^d$, by updating the parameters in the direction opposite to the gradient of the objective function, $\nabla_\theta J(\theta)$, with respect to the parameters. This is done by defining a cost function, taking its first derivative to find its slope, and repeatedly stepping downhill to reduce the cost as far as possible, ideally ending up at the global minimum.
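Here is a minimal sketch of gradient descent on a toy cost function with a single parameter. The cost J(θ) = (θ - 3)², the learning rate and the number of steps are all assumptions chosen so that the result is easy to check by hand; a real network has many parameters and a far more complicated cost.

```python
def gradient_descent(gradient, theta, learning_rate=0.1, steps=100):
    """Repeatedly move theta a small step against the gradient of the cost."""
    for _ in range(steps):
        theta = theta - learning_rate * gradient(theta)
    return theta


def cost(theta):
    """Toy cost function with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def cost_gradient(theta):
    """First derivative (slope) of the toy cost function."""
    return 2.0 * (theta - 3.0)


# Start far from the minimum and let the slope guide us downhill.
theta_found = gradient_descent(cost_gradient, theta=0.0)
```

Because this toy cost has a single valley, the procedure lands at its lowest point; on the bumpy cost surface of a real network it may settle in a local minimum instead.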

This concept, coupled with Convolutional Neural Networks and the feed-forward algorithm, is used to identify the patterns (the style) of the sample art piece that is to be emulated. A picture is identified by breaking it down into substructures or features, which are either already taught to the system through supervised learning or are discovered layer by layer through deep learning. Convolutional Neural Networks, which draw inspiration from research on the cat's visual cortex, exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. Each hidden unit takes a spatially contiguous patch of the layer below (its receptive field) as input, which confines the learnt filters to local patterns.
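The idea of local connectivity can be sketched with a tiny hand-written convolution. The 4x4 image and the 2x2 vertical-edge filter below are invented for illustration; in a real network the filter values are learnt rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; each output value depends only on one small patch.

    As in most CNN libraries, the kernel is not flipped (strictly speaking a cross-correlation).
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            patch_sum = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            row.append(patch_sum)
        output.append(row)
    return output


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-made filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting feature map is non-zero only in the column where the image switches from dark to bright, so the filter has picked out a purely local pattern.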

The structure of the original photo is also recognised by a feed-forward convolutional neural net, and eventually the two are combined to produce an image that has the structure of the photo and the style of the master artist. In the case of Prisma, the structural composition of the picture is identified and the new image is recreated in the style of the chosen artist on the company's cloud servers, which explains the delay in recreating the image.
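How the two are combined can be sketched roughly as follows. In the paper by Gatys et al. [1], style is summarised by the correlations between filter responses (a Gram matrix), and the new image is found by minimising a weighted sum of a content error and a style error. The code below is a heavily simplified sketch of that idea on toy feature maps: the function names, the weights and the lack of the paper's normalisation constants are my own simplifications, and I do not know how Prisma itself implements this.

```python
def gram_matrix(feature_maps):
    """Correlations between filter responses; feature_maps holds one flattened list per filter."""
    return [
        [sum(a * b for a, b in zip(map_i, map_j)) for map_j in feature_maps]
        for map_i in feature_maps
    ]


def mean_squared_difference(a, b):
    """Average squared difference between two equally sized lists of lists."""
    flat_a = [value for row in a for value in row]
    flat_b = [value for row in b for value in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def total_loss(generated, content, style, content_weight=1.0, style_weight=1000.0):
    """Weighted sum of how far the generated image is from the photo's structure and the painting's style."""
    content_loss = mean_squared_difference(generated, content)
    style_loss = mean_squared_difference(gram_matrix(generated), gram_matrix(style))
    return content_weight * content_loss + style_weight * style_loss


# Toy feature maps: two filters, each with two spatial positions.
painting_features = [[1, 2], [3, 4]]
shuffled_features = [[2, 1], [4, 3]]  # same responses, different positions

style_of_painting = gram_matrix(painting_features)
style_of_shuffled = gram_matrix(shuffled_features)
```

Note that the two Gram matrices come out identical: the style summary ignores where things are in the picture, which is why the structure has to come from the photo.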

As always, before concluding, I want to throw in my two cents of how this can be used in the HCM space.

Onboarding: A lot of onboarding tasks are extremely transactional and boring, and yet vital. A future system could perhaps read important details such as the PAN, TFN or social security number directly from the submitted documents and enter them into the ERP system without any human intervention, thus eliminating human fatigue and errors. Read: no more form filling.

Engagement Analysis: Sentiment analysis and psychological predictions using Deep Learning can help group employees (unsupervised learning) as engaged or not, based on various input parameters describing how they interact digitally.

Although it might seem intrusive at first, Deep Learning can be used to understand patterns in managers' behaviour (aggressive emails or a pushy nature, approval frequencies, motivational or appreciation mails, etc.) alongside the pattern of attrition, and to provide tips to managers on motivating and leading their teams better.

Most of the other applications of machine learning highlighted in my previous blogs can also be used in tandem with Deep Learning in the fields of hiring, performance appraisals, social collaboration, learning management, etc.

[1] Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. A Neural Algorithm of Artistic Style. arXiv:1508.06576, 2015.
