VANILLA NEURAL NETWORKS: AN INTRODUCTION

Over the course of my studies, I’ve found that a lot of Data Science tutorials use extensive techno-jargon that can be quite off-putting to a beginner-level audience. This article aims to give you an introduction to Neural Networks, specifically Vanilla Neural Networks (also called Vanilla Feed-Forward NNs, if you’re looking to be tacky). The idea is to explain these concepts with little to no complexity.

Vanilla NNs serve as the perfect gateway to the subject, as they are essentially an extension of the simple Linear Regression algorithm. So, a good place to start is to recall the basics of Linear Regression.

Linear Regression

In linear regression, we obtain an estimate of the unknown variable (denoted by y; the output of our model) by computing a weighted sum (with weights denoted by the vector w) of our known variables (denoted by x; the inputs).

y = w · x

That is, the output is equal to the dot product of the input vector (x) and the weight vector (w).
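
To make this concrete, here is a minimal sketch in Python using NumPy (the input and weight values are made up for illustration, and the intercept/bias term is omitted for simplicity):

import numpy as np

# Known variables (inputs) and their weights
x = np.array([2.0, 1.5, 3.0])   # input vector
w = np.array([0.4, 0.1, 0.25])  # weight vector

# Linear regression prediction: the dot product of weights and inputs
y = np.dot(w, x)
print(y)  # prints roughly 1.7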

Vanilla Neural Networks

A vanilla neural network works quite similarly to the regression model above, the difference being that there exists a third layer between our inputs (x) and our output (y). This extra layer is referred to as a “hidden layer” (h). The hidden layer is connected to the output layer by another set of weight vectors.

[Diagram: inputs x feeding a hidden layer of neurons h0, h1, h2, which feeds the output y]

The individual units of the hidden layer, denoted h0, h1, and h2, are known as neurons. We can create a hidden layer with as few or as many neurons as we require.

So far, then, Vanilla Neural Networks and Linear Regression look essentially the same. Or are they?

The trick that makes the NN architecture so distinctive is that a nonlinear “activation function” is applied to the output of each layer.

Basically, when we say nonlinear, we mean that the relationship between our inputs and outputs cannot be captured by fitting a straight line.

Activation Functions

An activation function in a neural network defines how the weighted sum of a neuron’s inputs is transformed into that neuron’s output.

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.

For now, let us denote the activation function as A(z). The input for each neuron in the hidden layer is the input vector, x. The output for each neuron is just the result of the dot product of x and w plugged into the activation function, A.

h_i = A(w_i · x)

The hidden layer is connected to the output layer by another set of weight vectors (v).
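
Putting this together, a single forward pass through one hidden layer can be sketched as follows (the layer sizes, random weights, and choice of ReLU are illustrative assumptions, not fixed parts of the architecture):

import numpy as np

def relu(z):
    # Rectified Linear Activation: keeps positive values, zeroes out negatives
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

x = np.array([2.0, 1.5, 3.0])  # input vector
W = rng.normal(size=(3, 3))    # weights connecting the inputs to neurons h0, h1, h2
v = rng.normal(size=3)         # weights connecting the hidden layer to the output

h = relu(W @ x)  # each hidden neuron: A(w . x)
y = v @ h        # output: dot product of the hidden layer and v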

SIDE NOTE: ACTIVATION FUNCTIONS

There are primarily three types of non-linear activation functions commonly applied in neural networks (sketched in code after the list below).

1) Rectified Linear Activation (ReLU)

2) Logistic (Sigmoid)

3) Hyperbolic Tangent (Tanh)
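
As a quick illustration, all three can be written in a few lines of NumPy (a sketch of the mathematical definitions, not a library implementation):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # 0 for negative inputs, identity for positive ones

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any input into the range (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes any input into the range (-1, 1)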

Activation Functions are a whole topic by themselves. For now, let us proceed with the understanding that, after extensive experimentation, Rectified Linear Activation (ReLU) has emerged as the most widely used activation function for NNs, owing largely to its faster computation.

(I will also be writing articles to explain Activation Functions in more detail soon.)

Now that we know that each layer has weights and that activation functions introduce non-linearity, how exactly are the weights determined? Through a process called Training.

A NN with two or more hidden layers is called a Deep Neural Network.
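
Going deeper only means repeating the same weighted-sum-plus-activation step, as in this sketch (the layer sizes are an arbitrary choice for illustration):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 4, 1]  # input, two hidden layers, output

# One weight matrix for each connection between consecutive layers
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

a = np.array([2.0, 1.5, 3.0])  # input vector
for W in weights[:-1]:
    a = relu(W @ a)            # hidden layers: weighted sum + activation
y = weights[-1] @ a            # output layer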

Training: Back Propagation

In Linear Regression, the weights are trained with the help of an optimization algorithm called Gradient Descent. This process generally involves two steps: first, the algorithm randomly guesses initial starting values for all of the weights; second, it uses those weights to make a test prediction for each training instance and computes the sum of squared errors between the predictions and the actual values (i.e., the cost function). The weights are then nudged in the direction that reduces this cost, and the second step repeats until the cost stops shrinking.

Refer to my article explaining Gradient Descent in detail.

https://www.dhirubhai.net/pulse/gradient-descent-people-hurry-shashank-r-shankar
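
As a rough sketch of those two steps in Python (with the repeated weight update included so the estimate actually improves; the data, weights, and learning rate below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 training instances with 3 features (made up)
true_w = np.array([0.4, 0.1, 0.25])
y = X @ true_w                       # targets generated from known weights

w = rng.normal(size=3)               # step 1: random initial guess for the weights
lr = 0.1                             # learning rate (arbitrary)

for _ in range(200):
    pred = X @ w                     # step 2: test predictions for each instance
    error = pred - y
    cost = np.sum(error ** 2)        # sum of squared errors (the cost function)
    grad = 2 * X.T @ error / len(X)  # gradient of the cost with respect to w
    w -= lr * grad                   # move the weights downhill along the gradient

print(w)  # converges towards [0.4, 0.1, 0.25]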


Neural Networks also train their weights by a similar but slightly more complex process called Backpropagation. Backpropagation performs one gradient descent step for each “batch” of training instances, and one full pass through all of the training instances is known as an “epoch”. For example, with 1,000 training instances and a batch size of 100, one epoch consists of 10 gradient descent steps.

However, unlike plain Gradient Descent, in Backpropagation the initial starting values for the weights are not guessed blindly: different initialization schemes are used depending on our choice of activation function.

Once the weights are initialized, the backpropagation algorithm makes an initial set of test predictions for the current batch by feeding the training instances in the batch through the network from left to right. This left-to-right movement is known as a Forward Pass. The output of each neuron in each layer is also recorded. Then the output, or prediction, is compared with the actual values and the output error is calculated.

Now that we have the total output error, the algorithm calculates the gradient of that error by moving through the network from right to left, i.e., starting with the output and ending with the input. This is known as a Reverse Pass. Since we know the output error and we kept track of the input/output values for each neuron during the forward pass, we can propagate the error backwards through the network (hence “backpropagation”) and figure out how much each neuron contributed to the total error. This allows us to directly compute the error gradient across each neuron and tells us how to tweak that neuron’s associated weight.
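
Here is a simplified sketch of this training loop for a network with one ReLU hidden layer (the data, layer sizes, learning rate, and batch count are all invented for illustration):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # training instances (made up)
y = rng.normal(size=100)           # actual values (made up)

W = rng.normal(size=(4, 3)) * 0.5  # input-to-hidden weights (scaled initialization)
v = rng.normal(size=4) * 0.5       # hidden-to-output weights
lr = 0.01

for epoch in range(10):            # one epoch = one pass over all of the batches
    for batch in np.array_split(np.arange(100), 5):  # 5 batches of 20 instances
        Xb, yb = X[batch], y[batch]

        # Forward pass (left to right), keeping each neuron's output
        z = Xb @ W.T      # weighted sums feeding the hidden neurons
        h = relu(z)       # hidden layer outputs
        pred = h @ v      # network output
        err = pred - yb   # error versus the actual values

        # Reverse pass (right to left): propagate the error back
        grad_v = h.T @ err / len(batch)
        dh = np.outer(err, v) * (z > 0)  # ReLU derivative gates the error
        grad_W = dh.T @ Xb / len(batch)

        # One gradient descent step per batch
        v -= lr * grad_v
        W -= lr * grad_W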

SUMMARY

1) Vanilla Neural Networks are composed of an input layer, an arbitrary number of hidden layers, and an output layer.

2) Non-linearity is introduced by feeding the output of each layer through a non-linear activation function. This is what makes Vanilla NNs different from Linear Regression.

3) Rectified Linear Activation (ReLU) is the most commonly used activation function, owing to its fast computation time.

4) The network trains its weights automatically using backpropagation: the weights are initialized, each batch of training instances undergoes a “forward pass” and a prediction is made, then the total error is calculated and propagated back through the network in a “reverse pass”, yielding the error gradient used to update the weights.

