Activation Function

In this article we will take a look at the final component of a perceptron, the activation function. It plays a crucial role in artificial neural networks by helping the network learn complex patterns in the data. The activation function decides whether the neuron fires and passes a signal on to the next perceptron: it takes the output signal from the previous cell and converts it into a form that can serve as input to the next layer.

Below is the image that we looked at in our first article on Neural Networks ( https://www.dhirubhai.net/posts/birbal-ekka_activity-7207942840222453760-fwwf?utm_source=share&utm_medium=member_desktop ).

The right-hand side of the image shows a perceptron and its different components, in particular the activation function, which produces the output. In artificial neural networks, activation functions are used to map the results to a range such as 0 to 1 or -1 to 1.

There are 2 types of activation function:

  1. Linear activation function
  2. Non-linear activation function

Linear Activation Function:

The most basic activation function is f(x) = x. It is also referred to as the identity activation. It is often used on output nodes when the target is a real value.
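
As a quick illustration, here is a minimal NumPy sketch of the identity activation (the function name is just for this example):

    import numpy as np

    def linear(x):
        # Identity activation: the output is simply the input, unchanged.
        return x

    # Often used on output nodes when the target is a real value.
    print(linear(np.array([-2.0, 0.0, 3.5])))   # [-2.   0.   3.5]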

Non-Linear Activation Function:

Non-linear activation functions are the most widely used. They add non-linearity to the neural network, allowing it to develop complex representations and functions of the inputs, which would not be possible with a linear activation function.

Different types of Non-Linear Activation Functions are as follows:

Sigmoid Function: It is especially used for models where we have to predict a probability as the output. Since the probability of anything lies between 0 and 1, sigmoid is a natural choice. Its values range over (0, 1).

However, it is rarely used in the hidden layers of real models because it is computationally expensive, it causes the vanishing gradient problem, and it is not zero-centered.
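
Below is a minimal NumPy sketch of the sigmoid, assuming the standard definition 1 / (1 + e^(-x)):

    import numpy as np

    def sigmoid(x):
        # Squashes any real input into the open interval (0, 1),
        # which is why it suits probability outputs.
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067 0.5 0.9933]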

Tanh Function: Also known as the hyperbolic tangent function. It is non-linear in nature and is mathematically a scaled and shifted version of the sigmoid function. Its values range from -1 to +1. It is generally used in the hidden layers of a neural network because its values lie between -1 and 1.

Hence it helps in centering the data by bringing the mean close to 0, which makes learning for the next layer much easier.
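
A minimal sketch of tanh, simply wrapping NumPy's built-in np.tanh:

    import numpy as np

    def tanh(x):
        # Output lies in (-1, 1) and is roughly zero-centered,
        # which helps keep the mean of layer activations close to 0.
        return np.tanh(x)

    print(tanh(np.array([-2.0, 0.0, 2.0])))  # approx [-0.964 0. 0.964]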

ReLU (Rectified Linear Unit) Function: It is the most used activation function in the world right now. Implemented in the hidden layers, it appears in almost all CNNs (Convolutional Neural Networks) and other deep learning models. Its values range from 0 to infinity. It is easy to compute and does not cause the vanishing gradient problem for positive inputs.

f(z) is zero when z is less than zero, and f(z) equals z when z is greater than or equal to zero. The issue is that all negative values become zero immediately, which can cause some nodes to completely die (the Dying ReLU problem) and stop learning.
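
A minimal NumPy sketch of ReLU, following f(z) = max(0, z) as described above:

    import numpy as np

    def relu(z):
        # f(z) = max(0, z): negative inputs are clipped to zero,
        # positive inputs pass through unchanged.
        return np.maximum(0.0, z)

    print(relu(np.array([-3.0, 0.0, 4.0])))  # [0. 0. 4.]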


Leaky ReLU: Leaky ReLU was introduced to solve the Dying ReLU problem. It is defined as f(x) = max(αx, x), where α (alpha) is a hyperparameter generally set to 0.01. When α is instead sampled randomly during training, the variant is called Randomized ReLU.
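
A minimal NumPy sketch of Leaky ReLU, using f(x) = max(αx, x) with α = 0.01 as above:

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Negative inputs keep a small slope (alpha * x) instead of being
        # zeroed out, so the node can still receive gradient and learn.
        return np.where(x > 0, x, alpha * x)

    print(leaky_relu(np.array([-3.0, 0.0, 4.0])))  # [-0.03  0.    4.  ]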


In deep neural networks, sigmoid and tanh are rarely used in hidden layers due to the vanishing gradient problem. In most CNN, RNN, and LSTM networks, ReLU or Leaky ReLU is used instead, and these generally work well with the default hyperparameters provided by popular frameworks.

Thank you for reading!
