Activation Functions and Their Types
Abhishek Zirange
Consultant | Product Engineer | Freelance Python Developer , Django, NLP, Machine learning | Redis | Celery | Web Scraping Specialist
NOTE: I would recommend reading up on the basics of Artificial Neural Networks before reading this article, for a better understanding.
Activation functions are essential for an Artificial Neural Network to learn complicated, non-linear mappings between the inputs and the response variable. They introduce non-linear properties into our network. Their main purpose is to convert the input signal of a node in an A-NN into an output signal, which is then used as an input to the next layer in the stack.
Specifically, in an A-NN we compute the sum of products of the inputs (X) and their corresponding weights (W), apply an activation function f to that sum to get the output of the layer, and feed it as input to the next layer.
So what does an artificial neuron do? Simply put, it calculates a “weighted sum” of its inputs, adds a bias, and then decides whether it should be “fired” or not.
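Before looking at specific activation functions, here is a minimal NumPy sketch of that neuron computation (the function and variable names are my own, for illustration only):

import numpy as np

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation function
    z = np.dot(weights, inputs) + bias
    return activation(z)

# Example with made-up numbers and the sigmoid activation defined in the next section:
# neuron_output(np.array([0.5, -1.2]), np.array([0.8, 0.3]), bias=0.1, activation=sigmoid)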
1. Sigmoid
Sigmoid takes a real value as input and outputs a value between 0 and 1. The main reason we use the sigmoid function is that its output lies in the range (0, 1), so it is especially useful for models where we have to predict a probability. Since a probability always lies between 0 and 1, sigmoid is a natural choice. Its graph is an S-shaped curve.
Because the sigmoid squashes every input into a bounded range, it keeps the activations from blowing up during training.
It is an activation function of the form f(x) = 1 / (1 + exp(-x)).
import numpy as np

def sigmoid(x):
    # Squash x into the range (0, 1)
    return 1 / (1 + np.exp(-x))
Pros:
- It is nonlinear in nature. Combinations of this function are also nonlinear!
- It gives an analog (graded) activation, unlike a step function.
- It has a smooth gradient too.
- Its output can be read as a probability, which makes it a good fit for binary classifiers.
- The output of the activation function is always in the range (0, 1), compared to (-inf, inf) for a linear function, so the activations stay bounded and won’t blow up.
Cons:
- It gives rise to the problem of “vanishing gradients” (see the short sketch after this list).
- Its output isn’t zero-centered, so gradient updates tend to zig-zag in one direction or another; with 0 < output < 1, optimization becomes harder.
- Sigmoids saturate and kill gradients.
- Sigmoids converge slowly.
- The network refuses to learn further, or learns drastically slowly (depending on the use case, and until the gradients or computations run into floating-point limits).
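To make the vanishing-gradient point concrete, here is a minimal sketch (redefining the sigmoid from above) that evaluates the sigmoid’s derivative, sigmoid(x) * (1 - sigmoid(x)). It peaks at 0.25 at x = 0 and shrinks toward zero for large |x|, which is what starves the early layers of a deep network of gradient:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), never larger than 0.25
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_derivative(x))  # 0.25, ~0.105, ~0.0066, ~0.000045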
2. Tanh Activation Function:
fig. Tanh function and derivative
Tanh is a rescaled version of the sigmoid function, so it shares many of the sigmoid’s properties; it maps a real input to a value between -1 and 1 (tanh(x) = 2 * sigmoid(2x) - 1).
fig. Sigmoidal representation of Tanh
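Mirroring the sigmoid snippet above, here is a minimal NumPy sketch of tanh and its derivative (the function names are my own; np.tanh would do the same job):

import numpy as np

def tanh(x):
    # Equivalent to np.tanh(x); maps x into the range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def tanh_derivative(x):
    # d/dx tanh(x) = 1 - tanh(x)**2, which lies in (0, 1]
    return 1 - tanh(x) ** 2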
- The function is monotonic, but its derivative is not.
- The output is zero-centered.
- Optimization is easier than with sigmoid because of the zero-centered output.
- The derivative of the tanh function, f’(x) = 1 - tanh²(x), lies between 0 and 1.
Cons:
- The derivative of the tanh function still suffers from the “vanishing gradient and exploding gradient” problem.
- Slow convergence, as it is computationally heavy (it relies on the exponential function).
“Tanh is preferred over the sigmoid function since it is zero centered and the gradients are not restricted to move in a certain direction”
3. ReLU Activation Function (Rectified Linear Unit):
fig. ReLU function and its derivative
ReLU is a non-linear activation function that has gained wide popularity in deep learning. It is defined as f(x) = max(0, x).
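As a quick illustration (the function names are my own), here is a minimal NumPy sketch of ReLU and its derivative:

import numpy as np

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0, x)

def relu_derivative(x):
    # 1 where x > 0, else 0 (the derivative at exactly 0 is undefined; 0 is used by convention here)
    return np.where(x > 0, 1.0, 0.0)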
- The function and its derivative are both monotonic.
- The main advantage of ReLU is that it does not activate all the neurons at the same time: neurons with negative input output zero.
- Computationally efficient
- The derivative of the ReLU function, f’(x), is 1 if x > 0 and 0 otherwise.
- Converges very fast.
Cons:
- The ReLU function is not zero-centered: its output is always ≥ 0, which makes the gradient updates tend to move in the same direction and makes optimization harder.
- The biggest problem is the dead (dying) neuron: for negative inputs the gradient is exactly zero, so the affected weights stop updating. (ReLU is also non-differentiable at zero.)
“Problem of the dying/dead neuron: since the ReLU derivative is not 0 for positive inputs (f’(x) = 1 for x > 0), ReLU does not saturate there and those neurons keep learning. Saturation and a vanishing gradient occur only for negative inputs, which ReLU turns into 0; a neuron stuck in that region stops updating, and this is called the dying neuron problem.”
4. Leaky ReLU Activation Function:
Leaky ReLU is an improved version of the ReLU function that introduces a small constant slope for negative inputs: f(x) = x if x > 0, otherwise αx, where α is a small constant such as 0.01 (a minimal sketch follows the figure below).
fig. Leaky ReLU activation and its derivative
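Here is a minimal NumPy sketch of Leaky ReLU and its derivative (α = 0.01 is a common default, assumed here; the function names are my own):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, alpha * x for negative inputs
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # 1 for positive inputs, alpha for negative inputs
    return np.where(x > 0, 1.0, alpha)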
- Leaky ReLU is defined to address the problem of dying/dead neurons.
- The dying-neuron problem is addressed by the small slope: negative values are scaled by α instead of being zeroed out, which lets the corresponding neurons “stay alive”.
- The function and its derivative are both monotonic.
- It allows gradients to flow for negative inputs during backpropagation.
- It is efficient and easy to compute.
- The derivative of Leaky ReLU is 1 when x > 0 and α (a small value between 0 and 1) when x < 0.
Cons:
- Leaky ReLU does not provide consistent predictions for negative input values.
So, which one is better to use?
The answer is that nowadays we should generally use ReLU, and apply it only to the hidden layers. If your model suffers from dead neurons during training, switch to Leaky ReLU or the Maxout function.
Sigmoid and tanh should generally be avoided in the hidden layers because of the vanishing gradient problem, which makes a deep neural network hard to train and degrades its accuracy and performance. A small forward-pass sketch illustrating this advice follows.
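As a rough illustration of that advice, here is a minimal NumPy sketch of a two-layer forward pass that uses ReLU in the hidden layer and sigmoid at the output (all weights, shapes, and inputs are made-up placeholders, not trained values):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))           # 4 input features (placeholder data)
W1 = rng.normal(size=(8, 4))        # hidden layer: 8 units
b1 = np.zeros(8)
W2 = rng.normal(size=(1, 8))        # output layer: 1 unit
b2 = np.zeros(1)

hidden = relu(W1 @ x + b1)          # ReLU in the hidden layer
output = sigmoid(W2 @ hidden + b2)  # sigmoid at the output gives a probability
print(output)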