Top 10 Activation Functions' Advantages & Disadvantages

Sigmoid:-

Normally used as the output activation in binary classification, since it maps any real value to a probability between 0 and 1.

Advantages:

-> Gives a smooth gradient, which helps stable convergence.

-> Normalises its output to the range (0, 1).

-> Gives a clear prediction (classification), with outputs pushed towards 0 and 1.

Disadvantages:

-> Prone to the vanishing gradient problem, since the gradient saturates for large positive or negative inputs.

-> Not a zero-centred function (its output is always positive).

-> Computationally expensive (it involves an exponential).
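
As a rough illustration (not from the original article), here is a minimal NumPy sketch of the Sigmoid function; the helper name and sample values are my own:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1); large |x| saturates the gradient.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]
```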


Tanh:-

Normally used in hidden layers; it squashes its input to the zero-centred range (-1, 1).

Advantages:

-> Zero-centred, unlike Sigmoid (outputs lie between -1 and 1).

-> Gives a smooth gradient, which helps stable convergence.

Disadvantages:

-> Prone to the vanishing gradient problem.

-> Computationally expensive (it involves exponentials).
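
A minimal NumPy sketch of Tanh for comparison (illustrative only; the sample values are my own):

```python
import numpy as np

def tanh(x):
    # Zero-centred: output lies in (-1, 1), unlike Sigmoid's (0, 1).
    return np.tanh(x)

print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```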

ReLU:- (Rectified Linear Unit)

Advantages:

-> Helps avoid the vanishing gradient problem (the gradient is 1 for all positive inputs).

-> Computationally inexpensive (a simple linear/threshold operation).

Disadvantages:

-> Not a zero-centred function.

-> Outputs zero for every negative input, so those neurons can become inactive (the "dying ReLU" problem).
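
A minimal sketch of ReLU in NumPy (illustrative; the helper name is my own):

```python
import numpy as np

def relu(x):
    # Identity for positive inputs, zero otherwise; gradient is 1 or 0, so no saturation on the positive side.
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 3.0])))  # [0. 0. 3.]
```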

Leaky ReLU:-

It is the same as the ReLU function, except that it gives a small output on the negative axis (a fixed slope of 0.01 instead of ReLU's zero).
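
An illustrative NumPy sketch of Leaky ReLU, assuming the commonly used slope of 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # A small fixed slope (alpha) on the negative side keeps the gradient from becoming exactly zero.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 3.0])))  # [-0.03  0.    3.  ]
```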

ELU:- (Exponential Linear Unit)

Advantages:

-> Gives a smooth, saturating curve on the negative axis instead of a hard zero, which helps convergence.

-> For any positive input it behaves like the identity (the same as ReLU).
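
An illustrative NumPy sketch of ELU, assuming the common choice alpha = 1.0:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for x > 0; a smooth exponential curve approaching -alpha for very negative x.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, 0.0, 3.0])))  # ~[-0.950, 0.0, 3.0]
```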

SoftMax:-

Normally used as the output in multi-class classification problems to produce a probability for each class (unlike Sigmoid, which is preferred for binary classification).
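
An illustrative NumPy sketch of Softmax; subtracting the maximum logit is a standard numerical-stability trick, not something the article mentions:

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability, exponentiate, then normalise so the outputs sum to 1.
    z = logits - np.max(logits)
    exps = np.exp(z)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```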

PReLU:- (Parametric ReLU)

The advantage of PReLU is that the slope on the negative axis is a learnable parameter that is fine-tuned during training (unlike the fixed zero of ReLU and the fixed 0.01 of Leaky ReLU).
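
An illustrative NumPy sketch of PReLU; in a real network the slope `a` would be a trainable parameter, here it is simply passed in by hand:

```python
import numpy as np

def prelu(x, a):
    # 'a' is learned during training (per channel or shared), unlike Leaky ReLU's fixed 0.01.
    return np.where(x > 0, x, a * x)

print(prelu(np.array([-3.0, 0.0, 3.0]), a=0.25))  # [-0.75  0.    3.  ]
```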

SWISH:-

Also known as a self-gated function. This activation function was inspired by the way the Sigmoid function is used for gating inside LSTM (Long Short-Term Memory) networks.

Advantages:

-> Can deal with Vanishing Gradient problem.

-> Its output blends ReLU and Sigmoid behaviour: roughly linear for large positive inputs and smoothly gated towards zero elsewhere, which helps keep activations well behaved.

Disadvantage:

-> Computationally expensive (it contains a Sigmoid).
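
An illustrative NumPy sketch of Swish (x times Sigmoid of x), written in the common self-gated form with beta = 1 by default:

```python
import numpy as np

def swish(x, beta=1.0):
    # x * sigmoid(beta * x): smooth and non-monotonic, approaching ReLU for large positive x.
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-3.0, 0.0, 3.0])))  # ~[-0.142, 0.0, 2.858]
```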

MaxOut:-

Also known as a learnable activation function.

It has all the advantages of the ReLU function without its main disadvantages (such as dying units), since each unit outputs the maximum over several learned linear functions.
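
An illustrative NumPy sketch of a Maxout unit, assuming k = 2 linear pieces; the shapes and random weights are my own example, not from the article:

```python
import numpy as np

def maxout(x, W, b):
    # Element-wise max over k learned affine maps; with k = 2 it can recover ReLU as a special case.
    # W has shape (k, in_dim, out_dim) and b has shape (k, out_dim).
    return np.max(x @ W + b, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # a single 4-dimensional input
W = rng.normal(size=(2, 4, 3))     # k = 2 pieces, mapping 4 inputs to 3 units
b = np.zeros((2, 3))
print(maxout(x, W, b).shape)       # (3,)
```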

SoftPlus:-

Advantages:

-> Its gradient is smoother than ReLU's (there is no kink at zero).

-> It can handle the Vanishing Gradient problem.

Disadvantage:

-> More computationally expensive than ReLU (it is exponential in nature).
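
An illustrative NumPy sketch of Softplus, ln(1 + e^x), written with logaddexp for numerical stability (an implementation detail, not from the article):

```python
import numpy as np

def softplus(x):
    # Smooth approximation of ReLU: log(1 + exp(x)), computed stably as log(exp(0) + exp(x)).
    return np.logaddexp(0.0, x)

print(softplus(np.array([-3.0, 0.0, 3.0])))  # ~[0.049, 0.693, 3.049]
```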


Thanks for going through this article. This is just a brief overview of the advantages and disadvantages of the ten most frequently used activation functions. Although there is a lot more to cover about each of them, I hope this was meaningful to all of you.

I will be sharing my knowledge every now and then based on my availability.

Peace :)




