Understanding Activation Functions in Neural Networks
Dohessiekan Xavier Gnondoyi
Student at the African Leadership University in Software Engineering, Cloudoor
When building neural networks, one of the essential elements that drives their performance is the activation function. Activation functions are crucial for transforming a neuron's inputs into its output, deciding whether the neuron should be activated, and introducing non-linearity into the system.
In this blog post, we'll explore the purpose of activation functions and compare some of the most widely used ones.
What Is an Activation Function?
In a neural network, the input is multiplied by weights and then passed through an activation function to produce the output. Without an activation function, the network behaves like a simple linear model, incapable of solving complex tasks.
The main goals of an activation function are:
- To introduce non-linearity, so the network can model complex relationships.
- To decide whether a neuron should be activated for a given input.
- To map the weighted sum of inputs into a useful output range.
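To see why the non-linearity matters, here is a minimal NumPy sketch (the weight matrices and input are made-up values for illustration): stacking two layers with no activation in between collapses into a single linear map, which is exactly the "simple linear model" limitation described above.

```python
import numpy as np

# Two weight matrices standing in for two layers with no activation between them.
W1 = np.array([[1.0, 2.0], [0.5, -1.0]])
W2 = np.array([[3.0, 0.0], [1.0, 1.0]])
x = np.array([1.0, -2.0])

# Applying the layers one after the other...
stacked = W2 @ (W1 @ x)
# ...gives the same result as a single layer whose weights are W2 @ W1.
collapsed = (W2 @ W1) @ x
print(stacked, collapsed)  # identical outputs: still just a linear model
```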
Let's now dive into each activation function and explore its use cases, advantages, and limitations.
1. Binary Step Function
The Binary step function is the simplest activation function. It outputs 1 if the input is at or above a certain threshold (typically 0) and 0 otherwise.
Formula:
1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}
Example:
If an input is positive, the output is 1. If it’s negative, the output is 0.
Pros:
- Extremely simple and cheap to compute.
- Gives a clear yes/no decision, which is easy to interpret.
Cons:
- Its gradient is zero everywhere (and undefined at the threshold), so it cannot be trained with gradient-based methods like backpropagation.
- It cannot represent multiple classes or graded outputs.
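As a quick sanity check, here is a minimal NumPy sketch of the step function (the function name is illustrative; the threshold of 0 follows the description above):

```python
import numpy as np

def binary_step(x):
    # 1 when the input is at or above the threshold (0), otherwise 0.
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, 0.0, 3.0])))  # [0 1 1]
```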
2. Linear Activation Function
The Linear activation function is just a straight line where the output is proportional to the input. Mathematically, it looks like this:
Formula:
f(x) = ax
Where a is a constant.
Example:
For any input x, the output is ax. For example, if a = 2 and x = 3, the output is 6.
Pros:
- The output is not confined to a fixed range, which makes it a natural choice for regression output layers.
Cons:
- Its derivative is a constant, so the gradient carries no information about the input.
- Stacking linear layers still produces a linear model, so it adds no expressive power to hidden layers.
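A one-line sketch of the linear activation, using a = 2 as in the example above (the function name is just for illustration):

```python
def linear(x, a=2):
    # Output is simply the input scaled by the constant a.
    return a * x

print(linear(3))  # 6, matching the example above
```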
3. Sigmoid Activation Function
The Sigmoid function outputs a value between 0 and 1. It is one of the most commonly used activation functions for binary classification tasks.
Formula:
f(x) = \frac{1}{1 + e^{-x}}
Example:
For an input of 2, the sigmoid function would output about 0.88.
Pros:
- Smooth and differentiable, with outputs in (0, 1) that can be interpreted as probabilities.
Cons:
- Saturates for large positive or negative inputs, leading to vanishing gradients in deep networks.
- Outputs are not zero-centered, which can slow down convergence.
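A minimal NumPy sketch of the sigmoid, reproducing the example above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real-valued input into the open interval (0, 1).
    return 1 / (1 + np.exp(-x))

print(sigmoid(2))  # ~0.88, matching the example above
```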
4. Tanh Activation Function
The Tanh (hyperbolic tangent) function is similar to the sigmoid but outputs values between -1 and 1, making it more useful in certain tasks where negative outputs are required.
Formula:
f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1
Example:
For an input of 1, the tanh function would output about 0.76.
Pros:
- Zero-centered outputs in (-1, 1), which usually makes optimization easier than with sigmoid.
Cons:
- Still saturates at the extremes, so it also suffers from the vanishing gradient problem in deep networks.
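The same formula in NumPy (this is equivalent to the built-in np.tanh; the explicit form is shown to mirror the formula above):

```python
import numpy as np

def tanh(x):
    # Same result as np.tanh(x); outputs lie in (-1, 1) and are zero-centered.
    return 2 / (1 + np.exp(-2 * x)) - 1

print(tanh(1))  # ~0.76, matching the example above
```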
5. ReLU (Rectified Linear Unit)
The ReLU function has become the de facto standard for hidden layers in neural networks. It is defined as the positive part of its input.
Formula:
f(x) = \max(0, x)
Example:
For an input of -2, the ReLU function would output 0, whereas for an input of 3, the output is 3.
Pros:
- Very cheap to compute and does not saturate for positive inputs, which speeds up training.
- Produces sparse activations, since negative inputs are zeroed out.
Cons:
- "Dying ReLU" problem: neurons that only ever receive negative inputs output 0 and stop learning.
- Outputs are not zero-centered.
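A minimal NumPy sketch of ReLU, reproducing the example above:

```python
import numpy as np

def relu(x):
    # Negative inputs are clipped to 0; positive inputs pass through unchanged.
    return np.maximum(0, x)

print(relu(np.array([-2.0, 3.0])))  # [0. 3.]
```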
6. Softmax Activation Function
The Softmax function is used in the output layer for multi-class classification problems. It converts raw input into a probability distribution, where the sum of the outputs is 1.
Formula:
f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
Example:
For an input array [2, 1, 0.1], the softmax function would produce roughly [0.66, 0.24, 0.10], which sums to 1.
Pros:
- Turns raw scores into a proper probability distribution over classes, which pairs naturally with cross-entropy loss.
Cons:
- Intended for the output layer of multi-class classifiers, not for hidden layers.
- Naive implementations can overflow for large inputs, so the maximum is usually subtracted before exponentiating.
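A short NumPy sketch of softmax using the common max-subtraction trick mentioned above (an implementation detail, not part of the formula itself), reproducing the example:

```python
import numpy as np

def softmax(x):
    # Subtracting the max before exponentiating avoids overflow
    # without changing the resulting probabilities.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66 0.24 0.10], summing to 1
```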
Conclusion
Activation functions play a critical role in enabling neural networks to learn complex patterns. Whether you're working with binary classification, multi-class tasks, or regression problems, choosing the right activation function can make or break the model's performance. While simpler functions like the binary step and linear activations serve specific roles, modern deep learning relies heavily on non-linear functions like ReLU, sigmoid, and softmax for effective learning.