Understanding Activation Functions in Deep Learning
Activation functions are a critical part of deep neural networks. They introduce non-linearity, allowing neural networks to learn complex patterns in data.
1. Sigmoid
Function:
\[\sigma(x) = \frac{1}{1 + e^{-x}}\]
Properties:
- Output Range: $(0, 1)$
- Used historically: Common in early neural networks.
Problems:
- Saturation: For large positive or negative inputs, saturated neurons kill the gradient, because the slope of the sigmoid is close to zero in those regions (see the sketch after this list).
- Not Zero-Centered: Outputs are always positive, so the gradients of all weights feeding into a neuron share the same sign. This leads to inefficient, zig-zagging gradient updates and slower convergence toward the minimum.
- Exponential Cost: Computing `exp()` can be expensive in large models.
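To make the saturation problem concrete, here is a minimal NumPy sketch (not part of the original post; the helper names `sigmoid` and `sigmoid_grad` are illustrative) that evaluates the sigmoid and its derivative at a few points:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x = {x:6.1f}  sigmoid = {sigmoid(x):.5f}  grad = {sigmoid_grad(x):.5f}")
# At x = +/-10 the gradient is roughly 4.5e-5, so almost no
# signal flows backward through a saturated sigmoid neuron.
```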
2. Tanh
Function:
\[\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
Properties:
- Range: $(-1, 1)$
- Zero-Centered: More desirable than sigmoid for hidden layer activations.
- Still Saturates: Can suffer from vanishing gradients for extreme values (see the sketch below).
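A minimal NumPy sketch (illustrative, not from the original post; `tanh_grad` is a hypothetical helper) showing that tanh outputs are zero-centered while its gradient still vanishes for large-magnitude inputs:

```python
import numpy as np

def tanh_grad(x):
    # derivative of tanh: 1 - tanh(x)^2
    t = np.tanh(x)
    return 1.0 - t ** 2

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("tanh(x): ", np.round(np.tanh(x), 5))
print("tanh'(x):", np.round(tanh_grad(x), 5))
# Outputs are centered around 0 (unlike sigmoid's (0, 1) range),
# yet the gradient at x = +/-10 is on the order of 8e-9,
# so tanh still saturates for extreme inputs.
```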