The GELU (Gaussian Error Linear Unit) activation function is widely used in models such as BERT and GPT-3. In this tutorial, we will introduce it for deep learning beginners.
GELU Activation Function
GELU is defined as:

\[\operatorname{GELU}(x)=x \Phi(x)=x \cdot P(X \leq x), \quad X \sim \mathcal{N}(0,1)\]

where \(\Phi(x)\) is the cumulative distribution function of the standard normal distribution. It can be approximated as:
\[0.5 x\left(1+\tanh \left[\sqrt{2 / \pi}\left(x+0.044715 x^{3}\right)\right]\right) \quad \text{or} \quad x \sigma(1.702 x)\]

where \(\sigma\) denotes the logistic sigmoid function.
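As a concrete illustration, here is a minimal Python sketch comparing the exact GELU with the two approximations above (the function names are our own, not from any library):

```python
import math

def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gelu_sigmoid(x: float) -> float:
    """Sigmoid approximation: x * sigmoid(1.702*x)."""
    return x / (1.0 + math.exp(-1.702 * x))

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
        print(f"x={x:+.2f}  exact={gelu_exact(x):+.4f}  "
              f"tanh={gelu_tanh(x):+.4f}  sigmoid={gelu_sigmoid(x):+.4f}")
```

Running it shows that both approximations stay very close to the exact value over this range, which is why they are commonly used in practice.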
Its curve looks like a smoothed ReLU: it is close to zero for large negative inputs, dips slightly below zero for small negative inputs, and approaches the identity for large positive inputs.
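If you want to see the curve yourself, here is a minimal plotting sketch (assuming NumPy and Matplotlib are installed) that draws the tanh approximation of GELU with ReLU as a reference:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4.0, 4.0, 200)
# Tanh approximation of GELU, as given above
gelu = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
relu = np.maximum(0.0, x)

plt.plot(x, gelu, label="GELU (tanh approximation)")
plt.plot(x, relu, label="ReLU", linestyle="--")
plt.axhline(0.0, color="gray", linewidth=0.5)
plt.legend()
plt.title("GELU activation")
plt.show()
```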
GELU vs ELU and ReLU
From the paper Gaussian Error Linear Units (GELUs), we can find:
For the numerous datasets evaluated in this paper, the GELU exceeded the accuracy of the ELU and
ReLU consistently, making it a viable alternative to previous nonlinearities.