An Introduction to the GELU Activation Function – Deep Learning Tutorial

January 4, 2022

The GELU (Gaussian Error Linear Unit) activation function is widely used in models such as BERT and GPT-3. In this tutorial, we will introduce it for deep learning beginners.

GELU Activation Function

GELU is defined as:

\[\operatorname{GELU}(x)=x \Phi(x)=x \cdot P(X \leq x), \quad X \sim \mathcal{N}(0,1)\]

where \(\Phi\) is the cumulative distribution function of the standard normal distribution.

It can be approximated as:

\[0.5 x\left(1+\tanh \left[\sqrt{2 / \pi}\left(x+0.044715 x^{3}\right)\right]\right) \text { or } x \sigma(1.702 x)\]
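To make the formulas above concrete, here is a minimal NumPy/SciPy sketch (the function names gelu_exact, gelu_tanh and gelu_sigmoid are just illustrative) that computes the exact GELU via the Gaussian CDF together with the two approximations:

import numpy as np
from scipy.special import erf

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def gelu_sigmoid(x):
    # Sigmoid approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + np.exp(-1.702 * x))

x = np.linspace(-4.0, 4.0, 9)
print(gelu_exact(x))    # reference values
print(gelu_tanh(x))     # very close to the exact values
print(gelu_sigmoid(x))  # rougher but cheaper approximation

Both approximations avoid evaluating the error function directly; the tanh form is the one commonly found in Transformer implementations.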

The GELU curve looks like this:

(Figure: plot of the GELU activation function.)
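If you want to reproduce a plot like the figure above, a short matplotlib sketch (assuming NumPy and matplotlib are installed) using the tanh approximation is enough:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4.0, 4.0, 400)
gelu = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

plt.plot(x, gelu, label="GELU")
plt.axhline(0.0, color="gray", linewidth=0.5)   # x-axis for reference
plt.axvline(0.0, color="gray", linewidth=0.5)   # y-axis for reference
plt.xlabel("x")
plt.ylabel("GELU(x)")
plt.title("GELU activation function")
plt.legend()
plt.show()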

GELU vs ELU and ReLU

From the paper Gaussian Error Linear Units (GELUs), we can find:

For the numerous datasets evaluated in this paper, the GELU exceeded the accuracy of the ELU and
ReLU consistently, making it a viable alternative to previous nonlinearities.
