Mixed Precision Training for Beginners – Deep Learning Tutorial

March 27, 2023

In this tutorial, we will introduce what mixed precision training is, what effect it has on model accuracy, and how to use it.

What is mixed precision training?

Mixed precision training means we use both float32 and float16 precision when training a model. It has two benefits:

  • Decrease the required amount of memory.

Half-precision floating point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Lowering the required memory enables training of larger models or training with larger minibatches (a quick sketch after this list illustrates the saving).

  • Shorten the training or inference time.

Execution time can be sensitive to memory or arithmetic bandwidth. Half-precision halves the number of bytes accessed, thus reducing the time spent in memory-limited layers.
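As a quick illustration of the memory point above, here is a minimal sketch (assuming PyTorch is installed) that compares the storage of the same tensor in FP32 and FP16. The tensor shape is just an example.

import torch

# A hypothetical activation tensor of 1024 x 1024 elements.
x_fp32 = torch.randn(1024, 1024, dtype=torch.float32)
x_fp16 = x_fp32.half()  # cast to float16

# float32 stores 4 bytes per element, float16 stores 2 bytes per element,
# so the half-precision copy needs half the memory and half the bandwidth.
print(x_fp32.element_size() * x_fp32.nelement())  # 4194304 bytes (4 MB)
print(x_fp16.element_size() * x_fp16.nelement())  # 2097152 bytes (2 MB)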

The effect of mixed precision training

From the paper Mixed Precision Training, we can see that mixed precision training does not degrade model accuracy.

Figure: The effect of mixed precision training (results reported in the paper).

How to implement mixed precision training?

If you are using PyTorch, you can use torch.cuda.amp.GradScaler to implement it. Here is the tutorial:

Implement Mixed Precision Training with GradScaler in PyTorch – PyTorch Tutorial
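Before reading the full tutorial, here is a minimal sketch of the usual autocast + GradScaler pattern. The model, optimizer, loss and random data below are placeholders, and a CUDA device is assumed.

import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model, optimizer and loss for illustration only.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()

for step in range(100):  # stand-in for iterating over a real data loader
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    # autocast runs the forward pass in float16 where it is safe to do so.
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # GradScaler scales the loss to avoid float16 gradient underflow,
    # then unscales the gradients before the optimizer step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()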