In this tutorial, we will introduce how to implement mixed precision training with torch.cuda.amp.GradScaler in PyTorch, which can speed up training.
How to use torch.cuda.amp.GradScaler?
In PyTorch, we usually train a model with code like this:
optimizer = ...

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute loss and perform back-propagation
        loss = loss_fn(outputs, labels)
        loss.backward()

        # Update optimizer
        optimizer.step()
However, if you plan to train a model with mixed precision, you can do as follows:
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Runs the forward pass with autocasting.
        with autocast(dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)

        # Scales loss. Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
Here we use GradScaler() and autocast() to implement mixed precision training.
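As noted in the comments above, scaler.step() unscales the gradients before checking and applying them. If you need to work with the gradients yourself, for example to apply gradient clipping, you should unscale them explicitly first with scaler.unscale_(). Below is a minimal sketch of the inner training loop with gradient clipping added; max_norm is a hypothetical hyperparameter you would choose for your own model, and model, loss_fn, optimizer and data are assumed to be defined as in your training script.

# Minimal sketch: gradient clipping inside the AMP training loop.
# max_norm is a hypothetical clipping threshold chosen by you.
scaler = GradScaler()

for input, target in data:
    optimizer.zero_grad()

    with autocast(dtype=torch.float16):
        output = model(input)
        loss = loss_fn(output, target)

    scaler.scale(loss).backward()

    # Unscale the gradients in place so that clipping operates on their true values.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

    # scaler.step() detects that the gradients are already unscaled and will not unscale them again.
    scaler.step(optimizer)
    scaler.update()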