In this tutorial, we will introduce how to implement mixed precision training with torch.cuda.amp.GradScaler in PyTorch, which can speed up training.
How to use torch.cuda.amp.GradScaler?
In PyTorch, we usually train a model with code like this:
optimizer = ...

for epoch in range(...):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute loss and perform back-propagation
        loss = loss_fn(outputs, labels)
        loss.backward()

        # Update optimizer
        optimizer.step()
However, if you plan to train a model with mixed precision, you can do as follows:
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Runs the forward pass with autocasting.
        with autocast(dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)

        # Scales loss. Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
Here we use GradScaler() and autocast() to implement mixed precision training.
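As noted in the comments above, scaler.step() unscales the gradients before checking and applying them. If you need to work with the gradients yourself, for example to apply gradient clipping, you should unscale them explicitly first with scaler.unscale_(). Below is a minimal sketch of the inner training loop with gradient clipping added; max_norm is a hypothetical hyperparameter you would choose for your own model, and model, loss_fn, optimizer and data are assumed to be defined as in your training script.

# Minimal sketch: gradient clipping inside the AMP training loop.
# max_norm is a hypothetical clipping threshold chosen by you.
scaler = GradScaler()

for input, target in data:
    optimizer.zero_grad()

    with autocast(dtype=torch.float16):
        output = model(input)
        loss = loss_fn(output, target)

    scaler.scale(loss).backward()

    # Unscale the gradients in place so that clipping operates on their true values.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

    # scaler.step() detects that the gradients are already unscaled and will not unscale them again.
    scaler.step(optimizer)
    scaler.update()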