
Implement Cosine Annealing with Warm up in PyTorch – PyTorch Tutorial

Some papers use cosine annealing with warm up to train their models. For example:

all models are trained with a batch size of 200 and optimised using an Adam optimiser with a weight decay of 2e-5. The learning rate was scheduled via the cosine annealing with warmup restart with a cycle size of 25 epochs, the maximum learning rate of 1e-3 and the decreasing rate of 0.8 for two cycles

In this tutorial, we will introduce how to implement cosine annealing with warm up in PyTorch.

Preliminary

We can use the source code from pytorch-cosine-annealing-with-warmup.

You can download it here:

https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup
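
To connect the quoted paper setup above with this library, the constructor arguments might map roughly as follows. This is a sketch, not the paper's code: the model is a placeholder, and min_lr and warmup_steps are assumptions because the quote does not state them.

import torch
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

model = torch.nn.Linear(10, 2)  # placeholder model, not from the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=2e-5)
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=25,  # cycle size of 25 epochs
                                          cycle_mult=1.0,
                                          max_lr=1e-3,           # maximum learning rate in the quote
                                          min_lr=1e-6,           # assumed; not given in the quote
                                          warmup_steps=5,        # assumed warm up length
                                          gamma=0.8)             # decreasing rate per cycle

Here scheduler.step() is assumed to be called once per epoch, so training "for two cycles" in the quote corresponds to 50 epochs.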

How to implement cosine annealing with warm up in pytorch?

Here is an example code:

import torch
from matplotlib import pyplot as plt
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

lr_list = []
# A dummy parameter so the optimizer has something to optimize.
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
LR = 0.001
optimizer = torch.optim.Adam(model, lr=LR, weight_decay=2e-5)
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=50,
                                          cycle_mult=1.0,
                                          max_lr=LR,
                                          min_lr=5e-5,
                                          warmup_steps=25,
                                          gamma=0.8)
for epoch in range(200):
    data_size = 40
    for i in range(data_size):
        optimizer.zero_grad()
        # loss.backward() would be called here in real training
        optimizer.step()
    scheduler.step()  # update the learning rate once per epoch
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])  # record the current learning rate

plt.plot(range(200), lr_list, color='r')
plt.show()
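
If we prefer a smoother curve, scheduler.step() can also be called once per batch instead of once per epoch. In that case first_cycle_steps and warmup_steps should be given in iterations rather than epochs. Here is a minimal variant, a sketch that reuses the optimizer from the example above:

# Variant: step the scheduler after every batch instead of every epoch.
# first_cycle_steps and warmup_steps are now measured in iterations
# (50 epochs * 40 iterations and 25 epochs * 40 iterations here).
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=50 * 40,
                                          cycle_mult=1.0,
                                          max_lr=LR,
                                          min_lr=5e-5,
                                          warmup_steps=25 * 40,
                                          gamma=0.8)
for epoch in range(200):
    for i in range(40):
        optimizer.zero_grad()
        # loss.backward() would be called here in real training
        optimizer.step()
        scheduler.step()  # update the learning rate every iteration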

In this example, we have used the Adam optimizer with max_lr = 0.001, min_lr = 5e-5, warmup_steps = 25, first_cycle_steps = 50 and gamma = 0.8.
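
To make these parameters concrete, the schedule they describe can be sketched by hand: a linear warm up from min_lr to max_lr over warmup_steps, then a cosine decay back to min_lr over the rest of the cycle, with the peak multiplied by gamma at each new cycle. The function below is a sketch of this behavior for cycle_mult = 1.0, not the library's exact code:

import math

def sketch_lr(step, first_cycle_steps=50, warmup_steps=25,
              max_lr=1e-3, min_lr=5e-5, gamma=0.8):
    cycle = step // first_cycle_steps
    step_in_cycle = step % first_cycle_steps
    peak = max_lr * (gamma ** cycle)  # peak learning rate decays by gamma each cycle
    if step_in_cycle < warmup_steps:  # linear warm up phase
        return min_lr + (peak - min_lr) * step_in_cycle / warmup_steps
    # cosine decay phase
    progress = (step_in_cycle - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * (1 + math.cos(math.pi * progress)) / 2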

Run this code, and we will see the learning rate curve: it warms up linearly for 25 epochs, decays along a cosine curve until epoch 50, then restarts, with the peak learning rate multiplied by 0.8 at each new cycle.

If we set warmup_steps = 10 and first_cycle_steps = 80, the warm up becomes shorter and each cosine cycle becomes longer (80 epochs per cycle), as shown in the snippet below.
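
That change only touches the scheduler construction; the rest of the script stays the same:

scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=80,  # longer cycle: 80 epochs
                                          cycle_mult=1.0,
                                          max_lr=LR,
                                          min_lr=5e-5,
                                          warmup_steps=10,       # shorter warm up: 10 epochs
                                          gamma=0.8)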