Some papers use cosine annealing with warm-up to train their models. For example:
all models are trained with a batch size of 200 and optimised using an Adam optimiser with a weight decay of 2e-5. The learning rate was scheduled via the cosine annealing with warmup restart with a cycle size of 25 epochs, the maximum learning rate of 1e-3 and the decreasing rate of 0.8 for two cycles
In this tutorial, we will introduce how to implement cosine annealing with warm-up in PyTorch.
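For instance, the settings quoted above could be expressed with the CosineAnnealingWarmupRestarts scheduler used later in this tutorial roughly as follows. This is only a sketch: the model parameters, min_lr and warmup_steps values are assumptions for illustration, since the quoted paper does not specify them.

import torch
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

# Placeholder parameters; in practice these would come from your model.
params = [torch.nn.Parameter(torch.randn(2, 2))]

# Adam optimizer with a weight decay of 2e-5, as in the quoted settings.
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=2e-5)

scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=25,  # cycle size of 25 epochs
                                          cycle_mult=1.0,
                                          max_lr=1e-3,           # maximum learning rate of 1e-3
                                          min_lr=1e-6,           # assumed; not given in the quote
                                          warmup_steps=5,        # assumed; not given in the quote
                                          gamma=0.8)             # decreasing rate of 0.8 per cycle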
Preliminary
We can use the open-source package pytorch-cosine-annealing-with-warmup.
You can download it here:
https://github.com/katsura-jp/pytorch-cosine-annealing-with-warmup
How to implement cosine annealing with warm-up in PyTorch?
Here is an example:
import torch
from matplotlib import pyplot as plt
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts

lr_list = []
# A dummy parameter so that the optimizer has something to update;
# in practice this would be model.parameters().
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
LR = 0.001

optimizer = torch.optim.Adam(model, lr=LR, weight_decay=2e-5)
scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=50,
                                          cycle_mult=1.0,
                                          max_lr=LR,
                                          min_lr=5e-5,
                                          warmup_steps=25,
                                          gamma=0.8)

for epoch in range(200):
    data_size = 40
    for i in range(data_size):
        optimizer.zero_grad()
        optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])  # record the current learning rate

plt.plot(range(200), lr_list, color='r')
plt.show()
In this example, we use the Adam optimizer with max_lr = 0.001, min_lr = 5e-5, warmup_steps = 25 and first_cycle_steps = 50; gamma = 0.8 means the peak learning rate shrinks by a factor of 0.8 after each 50-epoch cycle.
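The loop above only steps a dummy optimizer in order to plot the schedule. In real training we would also compute a loss and call backward() before optimizer.step(). Here is a minimal sketch, reusing the optimizer and scheduler created above; model, train_loader and loss_fn are hypothetical placeholders for your own model, data loader and loss function.

for epoch in range(200):
    for inputs, targets in train_loader:        # train_loader is a hypothetical DataLoader
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)  # model and loss_fn are hypothetical placeholders
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the schedule once per epoch, as in the example above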
Run this code and we will see the learning rate curve: in each 50-epoch cycle, the learning rate rises linearly from 5e-5 to its peak over the first 25 epochs and then decays back down along a cosine curve, with peaks of 1e-3, 8e-4, 6.4e-4 and 5.12e-4 in the four successive cycles.
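If you want to check the restart behaviour without looking at the plot, the short continuation below (reusing lr_list from the example above) prints the peak learning rate of each 50-epoch cycle, which should shrink by gamma = 0.8 at every restart.

# Reuses lr_list from the example above.
for cycle in range(4):
    peak = max(lr_list[cycle * 50:(cycle + 1) * 50])
    print(f"cycle {cycle}: peak lr = {peak:.6f}")
# Expected output (approximately): 0.001000, 0.000800, 0.000640, 0.000512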
If we set warmup_steps = 10 and first_cycle_steps = 80 instead, the warm-up becomes shorter and each cycle longer, so within 200 epochs we will see two full 80-epoch cycles followed by part of a third, again with the peak learning rate shrinking by 0.8 at each restart.
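Only the scheduler construction needs to change to produce this variant; the rest of the example stays the same.

scheduler = CosineAnnealingWarmupRestarts(optimizer,
                                          first_cycle_steps=80,
                                          cycle_mult=1.0,
                                          max_lr=LR,
                                          min_lr=5e-5,
                                          warmup_steps=10,
                                          gamma=0.8)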