In this tutorial, we will introduce PyTorch optimizer.param_groups. After reading this tutorial, you will be able to control a PyTorch optimizer easily.
PyTorch optimizer
PyTorch provides several optimizers, for example Adam and SGD. It is easy to create one. For example:
optimizer = torch.optim.Adam(model.parameters())
This code creates an Adam optimizer with its default hyperparameters.
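If we want to set the hyperparameters explicitly, we can pass them when creating the optimizer. Here is a minimal sketch; the values shown are simply Adam's defaults, and the model is only a placeholder:
import torch

model = torch.nn.Linear(4, 2)  # placeholder model for illustration

# Create an Adam optimizer with its hyperparameters written out explicitly.
# These are Adam's default values; adjust them for your own model.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=0,
)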
What is optimizer.param_groups?
We will use an example to explain it:
import torch
import numpy as np

class CustomNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        pass

model = CustomNN()
all_params = model.parameters()
print(type(all_params))

optimizer = torch.optim.Adam(model.parameters())
print(optimizer.param_groups)
Running this code, we will see:
<class 'generator'>
[{'params': [Parameter containing:
tensor(0.9417, requires_grad=True), Parameter containing:
tensor(0.7757, requires_grad=True)], 'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}]
We can see that optimizer.param_groups is a Python list containing a dictionary of the optimizer's parameters and hyperparameters.
In this example, the dictionary contains:
params: all the parameters that will be updated by gradients
lr: the current learning rate (0.001)
betas: (0.9, 0.999)
eps: 1e-08
weight_decay: 0
amsgrad: False
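As a quick sketch (the model below is only a placeholder), we can read these values back from optimizer.param_groups, for example to log the current learning rate:
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

# Read hyperparameters back from the (single) parameter group.
group = optimizer.param_groups[0]
print("lr =", group['lr'])                          # 0.001
print("betas =", group['betas'])                    # (0.9, 0.999)
print("number of params =", len(group['params']))   # 2 (weight and bias)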
How to use optimizer.param_groups?
Through optimizer.param_groups, we can control the current optimizer.
For example, we can change the learning rate according to the training step:
self.step_num += 1
if self.step_num > self.warmup_steps:
    self.lr = self.max_lr * np.exp(-1.0 * self.k * (self.step_num - self.warmup_steps))
    self.lr = max(self.lr, self.min_lr)
for param_group in self.optimizer.param_groups:
    param_group['lr'] = self.lr
self.optimizer.step()
In this example, we use param_group['lr'] = self.lr to change the current learning rate.
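The snippet above is taken from a larger scheduler class. A minimal self-contained sketch of such a class might look like the following; the class name WarmupExpDecay and its default hyperparameter values are our own assumptions, not from the original code:
import numpy as np
import torch

class WarmupExpDecay:
    """Hypothetical scheduler: keep max_lr during warmup, then decay exponentially."""
    def __init__(self, optimizer, warmup_steps=100, max_lr=1e-3, min_lr=1e-5, k=0.01):
        self.optimizer = optimizer
        self.warmup_steps = warmup_steps
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.k = k
        self.step_num = 0
        self.lr = max_lr

    def step(self):
        self.step_num += 1
        if self.step_num > self.warmup_steps:
            # Exponential decay after warmup, clipped at min_lr.
            self.lr = self.max_lr * np.exp(-1.0 * self.k * (self.step_num - self.warmup_steps))
            self.lr = max(self.lr, self.min_lr)
        # Write the new learning rate into every parameter group, then update.
        for param_group in self.optimizer.param_groups:
            param_group['lr'] = self.lr
        self.optimizer.step()

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = WarmupExpDecay(optimizer)

# In a training loop we would call scheduler.step() instead of optimizer.step().
loss = model(torch.randn(8, 4)).sum()
loss.backward()
scheduler.step()
print(optimizer.param_groups[0]['lr'])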
Looking inside the Adam optimizer's step() function, it will run:
F.adam(params_with_grad,
       grads,
       exp_avgs,
       exp_avg_sqs,
       max_exp_avg_sqs,
       state_steps,
       amsgrad=group['amsgrad'],
       beta1=beta1,
       beta2=beta2,
       lr=group['lr'],
       weight_decay=group['weight_decay'],
       eps=group['eps'])
We can see that group['lr'] is passed into F.adam(), which means that changing values in optimizer.param_groups controls the optimizer's behavior.
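In the same way, other entries in optimizer.param_groups can be changed between steps. Here is a small sketch (the specific values are only illustrative) that lowers the learning rate and enables weight decay mid-training:
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Change hyperparameters in place; the next optimizer.step() will use them.
for param_group in optimizer.param_groups:
    param_group['lr'] = 1e-4            # lower the learning rate
    param_group['weight_decay'] = 1e-2  # enable weight decay

loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()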