Understand PyTorch optimizer.param_groups with Examples – PyTorch Tutorial


In this tutorial, we will introduce PyTorch's optimizer.param_groups. After reading it, you will be able to control a PyTorch optimizer easily.

PyTorch optimizer

PyTorch provides several optimizers, for example Adam and SGD. It is easy to create one. For example:

optimizer = torch.optim.Adam(model.parameters())

This code creates an Adam optimizer with its default hyperparameters.
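If you want different hyperparameters, pass them to the constructor. Here is a minimal sketch (the model, learning rate, weight decay and momentum values below are just illustrative, not values from this tutorial):

import torch

model = torch.nn.Linear(4, 2)  # any nn.Module works here

# Adam with an explicit learning rate and weight decay
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# SGD with momentum, for comparison
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)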

What is optimizer.param_groups?

We will use an example to explain it:

import torch

class CustomNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # two scalar parameters that the optimizer will update
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        pass

model = CustomNN()
all_params = model.parameters()
print(type(all_params))  # model.parameters() returns a generator

optimizer = torch.optim.Adam(model.parameters())
print(optimizer.param_groups)

Running this code, we will see:

<class 'generator'>
[{'params': [Parameter containing:
tensor(0.9417, requires_grad=True), Parameter containing:
tensor(0.7757, requires_grad=True)], 'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}]

We can see that optimizer.param_groups is a Python list containing one dictionary (one parameter group).

In this example, the dictionary contains:

params: all parameters that will be updated by gradients

lr: the current learning rate (0.001)

betas: (0.9, 0.999)

eps: 1e-08

weight_decay: 0

amsgrad: False
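Because optimizer.param_groups is just a Python list of dictionaries, we can read any of these values directly. For example, reusing the optimizer created above:

# read the current learning rate of the first (and only) parameter group
current_lr = optimizer.param_groups[0]['lr']
print(current_lr)  # 0.001

# count how many tensors this group will update
print(len(optimizer.param_groups[0]['params']))  # 2 (self.a and self.b)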

How to use optimizer.param_groups?

Through optimizer.param_groups, we can control the current optimizer.

For example, we can change the learning rate based on the training step:

# fragment from inside a training class; it assumes self.optimizer,
# self.warmup_steps, self.max_lr, self.min_lr and self.k are defined elsewhere
self.step_num += 1
if self.step_num > self.warmup_steps:
    # exponential decay after the warmup phase, clipped to a minimum value
    self.lr = self.max_lr * np.exp(-1.0 * self.k * (self.step_num - self.warmup_steps))
    self.lr = max(self.lr, self.min_lr)
    for param_group in self.optimizer.param_groups:
        param_group['lr'] = self.lr
self.optimizer.step()

In this example, param_group['lr'] = self.lr changes the current learning rate.
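For reference, here is a minimal, self-contained sketch of the same idea outside a class (the model, data and decay constants are made up for illustration):

import math
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
max_lr, min_lr, k, warmup_steps = 1e-3, 1e-5, 0.01, 100

for step in range(1, 301):
    x = torch.randn(8, 10)
    loss = model(x).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()

    # decay the learning rate exponentially after the warmup phase
    if step > warmup_steps:
        lr = max(max_lr * math.exp(-k * (step - warmup_steps)), min_lr)
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr

    optimizer.step()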

Inside the Adam optimizer's step() function, PyTorch runs:

F.adam(params_with_grad,
       grads,
       exp_avgs,
       exp_avg_sqs,
       max_exp_avg_sqs,
       state_steps,
       amsgrad=group['amsgrad'],
       beta1=beta1,
       beta2=beta2,
       lr=group['lr'],
       weight_decay=group['weight_decay'],
       eps=group['eps'])

We can see that group['lr'] is passed into F.adam(), which means we can change the values in optimizer.param_groups to control the optimizer.
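A quick way to confirm this is to change the value and check that the optimizer sees it on its next step (a small sketch, assuming the Adam optimizer created earlier with its default lr of 0.001):

# change the learning rate in every parameter group
for param_group in optimizer.param_groups:
    param_group['lr'] = 0.0005

# the new value is read inside the next optimizer.step() call
print(optimizer.param_groups[0]['lr'])  # 0.0005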