Understand torch.optim.lr_scheduler.CyclicLR() with Examples – PyTorch Tutorial

By | September 9, 2022

The cyclical learning rate (CLR) policy was proposed in the paper Cyclical Learning Rates for Training Neural Networks. In this tutorial, we will use some examples to show how to use it in PyTorch.

torch.optim.lr_scheduler.CyclicLR()

It is defined as:

torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)

It sets the learning rate of each parameter group according to the cyclical learning rate (CLR) policy.

Note: the cyclical learning rate policy changes the learning rate after every batch, so step() should be called after each batch has been used for training.

The image below illustrates how this scheduler works.

[Figure: the cyclical learning rate policy]

Here are the most important parameters:

base_lr: Initial learning rate, which is the lower boundary of the cycle for each parameter group.

max_lr: Upper learning rate boundary of the cycle for each parameter group.

step_size_up: Number of training iterations in the increasing half of a cycle.

mode: One of {triangular, triangular2, exp_range}. triangular keeps the cycle amplitude constant, triangular2 halves the amplitude after each cycle, and exp_range scales the amplitude by gamma raised to the iteration count.
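To make the three modes concrete, here is a minimal pure-Python sketch of the learning rate formula described in the CLR paper and the PyTorch documentation. The function name clr_value is hypothetical and is not part of PyTorch, and the sketch assumes that step_size_down equals step_size_up (the default behaviour).

```python
import math

def clr_value(it, base_lr, max_lr, step_size, mode="triangular", gamma=1.0):
    """Learning rate at iteration `it` (hypothetical helper, not a PyTorch API).

    Assumes the increasing and decreasing halves both last `step_size` iterations.
    """
    cycle = math.floor(1 + it / (2 * step_size))   # 1-based cycle index
    x = abs(it / step_size - 2 * cycle + 1)        # 1 at the cycle ends, 0 at the peak
    height = (max_lr - base_lr) * max(0.0, 1.0 - x)
    if mode == "triangular":
        scale = 1.0                                # constant amplitude
    elif mode == "triangular2":
        scale = 1.0 / (2 ** (cycle - 1))           # amplitude halves every cycle
    elif mode == "exp_range":
        scale = gamma ** it                        # amplitude decays with each iteration
    else:
        raise ValueError(f"unknown mode: {mode}")
    return base_lr + height * scale

# Print the learning rate at the start, peak, and end of the first two cycles.
for it in (0, 10, 20, 30, 40):
    print(it, clr_value(it, 0.00025, 0.001, step_size=10, mode="triangular2"))
```

In this sketch the learning rate starts at base_lr, reaches max_lr after step_size iterations, and returns to base_lr after another step_size iterations. With triangular2, the peak of the second cycle is only halfway between base_lr and max_lr.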

How should we set these parameters?

For max_lr, we can use the optimizer's learning rate, for example 1e-3.

For base_lr, we can use 1/3 or 1/4 of max_lr, as suggested in the paper.

For step_size_up, we need to know the number of iterations in each epoch: it can be 2 to 10 times that number.
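The arithmetic behind these rules of thumb can be sketched as follows. The helper suggest_clr_params and its argument names are hypothetical, chosen only for illustration. The values of 17000 iterations per epoch and a factor of 6 are the ones used in the example in this tutorial.

```python
def suggest_clr_params(iters_per_epoch, max_lr, epochs_per_half_cycle=6, lr_divisor=4):
    """Derive CyclicLR arguments from the rules of thumb above (hypothetical helper)."""
    return {
        "base_lr": max_lr / lr_divisor,                           # 1/3 or 1/4 of max_lr
        "max_lr": max_lr,
        "step_size_up": iters_per_epoch * epochs_per_half_cycle,  # 2 to 10 epochs
    }

params = suggest_clr_params(iters_per_epoch=17000, max_lr=1e-3)
print(params["step_size_up"])       # iterations in the increasing half of a cycle
print(2 * params["step_size_up"])   # iterations in one full cycle (up and down)
```

With 17000 iterations per epoch and a factor of 6, step_size_up is 102000, so one full cycle spans 204000 iterations, or 12 epochs.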

Here is an example that shows how CyclicLR changes the learning rate of Adam.

import torch
from matplotlib import pyplot as plt

# A single dummy parameter stands in for a real model.
params = [torch.nn.Parameter(torch.randn(2, 2))]
max_lr = 1e-3
base_lr = max_lr / 4           # lower boundary: 1/4 of max_lr
iters_per_epoch = 17000        # number of batches in one epoch
optimizer = torch.optim.Adam(params, lr=max_lr)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=base_lr, max_lr=max_lr,
    step_size_up=iters_per_epoch * 6,   # increasing half of a cycle = 6 epochs
    mode='triangular2', cycle_momentum=False)

lr_list = []
for epoch in range(1, 80):
    for i in range(iters_per_epoch):
        optimizer.zero_grad()
        # compute the loss and call loss.backward() here
        optimizer.step()
        lr_list.append(optimizer.param_groups[0]['lr'])
        scheduler.step()       # update the learning rate after every batch

plt.plot(range(len(lr_list)), lr_list, color='r')
plt.show()

Running this code plots the learning rate over the training iterations:

[Figure: learning rate versus training iteration for CyclicLR with mode='triangular2']

Note: scheduler.step() must be called inside the batch loop. This is different from torch.optim.lr_scheduler.StepLR(), where it is usually called once per epoch.

Understand torch.optim.lr_scheduler.StepLR() with Examples – PyTorch Tutorial
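The difference in where step() is called can be sketched with a toy training loop. The tiny linear model, the random data, and the hyperparameter values below are assumptions made only for illustration.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8, 1)   # toy data: 4 batches of 2 samples

def train(scheduler_name, epochs=2, batch_size=2):
    """Run a toy training loop and record the learning rate used for each batch."""
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    if scheduler_name == "cyclic":
        scheduler = torch.optim.lr_scheduler.CyclicLR(
            optimizer, base_lr=0.01, max_lr=0.1, step_size_up=4,
            cycle_momentum=False)
    else:
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    lrs = []
    for epoch in range(epochs):
        for start in range(0, len(x), batch_size):
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(xb), yb)
            loss.backward()
            optimizer.step()
            lrs.append(optimizer.param_groups[0]["lr"])
            if scheduler_name == "cyclic":
                scheduler.step()      # CyclicLR: once per batch
        if scheduler_name != "cyclic":
            scheduler.step()          # StepLR: once per epoch
    return lrs

cyclic_lrs = train("cyclic")
step_lrs = train("step")
print(cyclic_lrs)
print(step_lrs)
```

In this sketch the CyclicLR run changes the learning rate on every batch, while the StepLR run keeps it fixed within an epoch and halves it between epochs.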
