Change Learning Rate By Global Step When Training Model in TensorFlow – TensorFlow Tutorial

March 17, 2022

When reading a paper, we may find a training method described like this:

We use the Adam optimizer with learning rate decay, which starts from 0.001 and is reduced to 0.0005, 0.0003, and 0.0001 after 500K, 1M and 2M global steps, respectively

This means the learning rate is changed based on the global step. In this tutorial, we will show you how to implement this in TensorFlow.
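For example, the schedule quoted above can be written as a plain Python function of the global step (a sketch, with the thresholds taken directly from the quote):

def get_learning_rate(global_step):
    # Learning rate schedule from the quoted paper:
    # 0.001 at the start, then 0.0005, 0.0003 and 0.0001
    # after 500K, 1M and 2M global steps, respectively.
    if global_step < 500000:
        return 0.001
    elif global_step < 1000000:
        return 0.0005
    elif global_step < 2000000:
        return 0.0003
    else:
        return 0.0001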

Adam optimizer in TensorFlow

We can create an Adam optimizer using:

optimizer = tf.train.AdamOptimizer(lr)

Here lr is our learning rate.
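For example, here is a minimal sketch of training a toy loss with Adam and a fixed learning rate (TensorFlow 1.x); the toy variable and loss are just placeholders for illustration:

import tensorflow as tf

# A minimal sketch: Adam optimizer with a fixed learning rate of 0.001
x = tf.Variable(5.0)
loss = tf.square(x)

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        _, loss_value = sess.run([train_op, loss])
        print(loss_value)

Here the learning rate is fixed. The rest of this tutorial shows how to make it change with the global step.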

How to change the learning rate by global step in TensorFlow?

In order to change the learning rate by training step, we can pass the lr value through a tf.placeholder(). Here is an example:

import tensorflow as tf
import numpy as np

g = tf.Graph()
with g.as_default():
    session_config = tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=False
    )
    session_config.gpu_options.allow_growth = True
    sess = tf.Session(config=session_config, graph=g)

    with sess.as_default():
        batch = tf.placeholder(shape=[None, 40], dtype=tf.float32)  # input batch (batch_size x feature_dim)
        lr = tf.placeholder(dtype=tf.float32)  # learning rate

        w = tf.layers.dense(batch, 10)

        loss = tf.reduce_mean(w)
        # Define Training procedure
        global_step = tf.Variable(0, name="global_step", trainable=False)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            optimizer = tf.train.AdamOptimizer(lr)
            grads_and_vars = optimizer.compute_gradients(loss)
            train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

        sess.run(tf.global_variables_initializer())

        # start train
        rate = 0.1
        data = np.random.normal(size = [3, 40])
        for i in range(1000):
            current_step = tf.train.global_step(sess, global_step)
            if current_step<100:
                rate = 0.1
            elif current_step < 200:
                rate = 0.05
            elif current_step < 300:
                rate = 0.01
            else:
                rate = 0.001
            feed_dict = {
               batch: data,
               lr: rate
            }
            # optimizer._lr is the learning rate tensor, so lrate is the value actually fed in this step
            _, step, ls, lrate = sess.run([train_op, global_step, loss, optimizer._lr], feed_dict)
            print(step, "=", lrate)

In this example, we pass different learning rate values into the TensorFlow model through lr = tf.placeholder(dtype=tf.float32).

We can use

current_step = tf.train.global_step(sess, global_step)

to get the current global training step.

Run this code and we will see the global step and the learning rate used at each step: 0.1 for the first 100 steps, 0.05 up to step 200, 0.01 up to step 300, and 0.001 afterwards.
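As a side note, TensorFlow 1.x also provides tf.train.piecewise_constant(), which can express the same step-based schedule as a tensor, so a placeholder is not required. Here is a minimal sketch of this alternative; the toy loss, boundaries and values below only mirror the example above:

import tensorflow as tf

# A sketch of the same schedule using tf.train.piecewise_constant (TensorFlow 1.x)
x = tf.Variable(5.0)
loss = tf.square(x)

global_step = tf.Variable(0, name="global_step", trainable=False)
boundaries = [100, 200, 300]            # global steps where the learning rate changes
values = [0.1, 0.05, 0.01, 0.001]       # learning rate used in each interval
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)

optimizer = tf.train.AdamOptimizer(learning_rate)
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(400):
        _, step, lr_value = sess.run([train_op, global_step, learning_rate])
        if step % 100 == 0:
            print(step, "=", lr_value)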