Train Multiple Neural Layers with Different Learning Rate – TensorFlow Tutorial

October 12, 2021

Sometimes we need to train different parts of a deep learning model with different learning rates. In this tutorial, we will show you how to do this in TensorFlow.

For example:

(Figure: train different neural layers with different learning rates)

There are 12 layers in our model. We plan to train layer 1 through layer 10 with a learning rate of 2e-5, and layer 11 and layer 12 with a learning rate of 1e-3.

This is a common situation when you plan to fine-tune a pretrained model.
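Whether you can split the variables this way depends on how the layers are named. Below is a minimal sketch of a model whose variables can be filtered by name; the layer_N scope names, the input size, and the layer width are assumptions for illustration, not the tutorial's actual model.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])
hidden = x
for i in range(1, 13):  # build layer_1 ... layer_12
    with tf.variable_scope('layer_%d' % i):
        hidden = tf.layers.dense(hidden, 128, activation=tf.nn.relu)

# Each variable name now carries its layer scope, e.g. layer_11/dense/kernel:0
for v in tf.trainable_variables():
    print(v.name)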

How to train a model with different learning rates?

We will walk through the steps below.

Step 1: Create two global step variables

import tensorflow as tf

# One step counter per optimizer.
global_step = tf.Variable(0, name="g1", trainable=False)
pretrained_model_global_step = tf.Variable(0, name="g2", trainable=False)
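Note on the two counters: apply_gradients() increments whichever global step variable it is given, so creating one counter per optimizer keeps the step counts of the two variable groups independent, which is useful if you later attach a separate learning rate schedule to each group.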

Step 2: Get different layers for different learning rates

First, we collect the trainable variables belonging to each group of layers.

all_variables = tf.trainable_variables()
# Adjust the substrings to match how the layers are actually named in your
# model; TensorFlow variable names cannot contain spaces.
pretrained_var_list = [x for x in all_variables if 'layer_11' not in x.name and 'layer_12' not in x.name]
normal_var_list = [x for x in all_variables if 'layer_11' in x.name or 'layer_12' in x.name]
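As a quick optional sanity check, every trainable variable should land in exactly one of the two lists:

# The two lists should partition all trainable variables.
assert len(pretrained_var_list) + len(normal_var_list) == len(all_variables)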

Here, pretrained_var_list will be trained with a learning rate of 2e-5, and normal_var_list with 1e-3.

Step 3: Create the training operation

Here is an example:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):  # run pending update ops (e.g. batch norm) first
    # Optimizer for the newly added layers (layer 11 and layer 12).
    normal_optimizer = tf.train.AdamOptimizer(1e-3, name='normal_adam')
    normal_grads_and_vars = normal_optimizer.compute_gradients(model.loss, var_list=normal_var_list)
    train_normal_op = normal_optimizer.apply_gradients(normal_grads_and_vars, global_step=global_step)

    # Optimizer for the pretrained layers (layer 1 - layer 10).
    pretrained_optimizer = tf.train.AdamOptimizer(2e-5, name='pretrained_adam')
    pretrained_grads_and_vars = pretrained_optimizer.compute_gradients(model.loss, var_list=pretrained_var_list)
    train_pretrained_op = pretrained_optimizer.apply_gradients(pretrained_grads_and_vars, global_step=pretrained_model_global_step)

    # Merge the two updates into a single training op.
    train_op = tf.group(train_normal_op, train_pretrained_op)

model.loss is the model's loss function. We use tf.group() to merge the two training operations into a single train_op.
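Because each variable list gets its own AdamOptimizer, each group also gets its own moment (slot) variables; the explicit name arguments ('normal_adam' and 'pretrained_adam') just make those slots easy to tell apart in the graph and in checkpoints.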

Finally, we can use sess.run() to train our model. Here sess is a tf.Session and feed_dict maps the model's input placeholders to a batch of training data.

sess.run(tf.global_variables_initializer())
_, step, loss, accuracy = sess.run(
    [train_op, global_step, model.loss, model.accuracy], feed_dict)
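Putting it together, a training loop might look like the sketch below. The batch iteration and the placeholder names model.input_x and model.input_y are assumptions for illustration; replace them with your own data pipeline and model inputs.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch_x, batch_y in batches:  # your own batch generator (assumed)
        feed_dict = {model.input_x: batch_x, model.input_y: batch_y}
        _, step, loss, accuracy = sess.run(
            [train_op, global_step, model.loss, model.accuracy], feed_dict)
        if step % 100 == 0:
            print('step %d: loss %.4f, acc %.4f' % (step, loss, accuracy))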

However, if you are fine-tuning an existing model and have used saver.restore() to load its weights, be careful with sess.run(tf.global_variables_initializer()): running the initializer after the restore will overwrite the restored weights with fresh initial values.
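One safe ordering is to initialize all variables first and then restore, so the restored weights overwrite the freshly initialized values rather than the other way around. A sketch, where checkpoint_path is a placeholder for your checkpoint file:

# Restore only the pretrained layers; the new layers, optimizer slots and
# step counters keep their fresh initial values.
saver = tf.train.Saver(var_list=pretrained_var_list)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize everything first
    saver.restore(sess, checkpoint_path)         # then load the pretrained weights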

Here is a tutorial:

Steps to Load TensorFlow Model Using saver.restore() Correctly – TensorFlow Tutorial
