Implement L2 or L1 Regularization Loss Using TensorFlow GraphKeys.REGULARIZATION_LOSSES – TensorFlow Tutorial

August 5, 2021

In TensorFlow, we can use tf.trainable_variables() to list all trainable weights and build an L2 regularization loss from them. Here is the tutorial:

Multi-layer Neural Network Implements L2 Regularization in TensorFlow – TensorFlow Tutorial

However, this may not be the best approach if you use TensorFlow's built-in layer functions. In this tutorial, we will introduce another way: using GraphKeys.REGULARIZATION_LOSSES to implement L2 regularization.

Preliminary

In TensorFlow, many functions accept a regularizer parameter. For example:

tf.compat.v1.get_variable(
    name, shape=None, dtype=None, initializer=None, regularizer=None,
    trainable=None, collections=None, caching_device=None, partitioner=None,
    validate_shape=True, use_resource=None, custom_getter=None, constraint=None,
    synchronization=tf.VariableSynchronization.AUTO,
    aggregation=tf.compat.v1.VariableAggregation.NONE
)

Here the regularizer parameter defaults to None.

    tf.layers.conv2d(inputs, filters, kernel_size, 
        strides=(1, 1), 
        padding='valid', 
        data_format='channels_last', 
        dilation_rate=(1, 1),
        activation=None, 
        use_bias=True, 
        kernel_initializer=None,
        bias_initializer=tf.zeros_initializer(),
        kernel_regularizer=None,
        bias_regularizer=None, 
        activity_regularizer=None, 
        kernel_constraint=None, 
        bias_constraint=None, 
        trainable=True, 
        name=None,
        reuse=None)

Here kernel_regularizer and bias_regularizer both default to None.

If we pass tf.contrib.layers.l2_regularizer(0.0001) as the regularizer for these weights, how do we get the resulting regularization loss?

For example:

x = tf.layers.conv2d(input_tensor, filters1, (1, 1),
                         kernel_initializer=tf.orthogonal_initializer(),
                         use_bias=False,
                         trainable=True,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                         name=conv_name_1
                         )

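We have set kernel_regularizer=tf.contrib.layers.l2_regularizer(weight_decay). How do we get the regularization loss of this kernel?

How to get the regularization loss from GraphKeys.REGULARIZATION_LOSSES?

Note that a regularizer does not initialize the weights. When a variable is created with a regularizer, TensorFlow applies the regularizer to that variable and adds the resulting loss tensor to the GraphKeys.REGULARIZATION_LOSSES collection. We can read this collection to get the regularization loss.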

For example:

import tensorflow as tf
import numpy as np

weight_decay = 1e-4
regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
input_tensor = tf.get_variable(shape = [64, 40, 200, 1], regularizer = regularizer, dtype=tf.float32, name = "w1")

x = tf.layers.conv2d(input_tensor, 64, (3, 3),
                         kernel_initializer=tf.orthogonal_initializer(),
                         use_bias=True,
                         trainable=True,
                         kernel_regularizer=regularizer,
                         name = "conv"
                         )

att = tf.nn.relu(x, name="relu")

keys = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
print("list variables in tf.GraphKeys.REGULARIZATION_LOSSES")
for k in keys:
    print(k)
init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a =sess.run(att)
    print(a.shape)
    print("list all trainable variables:")
    for n in tf.trainable_variables():
        print(n.name)

This code lists all tensors in tf.GraphKeys.REGULARIZATION_LOSSES and all trainable variables.

Run this code and you will get:

list variables in tf.GraphKeys.REGULARIZATION_LOSSES
Tensor("w1/Regularizer/l2_regularizer:0", shape=(), dtype=float32)
Tensor("conv/kernel/Regularizer/l2_regularizer:0", shape=(), dtype=float32)
list all trainable variables:
w1:0
conv/kernel:0
conv/bias:0
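
Each entry printed above is a scalar tensor that holds the penalty for one variable. To our understanding, tf.contrib.layers.l2_regularizer(scale) computes scale * tf.nn.l2_loss(w), that is, scale * sum(w ** 2) / 2. The NumPy sketch below reproduces that formula outside TensorFlow. The helper name l2_penalty and the small weight array are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_penalty(weights, scale):
    """Hypothetical helper: scale * sum(w ** 2) / 2, mirroring scale * tf.nn.l2_loss(w)."""
    return scale * np.sum(np.square(weights)) / 2.0


# Illustrative weights, not taken from the network above
w = np.array([[1.0, -2.0], [3.0, 0.5]])
weight_decay = 1e-4

# One scalar per regularized variable, like one entry in the collection
penalty = l2_penalty(w, weight_decay)
print(penalty)
```

Because of the 1/2 factor, doubling every weight multiplies the penalty by four, while doubling weight_decay only doubles it.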

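Notice that conv/bias has no entry in the collection, because no bias_regularizer was set. To get the total L2 regularization loss, we can use two methods.

If we use tf.GraphKeys.REGULARIZATION_LOSSES, we can sum the collected tensors as follows: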

keys = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
print("list variables in tf.GraphKeys.REGULARIZATION_LOSSES")
for k in keys:
    print(k)
# compute l2 loss using tf.GraphKeys.REGULARIZATION_LOSSES
loss = tf.add_n(keys)

If we use tf.trainable_variables(), we can do it like this:

# compute l2 loss using tf.trainable_variables()
l2_loss = weight_decay * tf.reduce_sum([tf.nn.l2_loss(n) for n in tf.trainable_variables() if 'bias' not in n.name])
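
The two methods should agree only when every non-bias trainable variable was created with the same regularizer and no bias has one. The NumPy sketch below illustrates this with made-up arrays. The variable names imitate TensorFlow naming, and l2_loss is a stand-in for tf.nn.l2_loss; none of the values come from the network above.

```python
import numpy as np

weight_decay = 1e-4


def l2_loss(w):
    # NumPy stand-in for tf.nn.l2_loss: sum(w ** 2) / 2
    return np.sum(np.square(w)) / 2.0


# Hypothetical variables keyed by TensorFlow-style names
variables = {
    "w1:0": np.array([1.0, 2.0]),
    "conv/kernel:0": np.array([[0.5, -0.5], [1.5, 0.0]]),
    "conv/bias:0": np.array([10.0, 10.0]),
}
# Names of the variables that were created with a regularizer
regularized = ["w1:0", "conv/kernel:0"]

# Method 1: sum the per-variable penalties (what the collection holds)
collection_loss = sum(weight_decay * l2_loss(variables[n]) for n in regularized)

# Method 2: filter all variables by name, then scale the total once
filtered_loss = weight_decay * sum(
    l2_loss(w) for n, w in variables.items() if "bias" not in n
)

# If w1 had been created without a regularizer, method 1 would miss it
partial_loss = sum(weight_decay * l2_loss(variables[n]) for n in ["conv/kernel:0"])

print(np.isclose(collection_loss, filtered_loss))
print(partial_loss < filtered_loss)
```

The name filter is a convention, not a guarantee: it skips any variable whose name contains "bias", so it only matches the collection when variable names follow that pattern.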

We can compare the results computed by these two methods. Here is the example:

import tensorflow as tf
import numpy as np

weight_decay = 1e-4
regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
input_tensor = tf.get_variable(shape = [64, 40, 200, 1], regularizer = regularizer, dtype=tf.float32, name = "w1")

x = tf.layers.conv2d(input_tensor, 64, (3, 3),
                         kernel_initializer=tf.orthogonal_initializer(),
                         use_bias=True,
                         trainable=True,
                         kernel_regularizer=regularizer,
                         name = "conv"
                         )

att = tf.nn.relu(x, name="relu")

keys = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
print("list variables in tf.GraphKeys.REGULARIZATION_LOSSES")
for k in keys:
    print(k)
# compute l2 loss using tf.GraphKeys.REGULARIZATION_LOSSES
loss = tf.add_n(keys)
# compute l2 loss using tf.trainable_variables()
l2_loss = weight_decay * tf.reduce_sum([tf.nn.l2_loss(n) for n in tf.trainable_variables() if 'bias' not in n.name])

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a =sess.run(att)
    print(a.shape)
    loss_values = sess.run([loss, l2_loss])
    print(loss_values)
    print("list all trainable variables:")
    for n in tf.trainable_variables():
        print(n.name)

Run this code and we find the two losses are:

[0.0005492301, 0.0005492301]

This means the two loss values are the same.
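
A regularization loss only affects training once it is added to the task loss. The sketch below shows one way to do that. It assumes the tf.compat.v1 API, so that it can run in graph mode on both late TensorFlow 1.x and TensorFlow 2.x, and it uses a hand-written l2 function in place of tf.contrib.layers.l2_regularizer, which TensorFlow 2.x does not ship. The toy data, the variable name w and the learning rate are hypothetical.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style API
tf1.disable_eager_execution()
tf1.reset_default_graph()

weight_decay = 1e-4


def l2(w):
    # Assumed stand-in for an L2 regularizer with scale weight_decay
    return weight_decay * tf.nn.l2_loss(w)


# Creating the variable with a regularizer registers its penalty
# in GraphKeys.REGULARIZATION_LOSSES
w = tf1.get_variable("w", shape=[4, 1],
                     initializer=tf1.ones_initializer(), regularizer=l2)

# Toy regression data, for illustration only
x = tf.constant(np.ones((2, 4), dtype=np.float32))
y = tf.constant(np.zeros((2, 1), dtype=np.float32))

task_loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
# Sums every tensor in the REGULARIZATION_LOSSES collection
reg_loss = tf1.losses.get_regularization_loss()
total_loss = task_loss + reg_loss

train_op = tf1.train.GradientDescentOptimizer(0.01).minimize(total_loss)

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    task_value, reg_value = sess.run([task_loss, reg_loss])
    print(task_value, reg_value)
    sess.run(train_op)  # one step on the combined loss
```

Minimizing total_loss instead of task_loss is what makes the penalty shrink the weights. Computing reg_loss without adding it to the optimized loss has no effect on training.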
