Understand tf.nn.conv2d(): Compute a 2-D Convolution in TensorFlow

The TensorFlow tf.nn.conv2d() function is widely used to build convolutional networks in deep learning. In this tutorial, we will use some examples to show how to use it correctly.

Syntax

tf.nn.conv2d() is defined as:

tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)

It computes a 2-D convolution given 4-D input and filter tensors.

Parameters

input: The shape of it should be [batch, in_height, in_width, in_channels].

If it represents image data, batch is the number of images in the batch, in_height is the image height, in_width is the image width, and in_channels is the number of color channels, for example 3 for an RGB image.

filter: The shape of it should be [filter_height, filter_width, in_channels, out_channels].

Note: the value of in_channels must be the same in input and filter.
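The shape rule above can be checked before building a graph. The following is only a sketch: check_conv2d_shapes is a hypothetical helper, not part of TensorFlow, and it assumes the NHWC input layout and the filter layout described above.

```python
def check_conv2d_shapes(input_shape, filter_shape):
    """Return the expected out_channels, or raise ValueError on a mismatch.

    Hypothetical helper (not a TensorFlow function). Assumes:
      input_shape  = [batch, in_height, in_width, in_channels]
      filter_shape = [filter_height, filter_width, in_channels, out_channels]
    """
    if len(input_shape) != 4 or len(filter_shape) != 4:
        raise ValueError("input and filter must both be 4-D")
    in_channels = input_shape[3]
    filter_in_channels = filter_shape[2]
    if in_channels != filter_in_channels:
        raise ValueError(
            "in_channels mismatch: input has %d, filter expects %d"
            % (in_channels, filter_in_channels)
        )
    return filter_shape[3]


# A batch of 8 RGB images and a 3 x 3 filter that produces 16 output channels.
print(check_conv2d_shapes([8, 32, 32, 3], [3, 3, 3, 16]))  # 16
```

If the third dimension of the filter were 1 instead of 3, the helper would raise a ValueError, which mirrors the kind of shape error you would otherwise only see when the convolution is built.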

strides: For the default NHWC format it should be [1, stride, stride, 1]. It gives the stride of the sliding window for each dimension of input; the batch and channel strides are 1. The dimension order is determined by the value of data_format.

data_format: It can be NHWC or NCHW; the default is NHWC. It determines the dimension order of input and strides.

NHWC: It means the input = [batch, in_height, in_width, in_channels], strides = [1,stride,stride,1]

NCHW: It means the input = [batch, in_channels, in_height, in_width], strides = [1, 1, stride,stride]
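To make the two layouts concrete, here is a small sketch that builds the strides list and reorders a shape for each data_format. Both function names are hypothetical helpers written for this tutorial, not TensorFlow APIs.

```python
def make_strides(stride_h, stride_w, data_format="NHWC"):
    """Build the 4-element strides list for the given data_format (hypothetical helper)."""
    if data_format == "NHWC":
        return [1, stride_h, stride_w, 1]
    if data_format == "NCHW":
        return [1, 1, stride_h, stride_w]
    raise ValueError("data_format must be 'NHWC' or 'NCHW'")


def nhwc_to_nchw_shape(shape):
    """Reorder [batch, height, width, channels] to [batch, channels, height, width]."""
    batch, height, width, channels = shape
    return [batch, channels, height, width]


print(make_strides(2, 2, "NHWC"))            # [1, 2, 2, 1]
print(make_strides(2, 2, "NCHW"))            # [1, 1, 2, 2]
print(nhwc_to_nchw_shape([1, 5, 5, 3]))      # [1, 3, 5, 5]
```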

padding: The type of padding algorithm to use. It can be SAME or VALID.

To know the difference between SAME and VALID, you can read:

Understand the Difference Between ‘SAME’ and ‘VALID’ Padding in Convolution Networks

dilations: Defaults to [1, 1, 1, 1]. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format.
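A dilation factor spreads the filter elements apart, so the filter covers a larger region of the input without gaining any weights. The sketch below shows this arithmetic for one dimension; the function names are hypothetical and only illustrate the rule stated above.

```python
def dilated_offsets(filter_size, dilation):
    """Input offsets touched by each filter element along one dimension."""
    return [i * dilation for i in range(filter_size)]


def effective_filter_size(filter_size, dilation):
    """Span of input cells covered by a dilated filter along one dimension."""
    return (filter_size - 1) * dilation + 1


# A filter of size 3 with dilation 2 skips 1 cell between neighboring elements.
print(dilated_offsets(3, 2))          # [0, 2, 4]
print(effective_filter_size(3, 2))    # 5
# With dilation 1 (the default) nothing is skipped.
print(effective_filter_size(3, 1))    # 3
```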

Return

tf.nn.conv2d() will return a tensor with the shape [batch, out_height, out_width, out_channels]. out_height and out_width are determined by filter, strides, padding and dilations.

In order to know how to determine out_height and out_width, you can read:

Understand the Shape of Tensor Returned by tf.nn.conv2d()
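As a quick reference, the output size along one dimension can be sketched as follows. This is a minimal sketch based on the commonly documented rules (SAME: ceil(in / stride); VALID: ceil((in - effective_filter + 1) / stride)); conv2d_output_size is a hypothetical helper, not a TensorFlow function.

```python
import math


def conv2d_output_size(in_size, filter_size, stride, padding, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    effective = (filter_size - 1) * dilation + 1
    if padding == "SAME":
        # The input is padded so that every stride position produces an output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: only windows that fit entirely inside the input count.
        return math.ceil((in_size - effective + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# A 5-wide input, a 2-wide filter and a stride of 2:
print(conv2d_output_size(5, 2, 2, "SAME"))    # 3
print(conv2d_output_size(5, 2, 2, "VALID"))   # 2
```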

Next, we will use some examples to show how to use tf.nn.conv2d().

How to use tf.nn.conv2d()?

Look at this example code:

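import tensorflow as tf

# A 5 x 5 single-channel image filled with ones: [batch, in_height, in_width, in_channels]
input = tf.Variable(tf.constant(1.0, shape=[1, 5, 5, 1]))
# A 2 x 2 filter with -1 on its diagonal: [filter_height, filter_width, in_channels, out_channels]
filter = tf.Variable(tf.constant([-1.0, 0, 0, -1], shape=[2, 2, 1, 1]))
# Slide the filter with a stride of 2 along height and width
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

# TensorFlow 1.x: variables must be initialized in a session before they are used
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("op:\n", sess.run(op))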

In this example, we can find:

As to input:

batch= 1, in_height = 5, in_width = 5, in_channels = 1

As to filter:

filter_height = 2, filter_width = 2, in_channels = 1, out_channels =1

So the shape of op will be: [1, out_height, out_width, 1]

Run this code, and we will find that op is:

op:
 [[[[-2.]
   [-2.]
   [-1.]]

  [[-2.]
   [-2.]
   [-1.]]

  [[-1.]
   [-1.]
   [-1.]]]]

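The process is: with padding='SAME', a stride of 2 and a 2 x 2 filter, TensorFlow adds one row of zeros below the input and one column of zeros to its right, so the window fits three times along each axis. Each output value is -1 * (top-left cell) + -1 * (bottom-right cell) of its window. Windows that lie fully inside the input give -2; windows in the last row or column have their bottom-right cell in the zero padding and give -1.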
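To check this result by hand, here is a minimal pure-Python sketch of a single-channel convolution. It is not how TensorFlow implements conv2d; conv2d_single_channel is a hypothetical function that assumes batch = 1, in_channels = 1, out_channels = 1, no dilation, and zero padding placed mostly at the bottom and right for SAME. Like tf.nn.conv2d(), it does not flip the filter.

```python
import math


def conv2d_single_channel(image, kernel, stride, padding):
    """Naive 2-D convolution of one channel (hypothetical illustration)."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    if padding == "SAME":
        out_h = math.ceil(in_h / stride)
        out_w = math.ceil(in_w / stride)
        pad_h = max((out_h - 1) * stride + k_h - in_h, 0)
        pad_w = max((out_w - 1) * stride + k_w - in_w, 0)
    elif padding == "VALID":
        out_h = (in_h - k_h) // stride + 1
        out_w = (in_w - k_w) // stride + 1
        pad_h = pad_w = 0
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    # Any odd padding cell goes to the bottom/right side.
    pad_top, pad_left = pad_h // 2, pad_w // 2

    def pixel(r, c):
        # Cells outside the image are treated as zero padding.
        if 0 <= r < in_h and 0 <= c < in_w:
            return image[r][c]
        return 0.0

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride - pad_top, j * stride - pad_left
            total = 0.0
            for u in range(k_h):
                for v in range(k_w):
                    total += pixel(top + u, left + v) * kernel[u][v]
            row.append(total)
        output.append(row)
    return output


image = [[1.0] * 5 for _ in range(5)]       # same values as input above
kernel = [[-1.0, 0.0], [0.0, -1.0]]         # same values as filter above
for row in conv2d_single_channel(image, kernel, 2, "SAME"):
    print(row)
```

The three printed rows are [-2.0, -2.0, -1.0], [-2.0, -2.0, -1.0] and [-1.0, -1.0, -1.0], which matches the op printed by TensorFlow.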

If we set padding='VALID'

The op will be:

op:
 [[[[-2.]
   [-2.]]

  [[-2.]
   [-2.]]]]

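The last row and column of the input are dropped: with no padding, a 2 x 2 window moved in steps of 2 fits only twice along each axis of the 5 x 5 input, so every window lies fully inside the input and gives -2.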
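Which cells are dropped can be worked out from the window start positions. The sketch below does this for one axis; valid_window_starts is a hypothetical helper and assumes no dilation.

```python
def valid_window_starts(in_size, filter_size, stride):
    """Start index of every window that fits entirely inside the input."""
    return list(range(0, in_size - filter_size + 1, stride))


in_size, filter_size, stride = 5, 2, 2
starts = valid_window_starts(in_size, filter_size, stride)
# Every input index that is covered by at least one window.
covered = {start + offset for start in starts for offset in range(filter_size)}
dropped = [index for index in range(in_size) if index not in covered]

print(starts)    # [0, 2]
print(dropped)   # [4]
```

Index 4, the last row and the last column, is never reached by a window, which is why the VALID output is only 2 x 2.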

If the shape of filter is [2, 2, 1, 2], out_channels becomes 2:

filter = tf.Variable(tf.constant([-1.0, 0, 0, -1], shape=[2, 2, 1, 2]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='VALID')

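The shape of op will be: [1, out_height, out_width, 2]. Only four values are given for the eight filter elements, so TensorFlow 1.x fills the remaining entries with the last value, -1. Each output channel then sums three cells weighted by -1, which gives -3: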

op:
 [[[[-3. -3.]
   [-3. -3.]]

  [[-3. -3.]
   [-3. -3.]]]]
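To see where -3 comes from, the sketch below emulates in plain Python how TensorFlow 1.x fills a constant when fewer values than elements are given, and then splits the filter by output channel. Both helpers are hypothetical and hard-code the [2, 2, 1, 2] layout used here; newer TensorFlow versions are expected to reject such a size mismatch instead of filling it.

```python
def fill_like_tf1_constant(values, shape):
    """Pad a value list with its last element up to the size implied by shape."""
    size = 1
    for dim in shape:
        size *= dim
    if len(values) > size:
        raise ValueError("too many values for the given shape")
    return list(values) + [values[-1]] * (size - len(values))


def kernel_for_channel(flat, out_channel):
    """Extract the 2 x 2 kernel of one output channel from a flat [2, 2, 1, 2] filter."""
    # Row-major index of element [h, w, 0, c] is (h * 2 + w) * 2 + c.
    return [[flat[(h * 2 + w) * 2 + out_channel] for w in range(2)] for h in range(2)]


flat = fill_like_tf1_constant([-1.0, 0.0, 0.0, -1.0], [2, 2, 1, 2])
print(flat)  # [-1.0, 0.0, 0.0, -1.0, -1.0, -1.0, -1.0, -1.0]

for channel in range(2):
    kernel = kernel_for_channel(flat, channel)
    # The input is all ones, so each output value is the sum of the kernel.
    print(channel, kernel, sum(sum(row) for row in kernel))
```

Channel 0 uses the kernel [[-1, 0], [-1, -1]] and channel 1 uses [[0, -1], [-1, -1]]; both sum to -3 on an input of ones.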
