The TensorFlow tf.nn.conv2d() function is widely used to build convolutional neural networks in deep learning. In this tutorial, we will use some examples to show how to use it correctly.
Syntax
tf.nn.conv2d() is defined as:
tf.nn.conv2d( input, filter, strides, padding, use_cudnn_on_gpu=True, data_format='NHWC', dilations=[1, 1, 1, 1], name=None )
It computes a 2-D convolution given 4-D input and filter tensors.
Parameters
input: A 4-D tensor. With the default data_format='NHWC', its shape should be [batch, in_height, in_width, in_channels].
If it represents image data, batch is the number of images in the batch, in_height and in_width are the height and width of each image, and in_channels is the number of color channels (for example, 3 for an RGB image).
filter: A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels].
Note that in_channels must have the same value in input and filter.
strides: A list of 4 ints. For the default NHWC format it should be [1, stride, stride, 1]. It is the stride of the sliding window for each dimension of input, and the dimension order is determined by the value of data_format.
data_format: Either 'NHWC' or 'NCHW'; the default is 'NHWC'. It determines the dimension order of input, of the output, and of strides and dilations.
NHWC: input = [batch, in_height, in_width, in_channels], strides = [1, stride, stride, 1]
NCHW: input = [batch, in_channels, in_height, in_width], strides = [1, 1, stride, stride]
padding: The type of padding algorithm to use, either 'SAME' or 'VALID'.
To learn the difference between SAME and VALID, you can read:
Understand the Difference Between ‘SAME’ and ‘VALID’ Padding in Convolution Networks
dilations: Defaults to [1, 1, 1, 1]. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format.
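The rules above (matching in_channels, 4-entry strides and dilations whose batch and channel entries are 1, and a channel axis that moves with data_format) can be collected into a small pre-flight check. This is a minimal sketch: check_conv2d_args is a hypothetical helper that works on plain shape lists rather than tensors, it is not part of TensorFlow, and it does not reproduce every check TensorFlow itself performs.

```python
def check_conv2d_args(input_shape, filter_shape, strides, padding,
                      data_format="NHWC", dilations=(1, 1, 1, 1)):
    """Hypothetical helper: validate conv2d arguments given as plain lists.

    Returns out_channels, the last dimension of the filter.
    """
    if data_format == "NHWC":
        channel_axis = 3
    elif data_format == "NCHW":
        channel_axis = 1
    else:
        raise ValueError("data_format must be 'NHWC' or 'NCHW'")
    if len(input_shape) != 4 or len(filter_shape) != 4:
        raise ValueError("input and filter must both be 4-D")
    # filter is [filter_height, filter_width, in_channels, out_channels]
    # whatever data_format is
    if filter_shape[2] != input_shape[channel_axis]:
        raise ValueError("in_channels of input and filter must match")
    for name, values in (("strides", strides), ("dilations", dilations)):
        if len(values) != 4:
            raise ValueError(f"{name} must have 4 entries")
        # batch and channel entries must be 1; only height/width may differ
        if values[0] != 1 or values[channel_axis] != 1:
            raise ValueError(f"{name} must be 1 on the batch and channel dimensions")
    if padding not in ("SAME", "VALID"):
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return filter_shape[3]

print(check_conv2d_args([1, 5, 5, 1], [2, 2, 1, 1], [1, 2, 2, 1], "SAME"))  # 1
```

The same arguments in NCHW order would be input [1, 1, 5, 5] with strides [1, 1, 2, 2]; only the position of the channel axis changes.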
Return
tf.nn.conv2d() returns a tensor of the same type as input. With the default data_format='NHWC', its shape is [batch, out_height, out_width, out_channels]; out_height and out_width are determined by the input size, filter size, strides, padding and dilations.
To learn how out_height and out_width are computed, you can read:
Understand the Shape of Tensor Returned by tf.nn.conv2d()
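The output-size rule itself is short enough to write down. The sketch below assumes TensorFlow's documented formulas: with 'SAME', out = ceil(in / stride); with 'VALID', out = ceil((in - k_eff + 1) / stride), where k_eff = k + (k - 1)(d - 1) is the span of a k-tap filter with dilation d. conv_output_size and effective_filter_size are illustrative names, not TensorFlow functions.

```python
import math

def effective_filter_size(k, dilation=1):
    """Span of a k-tap filter when dilation - 1 cells are skipped between taps."""
    return k + (k - 1) * (dilation - 1)

def conv_output_size(in_size, k, stride, padding, dilation=1):
    """Output length along one spatial dimension (sketch of TensorFlow's rule)."""
    if padding == "SAME":
        # SAME pads as needed, so the size depends only on the input and the stride
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # VALID keeps only windows that lie entirely inside the input
        return math.ceil((in_size - effective_filter_size(k, dilation) + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# 5x5 input, 2x2 filter, stride 2 (the numbers used in the example below)
print(conv_output_size(5, 2, 2, "SAME"))   # 3
print(conv_output_size(5, 2, 2, "VALID"))  # 2
# a 3x3 filter with dilation 2 spans 5 cells, so a 7-wide input leaves 3 positions
print(conv_output_size(7, 3, 1, "VALID", dilation=2))  # 3
```

Apply it once for the height and once for the width to get out_height and out_width.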
Next, we will use some examples to show how to use tf.nn.conv2d().
How to use tf.nn.conv2d()?
Look at this example code:
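```python
# TensorFlow 1.x API (tf.Session). With TensorFlow 2.x, use:
# import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()
import tensorflow as tf

# input: [batch, in_height, in_width, in_channels] = [1, 5, 5, 1], all ones
input = tf.Variable(tf.constant(1.0, shape=[1, 5, 5, 1]))
# filter: [filter_height, filter_width, in_channels, out_channels] = [2, 2, 1, 1],
# i.e. the 2x2 kernel [[-1, 0], [0, -1]]
filter = tf.Variable(tf.constant([-1.0, 0, 0, -1], shape=[2, 2, 1, 1]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("op:\n", sess.run(op))
```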
In this example:
input: batch = 1, in_height = 5, in_width = 5, in_channels = 1
filter: filter_height = 2, filter_width = 2, in_channels = 1, out_channels = 1
So the shape of op will be [1, out_height, out_width, 1]. With padding='SAME' and a stride of 2, out_height = out_width = ceil(5 / 2) = 3.
Run this code and op will be:
```
op:
 [[[[-2.]
   [-2.]
   [-1.]]

  [[-2.]
   [-2.]
   [-1.]]

  [[-1.]
   [-1.]
   [-1.]]]]
```
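The process is as follows. Because padding='SAME' and the stride is 2, TensorFlow pads the 5x5 input with one row of zeros at the bottom and one column of zeros at the right, which gives a 3x3 output. The filter slides over the input 2 cells at a time, and each output value is the sum of the element-wise products of the filter and the 2x2 window it covers. A window that lies fully inside the input gives 1 × (-1) + 1 × 0 + 1 × 0 + 1 × (-1) = -2. Windows in the last row or column overlap the zero padding, so the filter's bottom-right -1 multiplies a 0 and the result is -1.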
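To check these values by hand, here is a naive plain-Python sketch of the same computation for a single NHWC image. It assumes no dilation and uses nested lists instead of tensors; conv2d_reference and tf_same_valid_padding are hypothetical names, and the code is meant for small examples, not as a replacement for tf.nn.conv2d.

```python
import math

def tf_same_valid_padding(in_size, k, stride, padding):
    """Return (out_size, pad_before) along one dimension, mirroring TensorFlow."""
    if padding == "SAME":
        out_size = math.ceil(in_size / stride)
        pad_total = max((out_size - 1) * stride + k - in_size, 0)
        return out_size, pad_total // 2  # any odd extra cell goes to the bottom/right
    if padding == "VALID":
        return (in_size - k) // stride + 1, 0
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv2d_reference(image, kernel, stride, padding):
    """Naive conv2d for one image, no dilation.

    image:  nested lists [in_height][in_width][in_channels]
    kernel: nested lists [filter_height][filter_width][in_channels][out_channels]
    Like tf.nn.conv2d it computes a cross-correlation (the kernel is not flipped).
    """
    h, w, c_in = len(image), len(image[0]), len(image[0][0])
    kh, kw, c_out = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    out_h, pad_top = tf_same_valid_padding(h, kh, stride, padding)
    out_w, pad_left = tf_same_valid_padding(w, kw, stride, padding)
    out = [[[0.0] * c_out for _ in range(out_w)] for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(kh):
                for dj in range(kw):
                    r = i * stride + di - pad_top
                    c = j * stride + dj - pad_left
                    if not (0 <= r < h and 0 <= c < w):
                        continue  # zero padding adds nothing to the sum
                    for ci in range(c_in):
                        for co in range(c_out):
                            out[i][j][co] += image[r][c][ci] * kernel[di][dj][ci][co]
    return out

ones = [[[1.0] for _ in range(5)] for _ in range(5)]   # input [5][5][1]
kernel = [[[[-1.0]], [[0.0]]],                          # filter [2][2][1][1]
          [[[0.0]], [[-1.0]]]]
for row in conv2d_reference(ones, kernel, 2, "SAME"):
    print([v[0] for v in row])
```

Passing "VALID" instead of "SAME" drops the padding and gives a 2x2 grid of -2.0.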
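If we set padding='VALID', op will be:
```
op:
 [[[[-2.]
   [-2.]]

  [[-2.]
   [-2.]]]]
```
With VALID there is no padding, so only windows that fit entirely inside the input are used. The output is 2x2, and the last row and column of the input are never covered by the filter.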
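Along one spatial dimension, you can list which input positions the VALID windows actually touch. A minimal sketch using the numbers from this example (in_size, k and stride are just local variables, not TensorFlow names):

```python
in_size, k, stride = 5, 2, 2                  # one spatial dimension of the example
out_size = (in_size - k) // stride + 1        # VALID output size
# every input index touched by some window: window start o * stride plus tap t
covered = sorted({o * stride + t for o in range(out_size) for t in range(k)})
unused = [i for i in range(in_size) if i not in covered]
print(out_size, covered, unused)              # 2 [0, 1, 2, 3] [4]
```

Index 4 (the last row, and by symmetry the last column) is never read, which is why it has no effect on the VALID result.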
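If the shape of filter is [2, 2, 1, 2] (two output channels):
```python
# shape has 8 entries; TF1's tf.constant fills the missing ones with the last value, -1.0
filter = tf.Variable(tf.constant([-1.0, 0, 0, -1], shape=[2, 2, 1, 2]))
op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='VALID')
```
the shape of op will be [1, out_height, out_width, 2], here [1, 2, 2, 2]:
```
op:
 [[[[-3. -3.]
   [-3. -3.]]

  [[-3. -3.]
   [-3. -3.]]]]
```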
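The value -3 comes from how tf.constant fills the filter. TF1's tf.constant pads a list that is shorter than shape with the list's last element, so [-1.0, 0, 0, -1] becomes 8 values. The sketch below mimics that fill rule in plain Python (tf1_constant_fill is a hypothetical helper). Because the input is all ones, in_channels is 1, and VALID windows never touch padding, each output value is simply the sum of one channel's 2x2 weights.

```python
def tf1_constant_fill(values, shape):
    """Hypothetical helper mimicking TF1 tf.constant: pad a short list with its last value."""
    n = 1
    for dim in shape:
        n *= dim
    if len(values) > n:
        raise ValueError("too many values for shape")
    return list(values) + [values[-1]] * (n - len(values))

shape = [2, 2, 1, 2]  # [filter_height, filter_width, in_channels, out_channels]
flat = tf1_constant_fill([-1.0, 0.0, 0.0, -1.0], shape)
out_channels = shape[-1]
# row-major order: the last axis varies fastest, so channel k is flat[k::out_channels]
# (this shortcut relies on in_channels == 1)
channel_weights = [flat[k::out_channels] for k in range(out_channels)]
# all-ones input + VALID padding: each output is the sum of that channel's weights
outputs = [sum(w) for w in channel_weights]
print(channel_weights)  # [[-1.0, 0.0, -1.0, -1.0], [0.0, -1.0, -1.0, -1.0]]
print(outputs)          # [-3.0, -3.0]
```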