By | August 19, 2020

CNNs are widely used in deep learning. In this tutorial, we will build a CNN to classify MNIST handwritten digits, which should help you understand how to use a CNN in practice.

The structure of CNN network

The basic structure of a CNN looks like this:

[Figure: the structure of convolutional networks]

We will use this structure to build a CNN for MNIST handwritten digit classification.

Load MNIST data

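First, we load the MNIST data. The code in this tutorial uses the TensorFlow 1.x API; the tensorflow.examples.tutorials.mnist module is not available in TensorFlow 2.x.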

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
import numpy as np
import random

mnist = input_data.read_data_sets(os.getcwd() + "/MNIST-data/", one_hot=True)
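
With one_hot=True, each label is returned as a 10-element vector instead of a single integer. As a rough illustration of that encoding, here is a pure-Python sketch; the helper name to_one_hot is ours and is not part of the MNIST loader.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

# the digit 3 becomes a vector whose fourth entry is 1.0
encoded = to_one_hot(3)
print(encoded)
```

This is why y_label later in the tutorial has shape [None, 10] rather than [None].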

Then we can prepare the data so that it can be fed into a CNN.

Prepare data

x = tf.placeholder(tf.float32, [None, 784])
y_label = tf.placeholder(tf.float32, [None, 10])
X_images = tf.reshape(x, [-1, 28, 28, 1])

Each MNIST image arrives as a flat vector of 784 values. The reshape turns it into a 28 x 28 image with 1 channel, so X_images can be passed into a CNN.
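
To make the reshape concrete, the following pure-Python sketch rearranges one flat image in the same row-major order that tf.reshape uses. The function name flat_to_image is hypothetical and only for illustration.

```python
def flat_to_image(flat_pixels, height=28, width=28):
    """Rearrange a flat list of height*width values into [height][width][1]."""
    if len(flat_pixels) != height * width:
        raise ValueError("expected %d values" % (height * width))
    return [
        [[flat_pixels[row * width + col]] for col in range(width)]
        for row in range(height)
    ]

flat = list(range(784))          # stand-in for one MNIST image
image = flat_to_image(flat)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
print(image[1][0][0])            # 28: flat index 28 is row 1, column 0
```

The -1 in the tutorial's reshape plays the role of the batch dimension, which TensorFlow infers from the input.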

Create some hyperparameters

We will use the following hyperparameters in our example.

learning_rate = 1e-3

total_steps = 1000
category_num = 10
steps_per_validate = 15
steps_per_test = 15
batch_size = 64
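
To get a feel for what these values imply, the sketch below estimates how much data the training loop sees. It assumes the default training split of 55,000 images returned by read_data_sets, and it uses the fact that the training loop later in this tutorial iterates over range(total_steps + 1).

```python
total_steps = 1000
batch_size = 64
steps_per_validate = 15
train_examples = 55000  # assumption: default train split of read_data_sets

steps_run = total_steps + 1              # the loop uses range(total_steps + 1)
examples_seen = steps_run * batch_size   # images consumed during training
epochs = examples_seen / train_examples  # passes over the training set
validation_runs = len(range(0, steps_run, steps_per_validate))

print(examples_seen, round(epochs, 2), validation_runs)  # 64064 1.16 67
```

So with these settings the model sees the training set a little more than once and is validated 67 times.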

Then we can start to build the CNN.

We will use TensorFlow's tf.nn.conv2d() and tf.nn.max_pool(). You can learn how to use them in these two tutorials:

Understand tf.nn.conv2d(): Compute a 2-D Convolution in TensorFlow

Understand TensorFlow tf.nn.max_pool(): Implement Max Pooling for Convolutional Network

How to implement Convolution+ReLU

Convolution+ReLU is the basic operation of a CNN. We can use tf.nn.conv2d() and tf.nn.relu() to implement it. Here is an example:

# filter shape: [filter_height, filter_width, in_channels, out_channels]
conv1_Weights = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name='conv1_Weights')
# one bias per output channel (out_channels = 32)
conv1_biases = tf.Variable(tf.constant(0.1, shape=[32]), name='conv1_biases')
# output shape: [batch, out_height, out_width, out_channels] = [batch, 28, 28, 32]
conv1_conv2d = tf.nn.conv2d(X_images, conv1_Weights, strides=[1, 1, 1, 1], padding='SAME') + conv1_biases
conv1_activated = tf.nn.relu(conv1_conv2d)
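
To show what this layer computes, here is a naive pure-Python sketch for a single input channel and a single filter. It assumes stride 1, an odd kernel size, and zero padding, which matches padding='SAME' in that case. Like tf.nn.conv2d(), it does not flip the kernel. The function names conv2d_same and relu are ours, not TensorFlow's.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Single-channel convolution (no kernel flip), stride 1, zero 'SAME' padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_h, pad_w = kh // 2, kw // 2  # symmetric padding, valid for odd kernel sizes
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - pad_h, j + dj - pad_w
                    # positions outside the image count as zero padding
                    if 0 <= r < h and 0 <= c < w:
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out

def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # copies the centre pixel
activated = relu(conv2d_same(image, identity, bias=-5.0))
print(activated)
```

The output keeps the 3 x 3 size of the input, just as the 28 x 28 images keep their size after the first convolution in the tutorial.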

How to implement pooling

In this tutorial, we will use max pooling. Here is an example:

# a 2x2 window with stride 2 halves height and width: [batch, 14, 14, 32]
conv1_pooled = tf.nn.max_pool(conv1_activated, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
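
As an illustration of what 2 x 2 max pooling with stride 2 does, here is a small pure-Python sketch on one feature map. The function name max_pool_2x2 is hypothetical. For odd input sizes the last window is clipped at the border, which is how we understand padding='SAME' to behave for max pooling.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; border windows are clipped to the input."""
    h, w = len(feature_map), len(feature_map[0])
    out_h, out_w = (h + 1) // 2, (w + 1) // 2  # ceil(size / 2)
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[r][c]
                for r in range(2 * i, min(2 * i + 2, h))
                for c in range(2 * j, min(2 * j + 2, w))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 8], [14, 16]]
```

Each output value is the largest value in its window, so the 4 x 4 map shrinks to 2 x 2.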

Then we can repeat the Convolution+ReLU+pooling operations:

conv2_Weights = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1), name='conv2_Weights')
conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]), name='conv2_biases')
conv2_conv2d = tf.nn.conv2d(conv1_pooled, conv2_Weights, strides=[1, 1, 1, 1], padding='SAME') + conv2_biases
conv2_activated = tf.nn.relu(conv2_conv2d)
# output shape: [batch, out_height, out_width, channels] = [batch, 7, 7, 64]
conv2_pooled = tf.nn.max_pool(conv2_activated, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

How to implement Fully Connected operation

A fully connected layer can be created as follows:

dim_w1= 7 * 7 * 64
connect1_flat = tf.reshape(conv2_pooled, [-1, dim_w1])
connect1_Weights = tf.Variable(tf.truncated_normal([dim_w1, 1024], stddev=0.1), name='connect1_Weights')
connect1_biases = tf.Variable(tf.constant(0.1, shape=[1024]), name='connect1_biases')
connect1_Wx_plus_b = tf.add(tf.matmul(connect1_flat, connect1_Weights), connect1_biases)
connect1_activated = tf.nn.relu(connect1_Wx_plus_b)
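
The value 7 * 7 * 64 follows from the layer shapes. With padding='SAME', the spatial output size is ceil(input_size / stride), so the stride-1 convolutions keep the size and each stride-2 pooling halves it. A short sketch of that arithmetic (the helper name same_output_size is ours):

```python
import math

def same_output_size(input_size, stride):
    """Spatial output size for padding='SAME': ceil(input_size / stride)."""
    return math.ceil(input_size / stride)

size = 28  # MNIST images are 28 x 28
for name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_output_size(size, stride)
    print(name, size)

channels = 64                            # out_channels of the second convolution
flat_dim = size * size * channels        # length of each flattened example
fc1_params = flat_dim * 1024 + 1024      # weights plus biases of the first dense layer
print(flat_dim, fc1_params)
```

The flattened size is 3136, and the first fully connected layer holds most of the model's parameters.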

Then we can compute the final output (the logits), which is used for classification.

# fully connected layer 2
connect2_Weights = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1), name='connect2_Weights')
connect2_biases = tf.Variable(tf.constant(0.1, shape=[10]), name='connect2_biases')
y = tf.add(tf.matmul(connect1_activated, connect2_Weights), connect2_biases)

Make prediction

# Loss
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_label, logits=y)
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# Prediction
correction_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
accuracy = tf.reduce_mean(tf.cast(correction_prediction, tf.float32))
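
To clarify what the loss and the accuracy measure, here is a framework-free sketch of both for one-hot labels. It is only an illustration, not TensorFlow's implementation; the function names are ours, and the loss uses the log-sum-exp form for numerical stability.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]

def cross_entropy(labels, logits):
    """Cross entropy between a one-hot label vector and the softmax of logits."""
    return -sum(t * lp for t, lp in zip(labels, log_softmax(logits)))

def batch_accuracy(batch_labels, batch_logits):
    """Fraction of examples whose argmax logit matches the argmax label."""
    hits = 0
    for labels, logits in zip(batch_labels, batch_logits):
        if labels.index(max(labels)) == logits.index(max(logits)):
            hits += 1
    return hits / len(batch_labels)

loss = cross_entropy([1, 0], [0.0, 0.0])  # uniform prediction over 2 classes
acc = batch_accuracy([[1, 0], [0, 1]], [[2.0, 1.0], [3.0, 0.5]])
print(loss, acc)
```

With two equally likely classes the loss equals ln 2, and the accuracy here is 0.5 because only the first example is classified correctly.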

Finally, we can start to train this model.

Start to train the CNN network

init = tf.global_variables_initializer()
with tf.Session() as sess:
    # initialize all variables before the first training step
    sess.run(init)
    test_acc = 0.
    dev_acc = 0.
    better_acc = 0.0
    # run total_steps + 1 training steps
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        _, acc = sess.run([train, accuracy], feed_dict={x: batch_x, y_label: batch_y})
        print("train step=" + str(step) + " accuracy = " + str(acc))
        if step % steps_per_validate == 0:
            dev_x, dev_y = mnist.validation.images, mnist.validation.labels
            dev_acc = sess.run(accuracy, feed_dict={x: dev_x, y_label: dev_y})
            print("dev step=" + str(step) + " accuracy = " + str(dev_acc))
            # evaluate on the test set only when validation accuracy improves
            if better_acc < dev_acc:
                test_x, test_y = mnist.test.images, mnist.test.labels
                test_acc = sess.run(accuracy, feed_dict={x: test_x, y_label: test_y})
                print("test step=" + str(step) + " accuracy = " + str(test_acc))
                better_acc = dev_acc
    # validation accuracy after the last training step
    dev_x, dev_y = mnist.validation.images, mnist.validation.labels
    dev_acc = sess.run(accuracy, feed_dict={x: dev_x, y_label: dev_y})
    print("final dev accuracy = " + str(dev_acc))

Run this code and you will see the training, validation, and test accuracy printed as training progresses.
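
The training loop evaluates on the test set only when the validation accuracy strictly improves. That selection rule can be isolated in a small sketch that needs no TensorFlow. The function name track_best and the accuracy values below are made up purely for illustration; they are not results from this model.

```python
def track_best(dev_accuracies, test_accuracies):
    """Return (best_dev, test_at_best) using the same strict '<' rule as the loop."""
    best_dev, test_at_best = 0.0, 0.0
    for dev_acc, test_acc in zip(dev_accuracies, test_accuracies):
        if best_dev < dev_acc:
            # a new best validation score: record the matching test score
            best_dev, test_at_best = dev_acc, test_acc
    return best_dev, test_at_best

# hypothetical accuracies at four validation points
dev_history = [0.5, 0.7, 0.6, 0.7]
test_history = [0.4, 0.65, 0.9, 0.8]
best = track_best(dev_history, test_history)
print(best)  # (0.7, 0.65)
```

Note that the reported test accuracy is the one measured at the best validation step, not the highest test accuracy seen, which avoids selecting a model by peeking at the test set.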

