
Implement CNN for Text Classification in TensorFlow – TensorFlow Tutorial

Convolutional networks are widely used in text classification, often combined with recurrent models (for example, LSTM+CNN or CNN+LSTM). In this tutorial, we will introduce how to implement a CNN for text classification using TensorFlow.

Preliminary

For text classification, suppose we have a batch of training data:

inputs: a tensor with shape [batch_size, sequence_length, embedding_size]

For example, if inputs is 64 * 50 * 200, we have 64 sentences (or documents), each containing 50 words (or sentences), and each word (or sentence) is represented by a 200-dimensional vector.

How to implement a CNN for text classification?

We can use tf.nn.conv2d() to implement a convolution operation. Here is a tutorial:

Understand tf.nn.conv2d(): Compute a 2-D Convolution in TensorFlow – TensorFlow Tutorial

tf.nn.conv2d() expects inputs of shape [batch, in_height, in_width, in_channels]; that is, the rank of inputs should be 4. However, the rank of inputs is 3 in text classification. How can we fix this problem?

We need to expand the rank of inputs from 3 to 4.

How to reshape the inputs for tf.nn.conv2d()?

Looking at this source code: https://github.com/dennybritz/cnn-text-classification-tf/blob/master/text_cnn.py

We can find:

the shape of inputs should be converted to: 64 * 50 * 200 * 1.

This means in_height = 50, in_width = 200, and in_channels = 1.
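
Here is a minimal sketch of this conversion (using a zero tensor as a stand-in for real embeddings): tf.expand_dims() appends a channel dimension of size 1.

import tensorflow as tf

# inputs: [batch_size, sequence_length, embedding_size] = [64, 50, 200]
inputs = tf.zeros([64, 50, 200])
# Append a channel dimension: [64, 50, 200] -> [64, 50, 200, 1]
inputs = tf.expand_dims(inputs, -1)
print(inputs)  # Tensor with shape (64, 50, 200, 1)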

How to define filters in tf.nn.conv2d()?

In order to use tf.nn.conv2d() for the convolution operation, we should create filter variables in TensorFlow. The shape of the filters should be: [filter_height, filter_width, in_channels, out_channels]

A filter can be created as follows:

filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W") #filter
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")

We can notice:

filter_size is the height of a filter; it is usually 3, 4 or 5.

the width of a filter should be the embedding size, which is 200 in this tutorial.

in_channels is 1, which is the same as the inputs.

num_filters is the number of output channels, such as 100, 128, 200 or 300.

Then we can use tf.nn.conv2d() for text classification. Here is example code:

import tensorflow as tf
#tf.enable_eager_execution()

batch_size = 64
sequence_length = 50
filter_size = 3
embedding_size = 200
num_filters = 200

# inputs: [batch_size, sequence_length, embedding_size]
inputs = tf.Variable(tf.random_uniform([batch_size, sequence_length, embedding_size], -0.01, 0.01))
inputs = tf.expand_dims(inputs, -1) # [64, 50, 200, 1]

# filter: [filter_height, filter_width, in_channels, out_channels]
filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
            inputs,
            W,
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="conv")
# Add the bias and apply a ReLU nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
print(h)

Run this code and you will get:

Tensor("relu:0", shape=(64, 48, 1, 200), dtype=float32)

We can see that the shape of h is 64 * 48 * 1 * 200.

48 is 50 - 3 + 1 (sequence_length - filter_size + 1), and the width is 1 because the filter width equals the embedding size (200 - 200 + 1 = 1).

Then we can apply a pooling operation to h.

How to apply a pooling operation in CNN?

As to h, we can apply max-pooling on it. You can refer to these tutorials:

Understand max-pooling Operation in Neural Networks – Machine Learning Tutorial

Understand TensorFlow tf.nn.max_pool(): Implement Max Pooling for Convolutional Network – TensorFlow Tutorial

Here is an example:

# The pooling window covers the entire output height (sequence_length - filter_size + 1),
# so we take a single maximum per filter: this is max-over-time pooling.
pooled = tf.nn.max_pool(
                h,
                ksize=[1, sequence_length - filter_size + 1, 1, 1],
                strides=[1, 1, 1, 1],
                padding='VALID',
                name="pool")
print(pooled)

The pooled tensor is:

Tensor("pool:0", shape=(64, 1, 1, 200), dtype=float32)

Then we can use pooled as a feature vector to classify text.
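
For example, here is a minimal sketch (reusing num_filters from the code above) that flattens pooled into a [batch_size, num_filters] feature matrix for a classifier:

# Flatten [64, 1, 1, 200] -> [64, 200]: one num_filters-dimensional feature vector per example
pooled_flat = tf.reshape(pooled, [-1, num_filters])
print(pooled_flat)  # Tensor with shape (64, 200)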

Meanwhile, we can use multiple filter sizes (3, 4 and 5) to get three pooled results, then concatenate them to classify text. Here is an example:

import tensorflow as tf
import numpy as np


class TextCNN(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
      self, sequence_length, num_classes, vocab_size,
      embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        # Embedding layer
        with tf.device('/cpu:0'), tf.name_scope("embedding"):
            self.W = tf.Variable(
                tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                name="W")
            self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
            self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

        # Create a convolution + maxpool layer for each filter size
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters] # 3*200*1*200
                W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b") # 200
                conv = tf.nn.conv2d(
                    self.embedded_chars_expanded,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")
                # Apply nonlinearity
                h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                # Maxpooling over the outputs
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, sequence_length - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding='VALID',
                    name="pool")
                pooled_outputs.append(pooled)

        # Combine all the pooled features
        num_filters_total = num_filters * len(filter_sizes)
        self.h_pool = tf.concat(pooled_outputs, 3)
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add dropout
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

        # Final (unnormalized) scores and predictions
        with tf.name_scope("output"):
            W = tf.get_variable(
                "W",
                shape=[num_filters_total, num_classes],
                initializer=tf.contrib.layers.xavier_initializer())
            b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
            l2_loss += tf.nn.l2_loss(W)
            l2_loss += tf.nn.l2_loss(b)
            self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
            self.predictions = tf.argmax(self.scores, 1, name="predictions")

        # Calculate mean cross-entropy loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
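
Here is a minimal usage sketch of this class. The hyperparameters, the Adam optimizer and the fake random batch are illustrative assumptions; in practice you would feed real word-id sequences and one-hot labels:

import tensorflow as tf
import numpy as np

# Assumed hyperparameters for illustration
cnn = TextCNN(
    sequence_length=50,
    num_classes=2,
    vocab_size=10000,
    embedding_size=200,
    filter_sizes=[3, 4, 5],
    num_filters=200,
    l2_reg_lambda=0.1)

train_op = tf.train.AdamOptimizer(1e-3).minimize(cnn.loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # A fake batch: 64 sentences of 50 word ids, with one-hot labels for 2 classes
    x_batch = np.random.randint(0, 10000, size=(64, 50))
    y_batch = np.eye(2)[np.random.randint(0, 2, size=64)]
    _, loss, accuracy = sess.run(
        [train_op, cnn.loss, cnn.accuracy],
        feed_dict={cnn.input_x: x_batch,
                   cnn.input_y: y_batch,
                   cnn.dropout_keep_prob: 0.5})
    print(loss, accuracy)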