Convolutional neural networks are widely used in text classification, for example in LSTM+CNN or CNN+LSTM architectures. In this tutorial, we will introduce how to implement a CNN for text classification using TensorFlow.
Preliminary
For text classification, suppose we have a batch of training data:
inputs: a tensor of shape [batch_size, sequence_length, embedding_size]
For example, if inputs is 64 * 50 * 200, we have 64 documents or sentences; each document or sentence contains 50 sentences or words, and each sentence or word is represented by a 200-dimensional vector.
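As a quick sanity check, we can build such a batch with random values (a minimal sketch; in a real model the vectors would come from an embedding lookup):

import tensorflow as tf

batch_size = 64
sequence_length = 50
embedding_size = 200

# A dummy batch of embedded text: 64 sentences, 50 words each,
# each word represented by a 200-dimensional vector.
inputs = tf.random_uniform([batch_size, sequence_length, embedding_size], -0.01, 0.01)
print(inputs)  # shape=(64, 50, 200)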
How to implement CNN in text classification?
We can use tf.nn.conv2d() to implement the convolution operation. Here is a tutorial:
Understand tf.nn.conv2d(): Compute a 2-D Convolution in TensorFlow – TensorFlow Tutorial
tf.nn.conv2d() requires inputs of shape [batch, in_height, in_width, in_channels], i.e. a rank-4 tensor. However, our inputs in text classification have rank 3. How can we fix this problem?
We need to expand the inputs to a rank-4 tensor.
How to reshape the inputs for tf.nn.conv2d()?
Looking at this source code: https://github.com/dennybritz/cnn-text-classification-tf/blob/master/text_cnn.py
We can find that:
the shape of inputs should be converted to 64 * 50 * 200 * 1,
which means in_height = 50, in_width = 200, in_channels = 1.
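In other words, we append a channel dimension of size 1 with tf.expand_dims(). A minimal sketch:

import tensorflow as tf

inputs = tf.random_uniform([64, 50, 200], -0.01, 0.01)  # rank 3: [batch, sequence, embedding]
inputs = tf.expand_dims(inputs, -1)  # append in_channels = 1
print(inputs)  # shape=(64, 50, 200, 1), rank 4: [batch, in_height, in_width, in_channels]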
How to define filters in tf.nn.conv2d()?
In order to use tf.nn.conv2d() for the convolution operation, we should create some filter variables in TensorFlow. The shape of a filter should be [filter_height, filter_width, in_channels, out_channels].
A filter can be created as follows:
filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")  # filter
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
We can notice:
filter_size is the height of a filter; it can be 3, 4, 5 and so on.
the width of a filter should be the embedding size, which is 200 in this tutorial.
in_channels is 1, matching the inputs.
num_filters is the number of output channels, such as 100, 128, 200 or 300 (see the sketch after this list for filters of several sizes).
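Here is the sketch mentioned above: one filter variable per filter size (the dictionary and the variable names are illustrative):

import tensorflow as tf

embedding_size = 200
num_filters = 200

# One filter variable per filter height (3, 4 and 5 words).
filters = {}
for filter_size in [3, 4, 5]:
    filter_shape = [filter_size, embedding_size, 1, num_filters]
    filters[filter_size] = tf.Variable(
        tf.truncated_normal(filter_shape, stddev=0.1),
        name="W-%d" % filter_size)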
Then we can use tf.nn.conv2d() for text classification. Here is an example:
import tensorflow as tf
#tf.enable_eager_execution()

batch_size = 64
sequence_length = 50
filter_size = 3
embedding_size = 200
num_filters = 200

inputs = tf.Variable(tf.random_uniform([batch_size, sequence_length, embedding_size], -0.01, 0.01))
inputs = tf.expand_dims(inputs, -1)  # 64 * 50 * 200 * 1

filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")

conv = tf.nn.conv2d(
    inputs,
    W,
    strides=[1, 1, 1, 1],
    padding="VALID",
    name="conv")
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
print(h)
Run this code and you will get:
Tensor("relu:0", shape=(64, 48, 1, 200), dtype=float32)
We can see that the shape of h is 64 * 48 * 1 * 200, where 48 = 50 - 3 + 1 (sequence_length - filter_size + 1, because we use VALID padding with stride 1).
Then we can apply a pooling operation to h.
How to apply pooling operation in CNN?
We can apply max-pooling to h. You can refer to this tutorial:
Understand max-pooling Operation in Neural Networks – Machine Learning Tutorial
Here is an example:
pooled = tf.nn.max_pool(
    h,
    ksize=[1, sequence_length - filter_size + 1, 1, 1],
    strides=[1, 1, 1, 1],
    padding='VALID',
    name="pool")
print(pooled)
The pooled result is:
Tensor("pool:0", shape=(64, 1, 1, 200), dtype=float32)
Then we can use pooled as a feature vector to classify text.
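For instance, continuing from the snippets above, we can flatten pooled and feed it into a fully connected softmax layer (a minimal sketch; num_classes and the output variable names are illustrative):

num_classes = 2  # illustrative: binary classification

# pooled has shape [64, 1, 1, 200]; flatten it to [64, 200].
pooled_flat = tf.reshape(pooled, [-1, num_filters])

# A fully connected layer producing unnormalized class scores.
W_out = tf.Variable(tf.truncated_normal([num_filters, num_classes], stddev=0.1), name="W_out")
b_out = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b_out")
logits = tf.nn.xw_plus_b(pooled_flat, W_out, b_out, name="logits")
print(logits)  # shape=(64, 2)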
Moreover, we can use multiple filter sizes (3, 4, 5) to get three pooled results, then concatenate them to classify text. Here is an example:
import tensorflow as tf
import numpy as np


class TextCNN(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
      self, sequence_length, num_classes, vocab_size,
      embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        # Embedding layer
        with tf.device('/cpu:0'), tf.name_scope("embedding"):
            self.W = tf.Variable(
                tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                name="W")
            self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
            self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

        # Create a convolution + maxpool layer for each filter size
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters]  # 3*200*1*200
                W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")  # 200
                conv = tf.nn.conv2d(
                    self.embedded_chars_expanded,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")
                # Apply nonlinearity
                h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                # Maxpooling over the outputs
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, sequence_length - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding='VALID',
                    name="pool")
                pooled_outputs.append(pooled)

        # Combine all the pooled features
        num_filters_total = num_filters * len(filter_sizes)
        self.h_pool = tf.concat(pooled_outputs, 3)
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add dropout
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

        # Final (unnormalized) scores and predictions
        with tf.name_scope("output"):
            W = tf.get_variable(
                "W",
                shape=[num_filters_total, num_classes],
                initializer=tf.contrib.layers.xavier_initializer())
            b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
            l2_loss += tf.nn.l2_loss(W)
            l2_loss += tf.nn.l2_loss(b)
            self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
            self.predictions = tf.argmax(self.scores, 1, name="predictions")

        # Calculate mean cross-entropy loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
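To build the graph, we instantiate the class with our hyperparameters. A minimal sketch using the values from this tutorial (vocab_size and num_classes are illustrative):

cnn = TextCNN(
    sequence_length=50,
    num_classes=2,       # illustrative
    vocab_size=10000,    # illustrative
    embedding_size=200,
    filter_sizes=[3, 4, 5],
    num_filters=200,
    l2_reg_lambda=0.0)

After that, cnn.loss can be minimized with any TensorFlow optimizer, and cnn.accuracy can be evaluated on a validation set.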