# Implement CNN for Text Classification in TensorFlow – TensorFlow Tutorial

By | March 1, 2021

Convolutional neural networks (CNNs) are widely used in text classification, either on their own or combined with LSTMs, for example LSTM+CNN or CNN+LSTM. In this tutorial, we will introduce how to implement a CNN for text classification using TensorFlow.

## Preliminary

For text classification, suppose we have a batch of training data:

inputs: its shape is [batch_size, sequence_len, embeddings]

For example, if inputs is 64 * 50 * 200, the batch contains 64 documents (or sentences). Each document (or sentence) contains 50 sentences (or words), and each sentence (or word) is a 200-dimensional vector.
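To make these shapes concrete, here is a minimal NumPy sketch of how such a batch is typically produced: each document is a padded sequence of 50 token IDs, and an embedding matrix maps every ID to a 200-dimensional vector. The vocabulary size and the random values are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

batch_size, sequence_len, embedding_size = 64, 50, 200
vocab_size = 10000  # hypothetical vocabulary size, for illustration only

rng = np.random.default_rng(0)
# Token IDs for 64 padded documents, 50 tokens each: shape [64, 50]
token_ids = rng.integers(0, vocab_size, size=(batch_size, sequence_len))
# Embedding matrix: one 200-dimensional row per vocabulary entry
embedding_matrix = rng.uniform(-1.0, 1.0, size=(vocab_size, embedding_size)).astype(np.float32)

# Row indexing plays the role of tf.nn.embedding_lookup here
inputs = embedding_matrix[token_ids]
print(inputs.shape)  # (64, 50, 200)
```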

## How to implement CNN in text classification?

We can use tf.nn.conv2d() to implement a convolution operation. Here is a tutorial:

Understand tf.nn.conv2d(): Compute a 2-D Convolution in TensorFlow – TensorFlow Tutorial

tf.nn.conv2d() requires inputs with the shape [batch, in_height, in_width, in_channels], so the rank of inputs should be 4. However, in text classification the rank of inputs is 3. How can we fix this problem?

We need to convert inputs into a rank-4 tensor.

### How to reshape the inputs for tf.nn.conv2d()?

In the source code at https://github.com/dennybritz/cnn-text-classification-tf/blob/master/text_cnn.py, we can see that the shape of inputs is converted to 64 * 50 * 200 * 1 by adding a trailing channel dimension.

This means in_height = 50 (the sequence length), in_width = 200 (the embedding size), and in_channels = 1.
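The conversion only adds a trailing axis of size 1; no values change. The linked code does this with tf.expand_dims(..., -1). The NumPy sketch below is an illustration of the same operation (random data stands in for real embeddings) and shows that a reshape gives an identical result:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in batch: [batch, sequence_len, embeddings]
inputs = rng.standard_normal((64, 50, 200)).astype(np.float32)

# Add a channel axis at the end: [batch, in_height, in_width, in_channels]
inputs_4d = np.expand_dims(inputs, -1)
print(inputs_4d.shape)  # (64, 50, 200, 1)

# A reshape gives the same tensor; -1 lets NumPy infer the batch size
inputs_reshaped = inputs.reshape(-1, 50, 200, 1)
```

Using -1 for the batch dimension is convenient because the same code works for any batch size.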

### How to define filters in tf.nn.conv2d()?

To use tf.nn.conv2d() for the convolution, we need to create filter variables in TensorFlow. The shape of the filters should be: [filter_height, filter_width, in_channels, out_channels]

A filter can be created as follows:

```python
filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")  # filter weights
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")  # one bias per filter
```


Note that:

filter_size is the height of a filter, i.e. the number of consecutive words it covers. It can be 3, 4, 5, and so on.

The width of a filter should equal the embedding size, which is 200 in this tutorial, so each filter always covers whole word vectors.

in_channels is 1, the same as in inputs.

num_filters is the number of output channels (feature maps), such as 100, 128, 200 or 300.
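Because each filter is as wide as the embedding, it can only slide along the sequence axis. The small helper below (hypothetical, written only for this explanation) computes the output shape that a stride-1, "VALID"-padded tf.nn.conv2d produces for a [batch, sequence_length, embedding_size, 1] input:

```python
def conv_output_shape(batch_size, sequence_length, embedding_size,
                      filter_size, num_filters):
    """Output shape of a stride-1, VALID-padded convolution over
    [batch, sequence_length, embedding_size, 1] with a filter of shape
    [filter_size, embedding_size, 1, num_filters]."""
    out_height = sequence_length - filter_size + 1   # valid positions along the sequence
    out_width = embedding_size - embedding_size + 1  # filter spans the full width, so 1
    return (batch_size, out_height, out_width, num_filters)


for k in (3, 4, 5):
    print(k, conv_output_shape(64, 50, 200, k, 200))
# 3 (64, 48, 1, 200)
# 4 (64, 47, 1, 200)
# 5 (64, 46, 1, 200)
```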

Then we can use tf.nn.conv2d() for text classification. Here is an example code:

```python
import tensorflow as tf  # TensorFlow 1.x (graph mode) API
#tf.enable_eager_execution()  # left disabled, so print() shows the symbolic tensor
batch_size = 64
sequence_length = 50
filter_size = 3
embedding_size = 200
num_filters = 200
inputs = tf.Variable(tf.random_uniform([batch_size, sequence_length, embedding_size], -0.01, 0.01))
inputs = tf.expand_dims(inputs, -1)  # 64*50*200*1

filter_shape = [filter_size, embedding_size, 1, num_filters]  # 3*200*1*200
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
    inputs,
    W,
    strides=[1, 1, 1, 1],
    padding="VALID",  # no padding: output height is 50 - 3 + 1
    name="conv")
# Add the bias and apply the ReLU nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
print(h)
```

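Run this code and you will get:

```
Tensor("relu:0", shape=(64, 48, 1, 200), dtype=float32)
```

We can see that the shape of h is 64 * 48 * 1 * 200.

Here 48 = 50 - 3 + 1: with stride 1 and no padding, a filter of height 3 fits at 48 positions along a sequence of 50 words. The width is 1 because the filter is as wide as the embedding (200 - 200 + 1 = 1).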
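To see exactly what produces each of those 48 positions, here is a naive NumPy sketch of a stride-1, "VALID" convolution with full-width filters, followed by the bias add and ReLU. It is a slow reference for illustration, not how TensorFlow implements conv2d, and it uses a plain normal initializer in place of truncated_normal. Like tf.nn.conv2d, it computes a cross-correlation (the filter is not flipped).

```python
import numpy as np


def text_conv_valid(inputs, W, b):
    """inputs: [batch, seq_len, emb, 1]; W: [k, emb, 1, num_filters]; b: [num_filters].
    Returns ReLU(conv + b) with shape [batch, seq_len - k + 1, 1, num_filters]."""
    batch, seq_len, emb, _ = inputs.shape
    k, w_emb, _, num_filters = W.shape
    assert w_emb == emb, "filter width must equal the embedding size"
    out_len = seq_len - k + 1  # number of valid filter positions
    out = np.zeros((batch, out_len, 1, num_filters), dtype=inputs.dtype)
    for t in range(out_len):
        window = inputs[:, t:t + k, :, 0]  # k consecutive word vectors: [batch, k, emb]
        # Multiply each window by every filter and sum over (k, emb): [batch, num_filters]
        out[:, t, 0, :] = np.tensordot(window, W[:, :, 0, :], axes=([1, 2], [0, 1]))
    return np.maximum(out + b, 0.0)  # bias add, then ReLU


rng = np.random.default_rng(0)
inputs = rng.uniform(-0.01, 0.01, size=(64, 50, 200, 1)).astype(np.float32)
W = (0.1 * rng.standard_normal((3, 200, 1, 200))).astype(np.float32)
b = np.full(200, 0.1, dtype=np.float32)

h = text_conv_valid(inputs, W, b)
print(h.shape)  # (64, 48, 1, 200)
```

Each output value h[n, t, 0, f] is therefore a score of how strongly filter f matches the 3-word window starting at position t of document n.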

Then we can apply a pooling operation to h.

### How to apply pooling operation in CNN?

We can apply max-pooling to h. You can refer to these tutorials:

Understand max-pooling Operation in Neural Networks – Machine Learning Tutorial

Understand TensorFlow tf.nn.max_pool(): Implement Max Pooling for Convolutional Network – TensorFlow Tutorial

Here is an example:

```python
# Max-over-time pooling: the window covers all 48 positions of h
pooled = tf.nn.max_pool(
    h,
    ksize=[1, sequence_length - filter_size + 1, 1, 1],
    strides=[1, 1, 1, 1],
    padding="VALID",
    name="pool")
print(pooled)
```

The printed pooled tensor is:

```
Tensor("pool:0", shape=(64, 1, 1, 200), dtype=float32)
```

Then we can use pooled as a feature to classify text.
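Since the pooling window covers all 48 positions, this max_pool simply keeps the largest value of each feature map over the whole sequence, which is often called max-over-time pooling. A NumPy sketch of the same reduction, with random non-negative values standing in for the ReLU output h:

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_length, filter_size = 50, 3
# Stand-in for h: [batch, 48, 1, num_filters], non-negative like a ReLU output
h = np.maximum(rng.normal(size=(64, sequence_length - filter_size + 1, 1, 200)), 0.0)

# ksize=[1, 48, 1, 1] with VALID padding is a max over axis 1, keeping that axis
pooled = h.max(axis=1, keepdims=True)
print(pooled.shape)  # (64, 1, 1, 200)
```

Every filter thus contributes exactly one number per document, regardless of where in the sequence its strongest match occurred.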

We can also use filters of several sizes (3, 4, 5) to get three pooled results, then concatenate them as the features for classifying text. Here is an example, taken from the source code linked above:

```python
import tensorflow as tf  # TensorFlow 1.x API (placeholders, tf.contrib)
import numpy as np


class TextCNN(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
            self, sequence_length, num_classes, vocab_size,
            embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        # Embedding layer
        with tf.device('/cpu:0'), tf.name_scope("embedding"):
            self.W = tf.Variable(
                tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                name="W")
            self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
            self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

        # Create a convolution + maxpool layer for each filter size
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters]  # 3*200*1*200
                W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")  # 200
                conv = tf.nn.conv2d(
                    self.embedded_chars_expanded,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")
                # Apply nonlinearity
                h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                # Maxpooling over the outputs
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, sequence_length - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="pool")
                pooled_outputs.append(pooled)

        # Combine all the pooled features along the channel axis
        num_filters_total = num_filters * len(filter_sizes)
        self.h_pool = tf.concat(pooled_outputs, 3)
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add dropout
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

        # Final (unnormalized) scores and predictions
        with tf.name_scope("output"):
            W = tf.get_variable(
                "W",
                shape=[num_filters_total, num_classes],
                initializer=tf.contrib.layers.xavier_initializer())
            b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
            l2_loss += tf.nn.l2_loss(W)
            l2_loss += tf.nn.l2_loss(b)
            self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
            self.predictions = tf.argmax(self.scores, 1, name="predictions")

        # Calculate mean cross-entropy loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
```
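To trace what this class computes after the convolutions, here is a minimal NumPy sketch of the forward pass from the pooled outputs to the loss and accuracy. It assumes filter_sizes=(3, 4, 5), num_filters=200 and a hypothetical num_classes=2; random arrays stand in for the pooled outputs, weights and labels, and dropout is omitted, as at evaluation time with dropout_keep_prob=1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_filters, num_classes = 64, 200, 2  # num_classes is hypothetical
filter_sizes = (3, 4, 5)
l2_reg_lambda = 0.0

# One pooled output per filter size, each [batch, 1, 1, num_filters]
pooled_outputs = [rng.random((batch_size, 1, 1, num_filters)) for _ in filter_sizes]

# Concatenate along the channel axis (axis 3), then flatten to [batch, 600]
num_filters_total = num_filters * len(filter_sizes)
h_pool = np.concatenate(pooled_outputs, axis=3)
h_pool_flat = h_pool.reshape(-1, num_filters_total)

# Output layer: unnormalized scores = x @ W + b (what tf.nn.xw_plus_b computes)
W = rng.normal(0.0, 0.1, size=(num_filters_total, num_classes))
b = np.full(num_classes, 0.1)
scores = h_pool_flat @ W + b
predictions = scores.argmax(axis=1)

# One-hot labels, in the format fed to input_y
labels = rng.integers(0, num_classes, size=batch_size)
input_y = np.eye(num_classes)[labels]

# Softmax cross-entropy on the logits, computed stably via log-sum-exp
shifted = scores - scores.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
losses = -(input_y * log_probs).sum(axis=1)
l2_loss = 0.5 * (np.sum(W ** 2) + np.sum(b ** 2))  # tf.nn.l2_loss(t) is sum(t ** 2) / 2
loss = losses.mean() + l2_reg_lambda * l2_loss

accuracy = np.mean(predictions == input_y.argmax(axis=1))
print(h_pool_flat.shape, scores.shape)  # (64, 600) (64, 2)
```

The concatenated feature vector has num_filters * len(filter_sizes) = 600 entries per document, which is why the output weight matrix has shape [num_filters_total, num_classes].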