Implement Convolution Bank (ConvBank) in TensorFlow – TensorFlow Tutorial

March 16, 2022

Convolution Bank (ConvBank) is proposed in the paper TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS. In this tutorial, we will introduce how to implement it using TensorFlow.

Convolution Bank (ConvBank)

Convolution Bank is a 1-D convolutional network: it contains K sets of 1-D convolutional filters, where the k-th set contains \(C_k\) filters of width k (i.e. \(k = 1, 2, \ldots, K\)); K = 8 in the paper.

Convolution Bank filters explicitly model local and contextual information (akin to modeling unigrams, bigrams, up to K-grams). The convolution outputs are stacked together and further max pooled along time to increase local invariances. Note that a stride of 1 is used to preserve the original time resolution. Batch normalization (Ioffe & Szegedy, 2015) is used for all convolutional layers.
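
To make the shapes concrete, here is a minimal sketch in plain Python (it assumes \(C_k = 128\) for every k and K = 8, which matches the code in this tutorial):

# A width-k filter looks at k neighboring time steps (like a k-gram); with stride 1
# and 'same' padding each filter set outputs [N, T, 128], and concatenating the
# K sets along the channel axis gives [N, T, K * 128].
N, T, K, C_k = 4, 50, 8, 128
kernel_widths = list(range(1, K + 1))   # [1, 2, ..., 8]
per_set_shape = (N, T, C_k)             # output of each filter set
stacked_shape = (N, T, K * C_k)         # after concatenation: (4, 50, 1024)
print(kernel_widths, per_set_shape, stacked_shape)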

How to implement ConvBank in TensorFlow?

We will use an example to show you how to do it.

Step 1: create a 1-D convolutional layer with batch normalization

Here is the code:

import tensorflow as tf

def conv1d(inputs, kernel_size, channels, activation, is_training, scope):
  # 1-D convolution with 'same' padding (the time resolution is preserved),
  # followed by batch normalization controlled by is_training.
  with tf.variable_scope(scope):
    conv1d_output = tf.layers.conv1d(
      inputs,
      filters=channels,
      kernel_size=kernel_size,
      activation=activation,
      padding='same')
    return tf.layers.batch_normalization(conv1d_output, training=is_training)
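
For a quick sanity check, we can call conv1d() on a random input (the [4, 50, 200] shape, kernel_size = 3 and scope name here are just example values, not from the original post):

x = tf.Variable(tf.glorot_uniform_initializer()([4, 50, 200]), name="x")
y = conv1d(x, kernel_size=3, channels=128, activation=tf.nn.relu,
           is_training=True, scope='conv1d_test')
print(y.shape)  # (4, 50, 128): 'same' padding keeps T = 50, channels become 128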

Here are some useful resources:

Understand TensorFlow tf.layers.conv1d() with Examples – TensorFlow Tutorial

A Step Guide to Implement Batch Normalization in TensorFlow – TensorFlow Tutorial

Step 2: create convolution bank with max pooling

# inputs: [N, T, C], rank 3
def convbank(inputs, K = 8, is_training = True, scope = 'conv_bank'):
    with tf.variable_scope(scope):
        # Convolution bank: concatenate on the last axis to stack channels from all convolutions
        # conv_outputs: [N, T, K*128]
        conv_outputs = tf.concat(
            [conv1d(inputs, k, 128, tf.nn.relu, is_training, 'conv1d_%d' % k) for k in range(1, K + 1)],
            axis=-1
        )

        # Max pooling keeps the shape: [N, T, K*128]
        maxpool_output = tf.layers.max_pooling1d(
            conv_outputs,
            pool_size=2,
            strides=1,
            padding='same')
        return maxpool_output

To understand tf.layers.max_pooling1d(), you can read:

Understand tf.layers.max_pooling1d(): Max Pooling Layer for 1D Inputs – TensorFlow Tutorial

From this code, we can see that inputs is rank 3 and maxpool_output is also rank 3; the size of its last axis is K*128.

We can use the code below to test convbank().

# Create a random [N, T, C] = [4, 50, 200] input
w = tf.Variable(tf.glorot_uniform_initializer()([4, 50, 200]), name="w")

convbank_maxpool_out = convbank(w, K = 8, is_training = True, scope = 'conv_bank')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = sess.run(convbank_maxpool_out)
    print(x.shape)
    print(x)

Run this code and we will get:

(4, 50, 1024)
[[[ 1.0207694e+00  2.9554680e-01  2.2016469e-01 ... -1.3390291e-01
    1.3971468e+00 -4.3614873e-01]
  [ 1.0207694e+00  2.9967129e-03  8.5925090e-01 ... -4.4449669e-01
   -4.6057454e-01  2.5842670e-01]
  [ 2.5612593e+00  4.2234072e-01  1.5048288e+00 ...  3.2633919e-01
    3.8881072e-01  2.5842670e-01]
  ...
  [-4.0765771e-01 -4.0997037e-01  3.8271800e-01 ...  1.2461096e-02
    1.2668070e-01  3.6472526e-01]
  [-4.0765771e-01 -4.0997037e-01  3.8271800e-01 ...  1.2461096e-02
   -2.2410196e-01  3.6472526e-01]
  [-4.0765771e-01 -4.0997037e-01 -4.4883779e-01 ... -4.4449669e-01
   -2.2410196e-01  3.5127762e-01]]
  ...
  [-4.0765771e-01  1.2246618e-01  1.5277933e+00 ...  1.3210013e+00
    3.5866559e-02  1.7703870e-01]
  [-4.0765771e-01  1.2246618e-01 -2.5711954e-03 ...  1.3210013e+00
    3.5866559e-02  1.7703870e-01]
  [-4.0765771e-01 -4.0997037e-01 -2.5711954e-03 ... -4.4449669e-01
   -4.6057454e-01 -4.3614873e-01]]]

The input w is [4, 50, 200] and the final output is [4, 50, 1024], where 1024 = 8 * 128 since K = 8.
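
The code in this tutorial uses the TensorFlow 1.x tf.layers API. If you are on TensorFlow 2, a rough equivalent sketch built on tf.keras layers (the function name convbank_tf2 and its defaults are my own choices, not from the original code) could look like this:

import tensorflow as tf

def convbank_tf2(inputs, K=8, channels=128, training=True):
    # Build K sets of 1-D convolutions (kernel widths 1..K), each followed by
    # batch normalization, then concatenate along the channel axis.
    outputs = []
    for k in range(1, K + 1):
        x = tf.keras.layers.Conv1D(channels, kernel_size=k,
                                   padding='same', activation='relu')(inputs)
        x = tf.keras.layers.BatchNormalization()(x, training=training)
        outputs.append(x)
    stacked = tf.concat(outputs, axis=-1)  # [N, T, K*channels]
    # Max pooling with stride 1 and 'same' padding keeps the time resolution.
    return tf.keras.layers.MaxPooling1D(pool_size=2, strides=1,
                                        padding='same')(stacked)

x = tf.random.uniform([4, 50, 200])
print(convbank_tf2(x).shape)  # (4, 50, 1024)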
