
Implement CBHG in Tacotron Using TensorFlow – TensorFlow Tutorial

CBHG is an important module in Tacotron. In this tutorial, we will implement it using TensorFlow.

CBHG Model

The CBHG architecture (figure omitted here) consists of a 1-D convolution bank, followed by a highway network and a bidirectional GRU.

To create 1-D convolution bank, you can view:

Implement Convolution Bank (ConvBank) in TensorFlow – TensorFlow Tutorial

To create a highway network, you can view:

Implement Highway Networks in TensorFlow: A Step Guide – TensorFlow Tutorial

We should notice: CBHG uses a 4-layer highway network.

In TensorFlow, we can use tf.nn.bidirectional_dynamic_rnn() to build the bidirectional GRU. Here is a tutorial:

An Introduction to How TensorFlow Bidirectional Dynamic RNN Process Variable Length Sequence – LSTM Tutorial

How to create CBHG in TensorFlow?

Here is an example code:

https://github.com/zuoxiang95/tacotron-1/blob/master/models/modules.py

The function is:

import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell  # TensorFlow 1.x

# conv1d() and highwaynet() are the helper functions from the tutorials linked above.
def cbhg(inputs, input_lengths, is_training, scope, K, projections):
  with tf.variable_scope(scope):
    with tf.variable_scope('conv_bank'):
      # Convolution bank: concatenate on the last axis to stack channels from all convolutions
      conv_outputs = tf.concat(
        [conv1d(inputs, k, 128, tf.nn.relu, is_training, 'conv1d_%d' % k) for k in range(1, K+1)],
        axis=-1
      )

    # Max pooling over time (stride 1 and 'same' padding keep the time axis unchanged):
    maxpool_output = tf.layers.max_pooling1d(
      conv_outputs,
      pool_size=2,
      strides=1,
      padding='same')

    # Two projection layers (no activation on the final layer):
    proj1_output = conv1d(maxpool_output, 3, projections[0], tf.nn.relu, is_training, 'proj_1')
    proj2_output = conv1d(proj1_output, 3, projections[1], None, is_training, 'proj_2')

    # Residual connection:
    highway_input = proj2_output + inputs

    # Handle dimensionality mismatch before the highway layers:
    if highway_input.shape[2] != 128:
      highway_input = tf.layers.dense(highway_input, 128)

    # 4-layer HighwayNet:
    for i in range(4):
      highway_input = highwaynet(highway_input, 'highway_%d' % (i+1))
    rnn_input = highway_input

    # Bidirectional RNN:
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
      GRUCell(128),
      GRUCell(128),
      rnn_input,
      sequence_length=input_lengths,
      dtype=tf.float32)
    return tf.concat(outputs, axis=2)  # Concat forward and backward outputs

We should notice:

As the Tacotron paper describes: "We further pass the processed sequence to a few fixed-width 1-D convolutions, whose outputs are added with the original input sequence via residual connections."

In other words, the output of max pooling is fed into two projection layers (the final layer has no activation function), and the result is added to the original inputs through a residual connection.

Look at code below:

    # Two projection layers:
    proj1_output = conv1d(maxpool_output, 3, projections[0], tf.nn.relu, is_training, 'proj_1')
    proj2_output = conv1d(proj1_output, 3, projections[1], None, is_training, 'proj_2')

    # Residual connection:
    highway_input = proj2_output + inputs

Note that inputs must be a rank-3 tensor with shape [batch_size, time_steps, channels].
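To make the shapes concrete, here is a channel-dimension walk-through, assuming K = 16 and 128-channel convolutions (a common Tacotron encoder configuration; treat these numbers as an example):

```python
# Channel sizes through CBHG for a rank-3 input [batch, time, 128]:
K = 16
channels = 128

bank_out = K * channels   # conv bank concatenates K conv outputs -> 2048 channels
pool_out = bank_out       # max pooling (stride 1, padding 'same') keeps channels
proj2_out = 128           # the two projection layers map back to 128 channels
residual = proj2_out      # 128 matches the input, so proj2_output + inputs works
rnn_out = 2 * 128         # bidirectional GRU concatenates forward + backward -> 256

print(bank_out, residual, rnn_out)  # 2048 128 256
```
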