
Implement Pre-net in Tacotron Using TensorFlow – TensorFlow Tutorial

Pre-net is an important part of Tacotron. In this tutorial, we will introduce how to create it using TensorFlow.

Pre-net in Tacotron

Looking at the structure of Tacotron, we can find that Pre-net is the first layer that processes the inputs.

From the paper TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS, we can find:

Pre-net is: FC-256-ReLU -> Dropout(0.5) -> FC-128-ReLU -> Dropout(0.5)
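
To make this structure concrete, here is a minimal sketch of the same FC-256-ReLU -> Dropout(0.5) -> FC-128-ReLU -> Dropout(0.5) stack written with tf.keras layers. This is only an equivalent formulation for illustration; the repository code below uses tf.layers instead.

import tensorflow as tf

# An illustrative tf.keras version of the Pre-net stack (not the repository code).
prenet_keras = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),  # FC-256-ReLU
    tf.keras.layers.Dropout(0.5),                   # Dropout(0.5)
    tf.keras.layers.Dense(128, activation='relu'),  # FC-128-ReLU
    tf.keras.layers.Dropout(0.5),                   # Dropout(0.5)
])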

How to create Pre-net using TensorFlow?

We will refer to this code to create it:

https://github.com/zuoxiang95/tacotron-1/blob/master/models/modules.py

Here is the Pre-net source code:

import tensorflow as tf

def prenet(inputs, drop_rate=0.5, layer_sizes=[256, 128], scope=None):
  x = inputs
  with tf.variable_scope(scope or 'prenet'):
    # FC-256-ReLU -> Dropout -> FC-128-ReLU -> Dropout
    for i, size in enumerate(layer_sizes):
      dense = tf.layers.dense(x, units=size, activation=tf.nn.relu, name='dense_%d' % (i+1))
      x = tf.layers.dropout(dense, rate=drop_rate, name='dropout_%d' % (i+1))
  return x

We will evaluate it with a random input tensor.

# Create a random 32 x 128 input tensor for testing.
w = tf.Variable(tf.glorot_uniform_initializer()([32, 128]), name="w")
v = prenet(w, scope='prenet')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = sess.run(v)
    print(x.shape)
    print(x)

Run this code and we will see:

(32, 128)
[[0.10063383 0.00535887 0.         ... 0.         0.         0.07058308]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.08158886 0.         ... 0.04584407 0.         0.04442355]
 ...
 [0.0752174  0.         0.         ... 0.02113288 0.         0.02786144]
 [0.         0.         0.         ... 0.01715038 0.         0.04262433]
 [0.         0.03768676 0.         ... 0.13155384 0.         0.09612481]]

We should notice: when we test our model, we should set drop_rate = 0.0, because during training the activations kept by tf.layers.dropout() are scaled by 1 / (1 - drop_rate).
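
As a quick illustration, here is a minimal sketch of evaluating the Pre-net with dropout disabled. It reuses the prenet() function defined above and assumes a fresh default graph.

import tensorflow as tf

# Assumption: start from a fresh graph so variable names do not collide with the training graph.
tf.reset_default_graph()

w = tf.Variable(tf.glorot_uniform_initializer()([32, 128]), name="w")
# drop_rate = 0.0 means no units are zeroed out and no rescaling is applied.
v_test = prenet(w, drop_rate=0.0, scope='prenet')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(v_test).shape)  # (32, 128)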