Post-Net (also called post-network) has been used in the papers Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions and AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
It consists of five 1-D convolutional layers, each with 512 filters of shape 5 × 1 and batch normalization, followed by tanh activations on all but the final layer.
Why use Post-Net?
From the paper Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, we can find: without the post-net, the model only obtains a MOS score of 4.429 ± 0.071, compared to 4.526 ± 0.066 with it, which shows empirically that the post-net is an important part of the network design.
How to build a post-net in TensorFlow?
In this tutorial, we will use an example to show you how to create a post-net.
Here is the example code:
class PostNet():
    def output(self, x, filters=512, trainable=True):
        '''
        :param x: [batch_size, sequence_length, in_channels]
        :return: [batch_size, sequence_length, out_channels]
        '''
        layers = 4
        # four conv1d + batch norm + tanh layers
        for i in range(layers):
            x = tf.layers.conv1d(x, filters=filters, kernel_size=5, use_bias=True,
                                 padding='same', name='postnet_' + str(i))
            x = tf.layers.batch_normalization(x, axis=-1, training=trainable,
                                              name='postnet_bn' + str(i))
            x = tf.tanh(x)
        # final layer: batch normalization but no tanh activation
        x = tf.layers.conv1d(x, filters=80, kernel_size=5, use_bias=True,
                             padding='same', name='postnet_5')
        x = tf.layers.batch_normalization(x, axis=-1, training=trainable,
                                          name='postnet_bn_5')
        return x
The post-net contains 5 conv1d layers. We can use tf.layers.conv1d() to build them.
Understand TensorFlow tf.layers.conv1d() with Examples – TensorFlow Tutorial
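Because every conv1d layer in the post-net uses padding = 'same' with stride 1, the time axis is preserved and only the channel axis changes to the number of filters. A minimal sketch of that shape arithmetic (conv1d_output_shape is a hypothetical helper, not a TensorFlow function):

```python
def conv1d_output_shape(input_shape, filters, kernel_size, stride=1, padding="same"):
    """Shape arithmetic for a 1-D convolution.

    input_shape: (batch, width, in_channels)
    Returns (batch, out_width, filters).
    """
    batch, width, _ = input_shape
    if padding == "same":
        out_width = -(-width // stride)  # ceil(width / stride)
    else:  # "valid": no padding, the kernel must fit entirely inside
        out_width = (width - kernel_size) // stride + 1
    return (batch, out_width, filters)

# With 'same' padding the time axis (512 here) is preserved;
# only the channel count changes to the number of filters.
print(conv1d_output_shape((15, 512, 80), filters=512, kernel_size=5))
print(conv1d_output_shape((15, 512, 512), filters=80, kernel_size=5))
```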
Then, we will use tf.layers.batch_normalization() to implement batch normalization.
A Step Guide to Implement Batch Normalization in TensorFlow – TensorFlow Tutorial
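With axis = -1, batch normalization normalizes each channel over the batch to zero mean and unit variance, then scales and shifts it. A minimal pure-Python sketch of that training-mode computation for a single channel (batch_norm_1d is a hypothetical helper for illustration, not the TensorFlow implementation):

```python
import math

def batch_norm_1d(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Training-mode batch norm for one channel: normalize the batch to
    zero mean / unit variance, then scale by gamma and shift by beta.
    This mirrors the per-channel computation tf.layers.batch_normalization
    performs along axis=-1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

normed = batch_norm_1d([2.0, 4.0, 6.0, 8.0])
print(round(sum(normed), 6))  # mean of the normalized batch: 0.0
```

Note that training=trainable in the code above selects this batch-statistics mode; at inference time TensorFlow uses moving averages instead.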
As to the input x, its shape should be [batch_size, sequence_length, 80], because in both papers above the size of x on axis = -1 is 80 (the number of mel channels).
We can evaluate the PostNet layer as follows:
import tensorflow as tf
import numpy as np

inputs = tf.Variable(tf.truncated_normal([15, 512, 80], stddev=0.1), name="inputs")
pnet = PostNet()
out = pnet.output(inputs)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a = sess.run(out)
    print(a.shape)
Running this code, we will get:
(15, 512, 80)
The output shape is the same as that of the input x.
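Matching shapes matter because, in Tacotron 2, the post-net output is not used directly: it predicts a residual that is added element-wise to the decoder's initial mel-spectrogram prediction. A minimal sketch with plain Python lists standing in for tensors (mel_before and residual are hypothetical values):

```python
# Hypothetical decoder output and post-net residual, same shape.
mel_before = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]   # [sequence_length, mel_channels]
residual   = [[0.05, 0.1, -0.3], [0.2, -0.5, 0.1]]  # post-net prediction

# Element-wise sum: the refined mel spectrogram keeps the same shape,
# which is why the post-net must return 80 channels and an unchanged time axis.
mel_after = [[m + r for m, r in zip(row_m, row_r)]
             for row_m, row_r in zip(mel_before, residual)]
print(len(mel_after), len(mel_after[0]))  # shape unchanged: 2 3
```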