Post-Net is also called post-network. It is used in the papers *Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions* and *AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss*.

It is comprised of 5 convolution layers with 512 filters of shape 5 × 1, each with batch normalization, followed by tanh activations on all but the final layer.

## Why use Post-Net?

In the paper *Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions*, the authors report: without the post-net, the model only obtains a **MOS** score of 4.429 ± 0.071, compared to 4.526 ± 0.066 with it, meaning that empirically the post-net is still an important part of the network design.

## How to build Post-Net in TensorFlow?

In this tutorial, we will use an example to show you how to create a Post-Net.

Here is an example code:

```python
class PostNet():
    def output(self, x, filters=512, trainable=True):
        '''
        :param x: [batch_size, sequence_length, 80]
        :return: [batch_size, sequence_length, 80]
        '''
        layers = 4
        # first 4 conv1d layers: conv + batch normalization + tanh
        for i in range(layers):
            x = tf.layers.conv1d(x, filters=filters, kernel_size=5,
                                 use_bias=True, padding='same',
                                 name='postnet_' + str(i))
            x = tf.layers.batch_normalization(x, axis=-1, training=trainable,
                                              name='postnet_bn_' + str(i))
            x = tf.tanh(x)
        # final layer: conv + batch normalization, no tanh activation
        x = tf.layers.conv1d(x, filters=80, kernel_size=5, use_bias=True,
                             padding='same', name='postnet_5')
        x = tf.layers.batch_normalization(x, axis=-1, training=trainable,
                                          name='postnet_bn_5')
        return x
```

Post-Net contains 5 conv1d layers. We can use tf.layers.conv1d() to implement them.

Understand TensorFlow tf.layers.conv1d() with Examples – TensorFlow Tutorial
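Because each conv1d layer uses padding = 'same', the sequence length of x is unchanged from layer to layer. Here is a minimal NumPy sketch of a single-channel 1-D convolution with 'same' padding (conv1d_same and the toy kernel are illustrative names, not TensorFlow API):

```python
import numpy as np

def conv1d_same(x, kernel):
    # x: [sequence_length], kernel: [kernel_size]
    # pad the input so the output has the same length as the input
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    x_pad = np.pad(x, (pad_left, pad_right))
    # slide the kernel over every input position
    return np.array([np.dot(x_pad[i:i + k], kernel) for i in range(len(x))])

x = np.arange(6, dtype=np.float32)   # sequence_length = 6
kernel = np.array([1.0, 0.0, -1.0])  # kernel_size = 3
y = conv1d_same(x, kernel)
print(y.shape)  # (6,) -- same length as the input
```

This is why the time axis of the Post-Net output matches the time axis of its input.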

Then, we will use tf.layers.batch_normalization() to implement batch normalization.

A Step Guide to Implement Batch Normalization in TensorFlow – TensorFlow Tutorial
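With axis = -1, batch normalization computes one mean and variance per channel, over the batch and time axes. A minimal NumPy sketch of the training-mode computation (batch_norm_last_axis is an illustrative name; learnable scale/offset and moving averages are omitted):

```python
import numpy as np

def batch_norm_last_axis(x, eps=1e-3):
    # x: [batch_size, sequence_length, channels]
    # per-channel statistics over the batch and time axes
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(15, 512, 80).astype(np.float32) * 3.0 + 2.0
y = batch_norm_last_axis(x)
print(y.shape)  # (15, 512, 80) -- shape unchanged, channels normalized
```

After normalization each of the 80 channels has roughly zero mean and unit variance, while the tensor shape is untouched.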

As to input x, its shape should be [batch_size, sequence_length, 80], because in both papers above the mel spectrogram has 80 channels on axis = -1.

We can evaluate PostNet layer as follows:

```python
import tensorflow as tf
import numpy as np

inputs = tf.Variable(tf.truncated_normal([15, 512, 80], stddev=0.1), name="inputs")
pnet = PostNet()
out = pnet.output(inputs)

init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    np.set_printoptions(precision=4, suppress=True)
    a = sess.run(out)
    print(a.shape)
```

Running this code, we will get:

(15, 512, 80)

The shape of the output is the same as that of input x.