Convert Mel-spectrogram to WAV Audio Using WaveRNN

WaveRNN is a vocoder: it converts a mel-spectrogram into a WAV audio file. In this tutorial, we will show you how to do it.

WaveRNN

WaveRNN is built on a GRU (gated recurrent unit). A TensorFlow version is available here. It generates a waveform from an audio mel-spectrogram.
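To make "built on a GRU" concrete, here is a minimal numpy sketch of one standard GRU cell step, driven by a hypothetical toy loop that feeds back the previous sample together with a mel frame. This is not the WaveRNN architecture or the TensorFlow project's code. The weights are random and untrained, so the output is meaningless. `toy_generate()`, `init_params()`, the hidden size and the 80-bin mel frame are all assumptions for illustration. The original WaveRNN paper additionally predicts each 16-bit sample as coarse and fine 8-bit halves.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, params):
    """One step of a standard GRU cell (Cho et al. formulation)."""
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    z = sigmoid(W_z @ x + U_z @ h + b_z)               # update gate
    r = sigmoid(W_r @ x + U_r @ h + b_r)               # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h) + b_h)   # candidate state
    return (1.0 - z) * h + z * h_tilde                 # blend old and candidate state


def init_params(input_dim, hidden_dim, rng):
    """Hypothetical random (untrained) weights, in the order gru_step() unpacks them."""
    params = []
    for _ in range(3):  # update gate, reset gate, candidate state
        params.append(rng.standard_normal((hidden_dim, input_dim)) * 0.1)
        params.append(rng.standard_normal((hidden_dim, hidden_dim)) * 0.1)
        params.append(np.zeros(hidden_dim))
    return params


def toy_generate(mel_frame, num_samples, hidden_dim=16, seed=0):
    """Hypothetical toy loop: each step sees the previous sample plus one mel frame."""
    rng = np.random.default_rng(seed)
    params = init_params(1 + mel_frame.shape[0], hidden_dim, rng)
    w_out = rng.standard_normal(hidden_dim) * 0.1
    h = np.zeros(hidden_dim)
    prev = 0.0
    out = np.empty(num_samples)
    for t in range(num_samples):
        x = np.concatenate(([prev], mel_frame))   # autoregressive input
        h = gru_step(x, h, params)
        prev = float(np.tanh(w_out @ h))          # squash to a sample in (-1, 1)
        out[t] = prev
    return out


samples = toy_generate(np.zeros(80), num_samples=100)
print(samples.shape)  # (100,)
```

The point of the loop is the data flow: a vocoder like this generates audio one sample at a time, which is why WaveRNN inference is much slower than computing a spectrogram.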

How to convert mel-spectrogram to WAV audio using WaveRNN?

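Open run_wavernn.py and remove all @click decorators, so that inference() can be called directly as a normal Python function instead of through the click command line.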

In this file, the main function is inference(). It does the following:

Read the audio data using librosa.load().

Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial
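librosa.load() returns float32 samples scaled to [-1.0, 1.0] and, by default, resamples to 22050 Hz and mixes down to mono. As a rough illustration of that scaling, here is a standard-library sketch that reads 16-bit mono PCM and divides by 2**15. `load_wav_float()` is a hypothetical helper and only approximates `librosa.load(path, sr=None)`: it does no resampling and rejects other sample formats.

```python
import os
import tempfile
import wave

import numpy as np


def load_wav_float(path):
    """Read a 16-bit mono PCM WAV file and scale it to floats in [-1.0, 1.0).

    Simplified stand-in for librosa.load(path, sr=None): no resampling,
    no multi-channel mixing, 16-bit PCM only.
    """
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sr = f.getframerate()
        raw = f.readframes(f.getnframes())
    # Little-endian int16 samples -> float32, divided by 2**15
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32)
    return samples / 32768.0, sr


# Usage: write a tiny test file, then read it back
path = os.path.join(tempfile.mkdtemp(), "test.wav")
pcm = np.array([0, 16384, -32768, 32767], dtype="<i2")
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(16000)
    f.writeframes(pcm.tobytes())

y, sr = load_wav_float(path)
print(sr, y)
```

The most negative int16 value maps exactly to -1.0, while the largest positive value maps just below 1.0.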

Use the compute_spectrogram() function to compute the mel-spectrogram. We can also use librosa.feature.melspectrogram() to get it:

Compute and Display Audio Mel-spectrogram in Python – Python Tutorial
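The steps behind a mel-spectrogram can be sketched in plain numpy: frame the signal, apply a window, take the power spectrum, then pool the FFT bins with triangular mel filters. This is a simplified illustration, not the compute_spectrogram() in run_wavernn.py and not numerically identical to librosa, which by default uses a Slaney-style mel scale and centered, padded frames. The values `n_fft=1024`, `hop_length=256` and `n_mels=80` are assumptions. With librosa, a comparable call would be `librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80)`.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK-style mel scale


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 edge/center points spaced evenly on the mel scale
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram(y, sr, n_fft=1024, hop_length=256, n_mels=80):
    """Power mel-spectrogram, shape (n_mels, n_frames). Assumes len(y) >= n_fft."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    n_frames = 1 + (len(y) - n_fft) // hop_length   # no padding at the edges
    frames = np.stack([y[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    return (power @ mel_filterbank(sr, n_fft, n_mels).T).T      # (n_mels, n_frames)


sr = 22050
t = np.arange(sr) / sr                       # one second
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)      # 440 Hz test tone
S = mel_spectrogram(y, sr)
log_S = np.log(S + 1e-10)                    # log compression, commonly applied
print(S.shape)  # (80, 83)
```

The output layout (mel bins along the first axis, frames along the second) follows librosa's convention; check which layout and scaling your vocoder actually expects before feeding it.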

Then, call run_wavernn() to generate a waveform from the mel-spectrogram.

Finally, use librosa.output.write_wav() to save the waveform as a WAV file.

However, you may encounter the error AttributeError: module 'librosa' has no attribute 'output', because the librosa.output module was removed in librosa 0.8.0. You can find the solution here:

Fix AttributeError: module ‘librosa’ has no attribute ‘output’ – Librosa Tutorial
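A common replacement for the removed function is `soundfile.write(path, wav, sr)`, which writes 16-bit PCM for .wav files by default. If you would rather avoid an extra dependency, the sketch below writes the same format with the standard library `wave` module. `write_wav_16bit()` is a hypothetical helper. Unlike the old librosa.output.write_wav(), which saved float samples, it clips to [-1, 1] and stores 16-bit integers.

```python
import os
import tempfile
import wave

import numpy as np


def write_wav_16bit(path, y, sr):
    """Save a float waveform in [-1, 1] as 16-bit mono PCM."""
    y = np.clip(np.asarray(y, dtype=np.float64), -1.0, 1.0)
    pcm = np.round(y * 32767.0).astype("<i2")   # scale to the int16 range
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)                       # 2 bytes = 16 bits per sample
        f.setframerate(int(sr))
        f.writeframes(pcm.tobytes())


# Usage: save one second of a 440 Hz tone, then inspect the header
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)
path = os.path.join(tempfile.mkdtemp(), "wavernn_1.wav")
write_wav_16bit(path, y, sr)
with wave.open(path, "rb") as f:
    print(f.getframerate(), f.getnframes())   # 22050 22050
```

Clipping before conversion prevents values slightly outside [-1, 1], which a vocoder can produce, from wrapping around in the int16 cast.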

We can call the inference() function as follows:

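if __name__ == '__main__':
    # Paths are relative to the current working directory
    wav = r'samples/1221306.wav'   # input audio used to compute the mel-spectrogram
    model = r'models/frozen.pb'    # frozen TensorFlow WaveRNN model
    output = 'wavernn_1.wav'       # where the generated waveform is saved
    inference(wav, model, output)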

Running this code creates a new WAV file. However, its quality may be worse than the original audio, because the WaveRNN model should be fine-tuned on your own dataset.

We can also create a new waveform using the Griffin-Lim algorithm. Here is the tutorial:

Convert Mel-spectrogram to WAV Audio Using Griffin-Lim in Python – Python Tutorial
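For comparison with WaveRNN, here is a numpy sketch of the classic Griffin-Lim iteration on a linear-magnitude STFT. It is not the code from the linked tutorial. Starting from a mel-spectrogram would first require an approximate inversion of the mel filterbank, which librosa bundles as `librosa.feature.inverse.mel_to_audio()` (and `librosa.griffinlim()` for linear magnitudes). `N_FFT = 512` and `HOP = 128` are arbitrary assumptions. Frames are not padded, so a few edge samples are set to zero.

```python
import numpy as np

N_FFT = 512   # assumed frame size
HOP = 128     # assumed hop size (75% overlap)
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)  # periodic Hann


def stft(y):
    """Complex STFT without padding, shape (n_frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(y) - N_FFT) // HOP
    frames = np.stack([y[t * HOP : t * HOP + N_FFT] * WINDOW for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Least-squares inverse STFT via weighted overlap-add."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    y = np.zeros(length)
    norm = np.zeros(length)
    for t, frame in enumerate(frames):
        y[t * HOP : t * HOP + N_FFT] += frame
        norm[t * HOP : t * HOP + N_FFT] += WINDOW ** 2
    covered = norm > 1e-6          # samples where the windows carry real weight
    y[covered] /= norm[covered]
    y[~covered] = 0.0              # edge samples the frames barely see
    return y


def griffin_lim(magnitude, length, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))   # random initial phase
    y = istft(magnitude * phase, length)
    for _ in range(n_iter):
        phase = np.exp(1j * np.angle(stft(y)))   # keep the current phase estimate
        y = istft(magnitude * phase, length)     # impose the target magnitude
    return y


def spectral_convergence(y, magnitude):
    """Relative error between |STFT(y)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(y)) - magnitude) / np.linalg.norm(magnitude)


sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)
S = np.abs(stft(y))                          # target magnitude (phase discarded)
y_init = griffin_lim(S, len(y), n_iter=0)    # random phase only
y_hat = griffin_lim(S, len(y), n_iter=50)
print(spectral_convergence(y_init, S), spectral_convergence(y_hat, S))
```

Each iteration alternates between keeping the phase of the current estimate and re-imposing the target magnitude, so the error should drop well below the random-phase starting point. Griffin-Lim needs no training, but it typically sounds less natural than a well-trained neural vocoder.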