Convert Mel-spectrogram to WAV Audio Using Griffin-Lim in Python – Python Tutorial

By | March 8, 2022

In Python, we can use a vocoder to convert a mel-spectrogram to WAV audio, for example WaveNet, WaveRNN, FFTNet or the Griffin-Lim algorithm. In this tutorial, we will introduce how to use Griffin-Lim for this conversion in Python.

librosa.feature.inverse.mel_to_audio()

This function is defined as:

librosa.feature.inverse.mel_to_audio(M, *, sr=22050, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='constant', power=2.0, n_iter=32, length=None, dtype=<class 'numpy.float32'>, **kwargs)

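It inverts a mel power spectrogram to audio using Griffin-Lim. It first approximates the linear-frequency STFT magnitude from the mel-spectrogram (via librosa.feature.inverse.mel_to_stft), then estimates the missing phase with Griffin-Lim.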
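To see what the Griffin-Lim step does, here is a minimal NumPy sketch of the basic algorithm. It is not librosa's implementation: librosa.griffinlim additionally centers frames and applies a momentum term. Starting from a random phase, the sketch repeatedly resynthesizes a signal by weighted overlap-add, recomputes its STFT, keeps only the new phase and re-imposes the known magnitude. The helper names stft, istft, griffin_lim and spectral_convergence, and the parameters n_fft=512 and hop=128, are hypothetical choices for illustration only.

```python
import numpy as np


def stft(y, n_fft=512, hop=128):
    """Complex STFT: periodic Hann window, no padding, real FFT per frame."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # shape: (1 + n_fft // 2, n_frames)


def istft(S, n_fft=512, hop=128):
    """Least-squares inverse of stft() (Griffin-Lim's weighted overlap-add)."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S.T, n=n_fft, axis=1) * win
    length = n_fft + hop * (frames.shape[0] - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)  # avoid division by zero at the edges


def griffin_lim(magnitude, n_fft=512, hop=128, n_iter=32, seed=0):
    """Estimate a signal whose STFT magnitude matches `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        y = istft(magnitude * phase, n_fft, hop)  # resynthesize a time signal
        rebuilt = stft(y, n_fft, hop)  # analyze it again
        phase = np.exp(1j * np.angle(rebuilt))  # keep only the new phase
    return istft(magnitude * phase, n_fft, hop)


def spectral_convergence(y, magnitude, n_fft=512, hop=128):
    """Relative error between the STFT magnitude of y and the target magnitude."""
    return np.linalg.norm(np.abs(stft(y, n_fft, hop)) - magnitude) / np.linalg.norm(magnitude)


# Demo on a synthetic sinusoid (8192 = 512 + 60 * 128 samples, so every frame is full)
y = np.sin(0.3 * np.arange(8192))
mag = np.abs(stft(y))
for n_iter in (0, 32):
    print(n_iter, spectral_convergence(griffin_lim(mag, n_iter=n_iter), mag))
```

The error with 0 iterations corresponds to using a random phase directly; each extra iteration costs one inverse STFT and one STFT, which is why more iterations mean slower conversion.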

We will use an example to show you how to use it correctly.

How to convert a mel-spectrogram to WAV audio using Griffin-Lim?

First, we should compute the mel-spectrogram of a WAV file. Here is the tutorial:

Compute and Display Audio Mel-spectrogram in Python – Python Tutorial

Example code is below:

import librosa
import soundfile

# Load the audio as mono at 22050 Hz
wav_file = r'F:\1221306.wav'
wav_data, sr = librosa.load(wav_file, sr=22050, mono=True)
print(wav_data.shape)

hop_length = 275  # frame shift: 0.0125 s * 22050 Hz
win_length = 1100  # window length: about 0.05 s * 22050 Hz

# Newer librosa versions require the signal to be passed as the keyword y=
mel = librosa.feature.melspectrogram(y=wav_data, sr=sr, n_fft=2048, hop_length=hop_length, win_length=win_length)
print(mel)
print(mel.shape)

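Notice that hop_length and win_length are derived from the frame shift (12.5 ms) and window length (50 ms) multiplied by the sample rate. Since 0.05 * 22050 is 1102.5, the code rounds the window down to 1100.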
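The conversion from durations to sample counts is simple arithmetic. The helper frame_params below is hypothetical (not a librosa function) and truncates to whole samples, so it gives 1102 for the window where the tutorial uses the rounder 1100. Either value works as long as it does not exceed n_fft = 2048.

```python
def frame_params(sr, frame_shift_s=0.0125, frame_length_s=0.05):
    """Convert a frame shift and window duration (seconds) to sample counts.

    Values are truncated to whole samples.
    """
    hop_length = int(frame_shift_s * sr)
    win_length = int(frame_length_s * sr)
    return hop_length, win_length


hop_length, win_length = frame_params(22050)
print(hop_length, win_length)  # 275 1102
```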

Running this code, we get:

(1405757,)
(128, 5112)
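These shapes can be checked by hand: 128 is librosa's default n_mels, and with center=True librosa pads n_fft // 2 samples on each side of the signal before framing. The helper num_frames below is a hypothetical sketch of that arithmetic, not part of librosa.

```python
def num_frames(n_samples, hop_length, n_fft=2048, center=True):
    """Number of STFT frames for a signal of n_samples samples."""
    if center:
        # center=True pads n_fft // 2 samples on each side of the signal
        n_samples += 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


print(num_frames(1405757, 275))  # 5112
```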

Then, we can start the conversion.

# Invert the mel-spectrogram with the same STFT parameters used to compute it
wav_data_2 = librosa.feature.inverse.mel_to_audio(mel, sr=sr, n_fft=2048, hop_length=hop_length, win_length=win_length)
saved_file = '1221306-1.wav'
soundfile.write(saved_file, wav_data_2, sr)

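Running this code converts the mel-spectrogram back to a WAV file. However, the conversion takes a long time: mel_to_audio first solves a non-negative least-squares problem to recover the linear spectrogram, then runs n_iter Griffin-Lim iterations (32 by default), each of which computes a full inverse STFT and STFT of the signal.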
