Understand the Frame Rate of the Mel-spectrogram in Audio

In this tutorial, we will show how to compute the frame rate of a mel-spectrogram using the Python library librosa.

You may find this description in some papers:

In our implementation, the frame rate of the mel-spectrogram is 62.5 Hz and the sampling rate of speech waveform is 16 kHz

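This sentence raises two questions: how do we get the sampling rate of an audio file, and how do we compute the frame rate of the mel-spectrogram?

We will answer these two questions one by one.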

How to compute the sampling rate of an audio?

It is easy to get the sampling rate of an audio file. Here is the tutorial:

We can also use librosa.load() to read audio data at a custom sampling rate, in which case librosa resamples the audio while loading it.

How to compute the frame rate of the mel-spectrogram?

To compute a mel-spectrogram, we can use librosa.feature.melspectrogram(). Here is the tutorial:

The key parameter is hop_length, the number of audio samples between the starts of two consecutive frames.

Because each frame advances by hop_length samples, we can use the formula below to compute the frame rate of the mel-spectrogram.

frame_rate = sample_rate/hop_length
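The formula can be wrapped in two small helpers, one for each direction. This is only a sketch: the function names frame_rate and hop_length_for are hypothetical and are not part of librosa.

```python
def frame_rate(sample_rate, hop_length):
    """Frames per second produced by a given hop length (in samples)."""
    return sample_rate / hop_length


def hop_length_for(sample_rate, target_frame_rate):
    """Hop length (in samples) that yields the target frame rate.

    Raises ValueError when the result is not a whole number of samples,
    because hop_length must be an integer.
    """
    hop = sample_rate / target_frame_rate
    if abs(hop - round(hop)) > 1e-9:
        raise ValueError("no integer hop_length gives exactly this frame rate")
    return int(round(hop))


# librosa's default settings: 22050 Hz sampling rate, hop_length of 512.
print(frame_rate(22050, 512))
```

With librosa's default sampling rate of 22050 Hz and default hop_length of 512, the frame rate is about 43 Hz.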

For example, if frame_rate = 62.5 Hz and the sampling rate is 16 kHz (16000 Hz), we can solve the formula for hop_length:

hop_length = 16000 / 62.5 = 256

This means we should set hop_length = 256 when calling librosa.feature.melspectrogram() on 16 kHz audio to get a frame rate of 62.5 Hz.