In this tutorial, we show how to compute the frame rate of a mel-spectrogram using the Python library librosa.
You may find a description like this in some papers:
In our implementation, the frame rate of the mel-spectrogram is 62.5 Hz and the sampling rate of speech waveform is 16 kHz
This sentence raises two questions:
- 1. How to get the sampling rate of an audio file?
- 2. How to compute the frame rate of the mel-spectrogram?
Here we will answer these two questions one by one.
How to get the sampling rate of an audio file?
It is easy to get the sampling rate of an audio file. Here is a tutorial:
View Audio Sample Rate, Data Format PCM or ALAW Using ffprobe – Python Tutorial
We can also use librosa.load() to read audio data at a custom sampling rate by passing the sr parameter.
Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial
How to compute the frame rate of the mel-spectrogram?
In order to compute a mel-spectrogram, we can use librosa.feature.melspectrogram(). Here is a tutorial:
Compute and Display Audio Mel-spectrogram in Python – Python Tutorial
The key parameter is hop_length, the number of audio samples between two adjacent frames.
We can use the formula below to compute the frame rate of the mel-spectrogram.
frame_rate = sample_rate / hop_length
For example, if frame_rate = 62.5 Hz and sample_rate = 16 kHz = 16000 Hz, we can rearrange the formula to get hop_length:
hop_length = sample_rate / frame_rate = 16000 / 62.5 = 256
It means we should set hop_length = 256 when calling librosa.feature.melspectrogram().