When we use the Python library librosa to process audio, we often encounter three parameters: n_fft, hop_length and win_length. In this tutorial, we will introduce them for beginners.
For example, you can find them in the librosa.feature.melspectrogram() function.
librosa.feature.melspectrogram(*, y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='constant', power=2.0, **kwargs)
hop_length and win_length
They can be viewed as follows:
An input signal is processed one window at a time. For example, with a 50 ms window and a sample rate of 22050, the window length is win_length = int(22050 * 0.05) = 1102 samples.
The window moves from left to right in steps of one hop length. For example, with a 10 ms hop, hop_length = int(22050 * 0.01) = 220 samples.
Notice that if the window and hop durations are fixed in time, their values in samples change with the audio sample rate.
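To see this in numbers, here is a minimal sketch in plain Python. The helper ms_to_samples() is our own name for illustration, not a librosa function, and the three sample rates are just common examples.

```python
def ms_to_samples(duration_ms, sr):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(sr * duration_ms / 1000)

# Same durations (50 ms window, 10 ms hop), different sample rates.
for sr in (16000, 22050, 44100):
    win_length = ms_to_samples(50, sr)
    hop_length = ms_to_samples(10, sr)
    print(sr, win_length, hop_length)
```

The window and hop cover the same amount of time in every case, but the number of samples grows with the sample rate.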
Usually, we can set hop_length = win_length // 4, so that consecutive windows overlap by about 75%.
To get the input signal, you can read this tutorial:
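As a rough sketch of what a quarter-window hop means in practice, we can count how many windows fit in one second of audio. The function count_frames() is a hypothetical helper, and it assumes no padding at the edges of the signal. librosa pads the signal by default (center=True), so the number of frames it returns will differ.

```python
def count_frames(n_samples, win_length, hop_length):
    """Number of full windows that fit in the signal without padding."""
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // hop_length

sr = 22050
win_length = int(sr * 0.05)      # 50 ms window
hop_length = win_length // 4     # a quarter of the window

# Fraction of each window shared with the next one.
overlap = 1 - hop_length / win_length

# One second of audio contains sr samples.
n_frames = count_frames(sr, win_length, hop_length)
print(win_length, hop_length, round(overlap, 2), n_frames)
```

A smaller hop_length gives more frames and finer time resolution, at the cost of more computation.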
Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial
n_fft
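n_fft is the length of the windowed signal after padding with zeros. Each frame of win_length samples is windowed and then padded with zeros to match n_fft, which means win_length <= n_fft. If win_length is None, as in the default above, librosa uses win_length = n_fft.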
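The padding step can be sketched in plain Python as follows. Both helpers are hypothetical names written for this example, not librosa functions. Choosing n_fft as the next power of two above win_length is a common convention that we assume here, not a requirement, and placing the zeros symmetrically around the frame is an illustrative choice.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()

def pad_center(frame, size):
    """Zero-pad a frame on both sides so that its length becomes size."""
    if len(frame) > size:
        raise ValueError("win_length must be <= n_fft")
    left = (size - len(frame)) // 2
    right = size - len(frame) - left
    return [0.0] * left + list(frame) + [0.0] * right

sr = 22050
win_length = int(sr * 0.05)              # 50 ms window
n_fft = next_power_of_two(win_length)    # assumed convention

frame = [1.0] * win_length               # stand-in for one windowed frame
padded = pad_center(frame, n_fft)
print(win_length, n_fft, len(padded))
```

The padded frame has exactly n_fft samples, and only the middle win_length samples come from the signal. Passing a win_length larger than n_fft raises an error in this sketch, which mirrors the constraint win_length <= n_fft.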