Understand n_fft, hop_length, win_length in Audio Processing

When we use the Python library librosa to process audio, we often encounter three parameters: n_fft, hop_length and win_length. In this tutorial, we will introduce them for beginners.

For example, you can find them in the librosa.feature.melspectrogram() function.

librosa.feature.melspectrogram(*, y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='constant', power=2.0, **kwargs)

hop_length and win_length

They can be understood as follows:

We process the input signal one window at a time. For example, with a 50 ms window and a sample rate of 22050 Hz, the window length is win_length = int(22050 * 0.05) = 1102 samples.

We slide the window from left to right by a hop length. For example, with a 10 ms hop, the hop length is hop_length = int(22050 * 0.01) = 220 samples.

Note that if the window and hop durations are fixed in time, their values in samples depend on the audio sample rate.
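The conversion from milliseconds to samples can be sketched as below. The helper name ms_to_samples is hypothetical (it is not part of librosa), and the sample rates are only illustrative. It uses integer arithmetic, which gives the same result as the int(...) truncation in the examples above.

```python
def ms_to_samples(duration_ms, sr):
    """Convert a duration in milliseconds to a whole number of samples.

    Hypothetical helper for illustration; integer division truncates
    any fractional sample.
    """
    return sr * duration_ms // 1000


# The same 50 ms window and 10 ms hop at three illustrative sample rates.
lengths = {}
for sr in (16000, 22050, 44100):
    win_length = ms_to_samples(50, sr)
    hop_length = ms_to_samples(10, sr)
    lengths[sr] = (win_length, hop_length)
    print(f"sr={sr}: win_length={win_length}, hop_length={hop_length}")
```

The durations stay the same, but win_length and hop_length grow with the sample rate, so they should be recomputed whenever the audio is resampled.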

Usually, we can set hop_length = win_length // 4, so that neighboring windows overlap by about 75%.
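As a rough sketch of what this rule implies, the snippet below computes the overlap between neighboring windows and the number of frames for a 2-second signal. The frame count assumes librosa-style centered framing (center=True), where the number of frames is 1 + len(y) // hop_length. The 2-second duration and the function name count_frames are assumptions made for illustration.

```python
def count_frames(n_samples, hop_length):
    """Number of frames under centered framing (assumes center=True)."""
    return 1 + n_samples // hop_length


sr = 22050
win_length = sr * 50 // 1000         # 50 ms window in samples
hop_length = win_length // 4         # the rule of thumb from the text
overlap = win_length - hop_length    # samples shared by neighboring windows
overlap_ratio = overlap / win_length

n_samples = 2 * sr                   # assumed 2-second signal
n_frames = count_frames(n_samples, hop_length)

print(win_length, hop_length, overlap, n_frames)
```

A smaller hop_length gives more frames and finer time resolution, at the cost of more computation.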

To learn how to load the input signal, we can read this tutorial:

Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial

n_fft

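n_fft is the length of the windowed signal after padding with zeros. If win_length is smaller than n_fft, each window is padded with zeros to match n_fft. It means win_length <= n_fft. When win_length is None, as in the signature above, librosa uses win_length = n_fft.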
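The padding step can be illustrated with a small pure-Python sketch. It assumes the zeros are split evenly on both sides of the window, and it uses a Hann window to match the default window='hann'. The helpers hann and pad_center are written here for illustration and are not calls into librosa.

```python
import math


def hann(win_length):
    """Periodic Hann window of win_length samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / win_length)
            for n in range(win_length)]


def pad_center(values, size):
    """Zero-pad values on both sides so the result has length size."""
    if len(values) > size:
        raise ValueError("win_length must be <= n_fft")
    left = (size - len(values)) // 2
    right = size - len(values) - left
    return [0.0] * left + list(values) + [0.0] * right


n_fft = 2048
win_length = 1102                      # 50 ms at 22050 Hz
window = pad_center(hann(win_length), n_fft)

# A real-valued FFT of n_fft points yields 1 + n_fft // 2 frequency bins.
n_freq_bins = 1 + n_fft // 2

print(len(window), n_freq_bins)
```

The number of frequency bins depends on n_fft, not on win_length, which is why win_length can be changed without changing the height of the spectrogram.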
