TorchAudio Audio Resampling Tutorial for Beginners

By | February 7, 2023

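In this tutorial, we will introduce how to resample audio with torchaudio. Resampling is an essential preprocessing step, for example when audio recorded at different sample rates has to be converted to the single rate a model expects.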

How to resample audio?

In torchaudio, we can use torchaudio.transforms.Resample() or torchaudio.functional.resample() to resample audio.
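Resampling works on the ratio between the two rates. As far as we can tell from torchaudio's source (an internal detail that may change between versions), it first divides orig_freq and new_freq by their greatest common divisor and then produces ceil(length * new_freq / orig_freq) output samples. The plain-Python sketch below reproduces only that bookkeeping; reduced_ratio() and output_length() are hypothetical helpers, not torchaudio functions.

```python
import math


def reduced_ratio(orig_freq: int, new_freq: int) -> tuple[int, int]:
    """Hypothetical helper: divide both rates by their greatest common divisor."""
    g = math.gcd(orig_freq, new_freq)
    return orig_freq // g, new_freq // g


def output_length(num_samples: int, orig_freq: int, new_freq: int) -> int:
    """Hypothetical helper: ceil(num_samples * new_freq / orig_freq) output samples."""
    orig, new = reduced_ratio(orig_freq, new_freq)
    # Integer ceiling division avoids floating-point rounding
    return -(-num_samples * new // orig)


print(reduced_ratio(48000, 32000))         # (3, 2)
print(output_length(48000, 48000, 32000))  # 32000
print(output_length(1000, 48000, 32000))   # 667
```

Reducing 48000 → 32000 to 3 → 2 keeps the filter bank small: in torchaudio's implementation, as we read it, only 2 distinct output phases are needed, and the pattern repeats every 3 input samples.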

The constructor of torchaudio.transforms.Resample is defined as:

def __init__(
    self,
    orig_freq: int = 16000,
    new_freq: int = 16000,
    resampling_method: str = 'sinc_interpolation',
    lowpass_filter_width: int = 6,
    rolloff: float = 0.99,
    beta: Optional[float] = None,
    *,
    dtype: Optional[torch.dtype] = None,
)

torchaudio.functional.resample() is defined as:

def resample(
        waveform: Tensor,
        orig_freq: int,
        new_freq: int,
        lowpass_filter_width: int = 6,
        rolloff: float = 0.99,
        resampling_method: str = "sinc_interpolation",
        beta: Optional[float] = None,
)

Comparing these two functions, we find that they share the same filter parameters (although in a different order):

resampling_method: The resampling method to use, either "sinc_interpolation" (the default, a Hann-windowed sinc) or "kaiser_window". In torchaudio 2.0 and later, these options are named "sinc_interp_hann" and "sinc_interp_kaiser".

lowpass_filter_width: Controls the sharpness of the filter (default 6). A larger lowpass_filter_width gives a sharper, more precise filter, but is more computationally expensive.

rolloff: The roll-off frequency of the filter, as a fraction of the Nyquist frequency of the lower of the two sample rates (default 0.99). A lower rolloff reduces the amount of aliasing, but also attenuates some of the higher frequencies.

sinc_interpolation vs kaiser_window

By default ("sinc_interpolation"), torchaudio’s resample uses a sinc filter tapered by a Hann window, which is a raised-cosine function. It also supports the Kaiser window, a near-optimal window function with an additional beta parameter that controls the smoothness of the filter and the width of its impulse response. The window is selected with the resampling_method parameter.
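To make the two windows concrete, the NumPy sketch below evaluates both tapers at a few positions t between -W and W, where W is lowpass_filter_width. The formulas follow our reading of torchaudio's kernel code, which is an assumption rather than documented API: Hann is cos²(πt / 2W), and Kaiser is I0(β·sqrt(1 - (t/W)²)) / I0(β), with β ≈ 14.77 used when beta is None. hann_window() and kaiser_window() are hypothetical helpers.

```python
import numpy as np

DEFAULT_BETA = 14.769656459379492  # assumed fallback value when beta is None


def hann_window(t, width):
    """Hypothetical helper: Hann (raised-cosine) taper, 1 at t=0 and 0 at t=+/-width."""
    return np.cos(np.pi * t / (2 * width)) ** 2


def kaiser_window(t, width, beta=DEFAULT_BETA):
    """Hypothetical helper: Kaiser taper; a larger beta makes it fall off faster."""
    return np.i0(beta * np.sqrt(1 - (t / width) ** 2)) / np.i0(beta)


width = 6  # lowpass_filter_width
t = np.linspace(-width, width, 7)  # -6, -4, -2, 0, 2, 4, 6
print(np.round(hann_window(t, width), 3))  # 0, 0.25, 0.75, 1, 0.75, 0.25, 0
print(np.round(kaiser_window(t, width), 3))
```

With the default β, the Kaiser taper falls off much faster than the Hann taper (about 0.027 versus 0.25 at t = ±4), so at the same lowpass_filter_width it trades a wider transition band for stronger stopband attenuation.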

Here is the effect of the different resampling methods.

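import torchaudio.functional as F

# waveform: a (channels, time) float tensor sampled at 48 kHz, e.g. a sine sweep
# plot_sweep: a plotting helper (not part of torchaudio) that this article does not define
sample_rate = 48000
resample_rate = 32000

# Hann-windowed sinc (the default); named "sinc_interp_hann" in torchaudio 2.0+
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, resampling_method="sinc_interpolation")
plot_sweep(resampled_waveform, resample_rate, title="Hann Window Default")

# Kaiser-windowed sinc; named "sinc_interp_kaiser" in torchaudio 2.0+
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, resampling_method="kaiser_window")
plot_sweep(resampled_waveform, resample_rate, title="Kaiser Window Default")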
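To see what resampling_method actually changes, here is a self-contained NumPy sketch of windowed-sinc resampling. It is not torchaudio's implementation: resample_sinc() is a hypothetical, deliberately slow function that evaluates every output sample directly. It uses the same ingredients, though: a sinc low-pass with cutoff rolloff × min(orig_freq, new_freq) / 2, truncated to lowpass_filter_width zero crossings on each side and tapered by a Hann or Kaiser window. The default beta is assumed to match torchaudio's fallback value.

```python
import numpy as np

KAISER_BETA = 14.769656459379492  # assumed to match torchaudio's fallback when beta is None


def resample_sinc(x, orig_freq, new_freq, lowpass_filter_width=6, rolloff=0.99,
                  window="hann", beta=KAISER_BETA):
    """Hypothetical windowed-sinc resampler: an O(N*M) teaching sketch, not torchaudio."""
    if window not in ("hann", "kaiser"):
        raise ValueError(f"unknown window: {window!r}")
    x = np.asarray(x, dtype=np.float64)
    # Low-pass bandwidth in Hz; the cutoff is half of it, below both Nyquist frequencies
    base = rolloff * min(orig_freq, new_freq)
    n_out = int(np.ceil(len(x) * new_freq / orig_freq))
    t_in = np.arange(len(x)) / orig_freq  # input sample times in seconds
    y = np.empty(n_out)
    for j in range(n_out):
        # Distance from every input sample to output time j / new_freq,
        # measured in zero crossings of the low-pass sinc
        u = base * (t_in - j / new_freq)
        keep = np.abs(u) <= lowpass_filter_width  # truncate to +/- W zero crossings
        u = u[keep]
        if window == "hann":
            w = np.cos(np.pi * u / (2 * lowpass_filter_width)) ** 2
        else:
            w = np.i0(beta * np.sqrt(1 - (u / lowpass_filter_width) ** 2)) / np.i0(beta)
        # np.sinc(u) is sin(pi*u) / (pi*u); scaling by base / orig_freq keeps DC gain near 1
        y[j] = np.sum(x[keep] * np.sinc(u) * w) * base / orig_freq
    return y


sample_rate, resample_rate = 48000, 32000
t = np.arange(4800) / sample_rate  # 0.1 s of a 1 kHz test tone
tone = np.sin(2 * np.pi * 1000 * t)
hann_out = resample_sinc(tone, sample_rate, resample_rate, window="hann")
kaiser_out = resample_sinc(tone, sample_rate, resample_rate, window="kaiser")
print(len(tone), "->", len(hann_out))  # 4800 -> 3200
```

torchaudio reaches the same kind of result far more efficiently: after reducing the rates by their GCD, it precomputes one short kernel per output phase and applies them all with a single strided convolution (our reading of the source).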

We will see:

[Figure: torchaudio resampling methods]

The effect of lowpass_filter_width

sample_rate = 48000
resample_rate = 32000

# Default filter width: cheaper, with a more gradual transition band
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, lowpass_filter_width=6)
plot_sweep(resampled_waveform, resample_rate, title="lowpass_filter_width=6")

# Much wider filter: sharper transition band, but more computation
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, lowpass_filter_width=128)
plot_sweep(resampled_waveform, resample_rate, title="lowpass_filter_width=128")
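Why does lowpass_filter_width=128 cost more? Based on our reading of torchaudio's source (an implementation detail, not documented API), after GCD reduction each output phase gets a kernel of 2 * width + orig_freq taps, where width = ceil(lowpass_filter_width * orig_freq / (rolloff * min(orig_freq, new_freq))). The hypothetical helper kernel_taps() below evaluates this for the two settings above.

```python
import math


def kernel_taps(orig_freq, new_freq, lowpass_filter_width=6, rolloff=0.99):
    """Hypothetical helper: taps per output phase, following torchaudio's kernel sizing."""
    g = math.gcd(orig_freq, new_freq)
    orig, new = orig_freq // g, new_freq // g
    base = min(orig, new) * rolloff  # filter bandwidth in GCD-reduced units
    # Half-width of the kernel in input samples, rounded up
    width = math.ceil(lowpass_filter_width * orig / base)
    return 2 * width + orig


for w in (6, 128):
    print(w, kernel_taps(48000, 32000, lowpass_filter_width=w))
# 6 23
# 128 391
```

For 48 kHz → 32 kHz that is 23 taps versus 391 taps, so each output sample needs roughly 17 times as many multiply-adds.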

The effect is:

[Figure: the effect of lowpass_filter_width]

The effect of rolloff

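sample_rate = 48000
resample_rate = 32000

# Default rolloff: cutoff at 99% of the 16 kHz Nyquist frequency of the 32 kHz output
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, rolloff=0.99)
plot_sweep(resampled_waveform, resample_rate, title="rolloff=0.99")

# Lower rolloff: cutoff at 80% of Nyquist, less aliasing but fewer high frequencies kept
resampled_waveform = F.resample(waveform, sample_rate, resample_rate, rolloff=0.8)
plot_sweep(resampled_waveform, resample_rate, title="rolloff=0.8")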
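To put rolloff in hertz: torchaudio (as we read its source) sets the low-pass cutoff at rolloff × min(orig_freq, new_freq) / 2, i.e. a fraction of the Nyquist frequency of the lower rate. The hypothetical helper cutoff_hz() below computes it for the two values compared above.

```python
def cutoff_hz(orig_freq, new_freq, rolloff=0.99):
    """Hypothetical helper: approximate low-pass cutoff in Hz for a given rolloff."""
    # Nyquist frequency of the lower sample rate, scaled by rolloff
    return rolloff * min(orig_freq, new_freq) / 2


for r in (0.99, 0.8):
    print(r, cutoff_hz(48000, 32000, rolloff=r))
```

For 48 kHz → 32 kHz this gives about 15.8 kHz for rolloff=0.99 and 12.8 kHz for rolloff=0.8. The cutoff is roughly the -6 dB point of a windowed-sinc filter, so content above it is progressively attenuated rather than removed outright.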

The effect is:

[Figure: the effect of rolloff]