In this tutorial, we will use some examples to show how to read an audio file with torchaudio.load().
Syntax
torchaudio.load() has the following signature:
torchaudio.load(filepath: str, frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None)
It returns a tuple (wav_data, sample_rate): wav_data is a torch.Tensor holding the audio samples, and sample_rate is an int.
Here:
filepath: the path of the audio file; it can also be a URL.
normalize: default = True. When True, the native sample type is converted to float32. If the input file is an integer WAV, passing False keeps the integer sample type, so the returned tensor has an integer dtype (for example torch.int16). This argument has no effect on formats other than integer WAV.
channels_first: default = True. When True, the returned tensor has shape [channel, time]; when False, it has shape [time, channel].
How to use torchaudio.load()?
Here we will use some examples to show you how to use this function.
When normalize = False
import torchaudio
wav_file = "008554.wav"
# normalize=False keeps the native sample type (int16 for this 16-bit WAV file)
wav, sr = torchaudio.load(wav_file, normalize=False)
print(wav.shape, sr)
print(wav)
We will see:
torch.Size([1, 230496]) 48000
tensor([[0, 0, 1, ..., 1, 0, 0]], dtype=torch.int16)
From the result, we can see that:
- wav is an int16 tensor with shape [1, 230496]. There is 1 channel, which means 008554.wav is a mono audio file.
- sr = 48000, which means the sample rate of this audio is 48 kHz.
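The shape and sample rate are also enough to work out the length of the clip: the number of frames divided by the sample rate, so 230496 / 48000 = 4.802 seconds here. The sketch below does this arithmetic in plain Python. The helper name describe is hypothetical, and it assumes a 2-D shape laid out as [channel, time], or [time, channel] when channels_first=False.

```python
def describe(shape, sample_rate, channels_first=True):
    """Hypothetical helper: return (num_channels, duration_in_seconds).

    Assumes a 2-D waveform shape as returned by torchaudio.load():
    [channel, time] when channels_first=True, [time, channel] otherwise.
    """
    if channels_first:
        num_channels, num_frames = shape
    else:
        num_frames, num_channels = shape
    return num_channels, num_frames / sample_rate


# Values taken from the output above: torch.Size([1, 230496]) and 48000
channels, seconds = describe((1, 230496), 48000)
print(channels, f"{seconds:.3f} s")  # 1 4.802 s
```

With a real loaded tensor you would pass the values directly, e.g. describe(wav.shape, sr), since a torch.Size unpacks like a tuple.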
When normalize = True
# normalize=True is the default, so this is the same as torchaudio.load(wav_file)
wav, sr = torchaudio.load(wav_file, normalize=True)
print(wav.shape, sr)
print(wav)
We will see:
torch.Size([1, 230496]) 48000
tensor([[0.0000e+00, 0.0000e+00, 3.0518e-05, ..., 3.0518e-05, 0.0000e+00, 0.0000e+00]])
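This time wav is a float32 tensor (PyTorch omits the dtype when printing its default dtype, float32), and its samples are scaled into the range [-1.0, 1.0].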
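The two outputs can be related with a little arithmetic: the int16 sample 1 in the normalize=False output appears as 3.0518e-05 in the normalize=True output, and 1 / 32768 = 3.0517578125e-05. The sketch below reproduces that scaling on a few hand-picked int16 values. Dividing by 2**15 is an assumption inferred from these outputs for 16-bit WAV data, not a statement of how torchaudio implements normalization internally.

```python
import torch

# Hand-picked int16 values, including both ends of the int16 range
int_samples = torch.tensor([0, 1, -1, 32767, -32768], dtype=torch.int16)

# Assumption: for 16-bit WAV data, normalize=True matches dividing by 2**15
float_samples = int_samples.to(torch.float32) / 32768

print(float_samples.dtype)  # torch.float32
print(float_samples)
```

This also shows why the values stay within [-1.0, 1.0]: the most negative int16 value maps to exactly -1.0, while the largest one maps just below 1.0.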
When channels_first = False
wav, sr = torchaudio.load(wav_file, channels_first=False)
print(wav.shape, sr)
print(type(wav))
We will get:
torch.Size([230496, 1]) 48000
<class 'torch.Tensor'>
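Since channels_first=False only changes the layout, the result should hold the same samples as a transpose of the default [channel, time] tensor. The sketch below uses a random tensor as a stand-in for the loaded waveform (an assumption, so it runs without 008554.wav) and swaps its two dimensions.

```python
import torch

# Stand-in for torchaudio.load(wav_file)[0]: one channel, 230496 samples
wav = torch.randn(1, 230496)           # [channel, time]

# Swap the two dimensions to get the [time, channel] layout
wav_time_first = wav.transpose(0, 1)   # same as wav.t() for a 2-D tensor
print(wav_time_first.shape)            # torch.Size([230496, 1])

# transpose() returns a view; .contiguous() copies it into contiguous memory
# if needed (relevant for multi-channel audio, a no-op for this mono case)
wav_time_first = wav_time_first.contiguous()
```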
How to slice the wav data?
Since wav is a torch.Tensor, we can slice it just like a NumPy array.
For example:
import torchaudio
wav_file = "008554.wav"
wav, sr = torchaudio.load(wav_file)
wav = wav[0]            # take the first (and only) channel, shape [230496]
sub_data = wav[0:20]    # the first 20 samples
print(sub_data.shape)
sub_data = wav[-30:]    # the last 30 samples
print(sub_data.shape)
We will get:
torch.Size([20])
torch.Size([30])
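Often the segment you want is given in seconds rather than in sample indices. Multiplying a time by the sample rate gives the sample index, so with sr = 48000 the span from 1.0 s to 2.5 s is wav[48000:120000], which is 72000 samples. The helper slice_seconds below is hypothetical, and a zero tensor stands in for the mono signal from the example above.

```python
import torch

sr = 48000
wav = torch.zeros(230496)  # stand-in for the mono signal wav[0] above


def slice_seconds(signal, sample_rate, start_s, end_s):
    """Hypothetical helper: return the samples between start_s and end_s seconds."""
    start = round(start_s * sample_rate)  # round() avoids float truncation surprises
    end = round(end_s * sample_rate)
    return signal[..., start:end]         # slice the last (time) dimension


segment = slice_seconds(wav, sr, 1.0, 2.5)
print(segment.shape)  # torch.Size([72000])
```

Because it slices the last dimension, the same helper also works on a [channel, time] tensor. If you only need that segment, torchaudio.load() can read it directly through its frame_offset and num_frames parameters, e.g. torchaudio.load(wav_file, frame_offset=48000, num_frames=72000), which should avoid loading the full waveform into memory.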