Understand torchaudio.load(): Read Audio with Examples

We can use torchaudio.load() to read audio data easily. In this tutorial, we will discuss its parameters in more detail with some examples.
Syntax
torchaudio.load() is defined as:
torchaudio.load(uri: Union[BinaryIO, str, PathLike], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True, format: Optional[str] = None, buffer_size: int = 4096, backend: Optional[str] = None)
There are three important parameters:

- normalize = True: convert each sample to a float value in [-1, 1]
- num_frames = -1: the number of frames to read (-1 means read to the end of the file)
- frame_offset = 0: the frame index at which to start reading
We will now use some examples to show the effect of these parameters.
Normalize
If normalize = True (the default):
import torchaudio

wav_path = r'10091.wav'
waveform, sample_rate = torchaudio.load(wav_path)
x = waveform[:, :100]
print(x)
Output:
tensor([[-3.0518e-05, -3.0518e-05, 0.0000e+00, -9.1553e-05, -3.0518e-05, 0.0000e+00, -3.0518e-05, -9.1553e-05, -3.0518e-05, 0.0000e+00, 0.0000e+00, -6.1035e-05, -9.1553e-05, -6.1035e-05, -6.1035e-05, ... ]])
Multiplying by (1 << 15) (that is, 2**15 = 32768) recovers the raw 16-bit sample values, which is the effect of normalize = False:
y = x * (1 << 15)
print(y)
Output:
tensor([[-1., -1., 0., -3., -1., 0., -1., -3., -1., 0., 0., -2., -3., -2., -2., 0., -1., -2., -5., -1., -2., 0., -1., -2., -2., 0., -1., 2., -2., -3., 0., -2., -2., 0., -1., 0., -2., -3., 0., -2., -3., -1., ... ]])
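The arithmetic behind this is straightforward: for 16-bit audio, normalization divides each raw integer sample by 2**15, so multiplying by (1 << 15) reverses it exactly. A minimal sketch with hypothetical raw samples (not taken from 10091.wav):

```python
import torch

# Hypothetical raw 16-bit PCM samples
raw = torch.tensor([[-1, -3, 0, 2, -2]], dtype=torch.int16)

# What normalization does for 16-bit audio: divide by 2**15
normalized = raw.to(torch.float32) / (1 << 15)

# Multiplying back by 2**15 recovers the raw integer values exactly,
# because the divisor is a power of two
recovered = normalized * (1 << 15)
print(recovered)  # tensor([[-1., -3.,  0.,  2., -2.]])
```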
num_frames and frame_offset
You should notice: the last dimension of the waveform tensor is the number of frames.
For example:
wav_path = r'10091.wav'
waveform, sample_rate = torchaudio.load(wav_path)
print(waveform.shape)
Output:
torch.Size([1, 164127])
It means 10091.wav contains one channel and 164127 frames.
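As a quick sanity check, the frame count divided by the sample rate gives the clip's duration in seconds. A small sketch (the sample rate here is hypothetical, since the tutorial does not print it; use the sample_rate returned by torchaudio.load() in practice):

```python
num_frames = 164127
sample_rate = 16000  # hypothetical rate; not stated in the tutorial

# duration in seconds = frame count / frames per second
duration_seconds = num_frames / sample_rate
print(round(duration_seconds, 2))  # about 10.26 seconds at 16 kHz
```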
start_frame = 1000
end_frame = 10000
waveform, sample_rate = torchaudio.load(
    wav_path,
    num_frames=end_frame - start_frame,
    frame_offset=start_frame)
print(waveform.shape)
This call reads end_frame - start_frame = 10000 - 1000 = 9000 frames, starting at frame 1000.
Output:
torch.Size([1, 9000])
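Conceptually, frame_offset and num_frames select the same samples you would get by slicing the fully loaded waveform along its last dimension. A minimal sketch, with a synthetic tensor standing in for real audio:

```python
import torch

# Synthetic "waveform": 1 channel, 20 frames (channels_first layout)
full = torch.arange(20, dtype=torch.float32).unsqueeze(0)

start_frame = 5
end_frame = 12
num_frames = end_frame - start_frame

# Equivalent of loading with frame_offset=start_frame, num_frames=num_frames
segment = full[:, start_frame:start_frame + num_frames]
print(segment.shape)  # torch.Size([1, 7])
```

Reading only the needed segment with frame_offset and num_frames avoids decoding the whole file first, which matters for long recordings.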