We usually use librosa.load() in Python to read a WAV file, which returns a NumPy array with values between -1.0 and 1.0. Here is the tutorial:
Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial
However, librosa.load() may return values whose absolute value is greater than 1.0. In this tutorial, we will show you how to fix it.
For example:
import librosa
import numpy as np

# Load the file as mono audio, resampled to 8000 Hz
wav = r"audio_data/speech-us-gov-0028.wav"
wav_data, sr = librosa.load(wav, sr=8000, mono=True)
print(sr)
print(wav_data)
# Largest absolute sample value
print(np.abs(wav_data).max())
Running this code, we may see:
8000
[-2.5019117e-05 -9.3096860e-06 2.3915986e-06 ... 4.3445010e-02 2.1312233e-02 0.0000000e+00]
1.1275722
This means the sample rate is 8000 and the maximum absolute value in wav_data is 1.1275722, which is greater than 1.0.
This may cause errors. For example, if we use webrtcvad to process this file, the values need to be between -1.0 and 1.0. You may get an error:
ValueError: when data.type is float, data must be -1.0 <= data <= 1.0.
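To see why the range matters, here is a minimal sketch of the kind of conversion that usually sits in front of webrtcvad, which works on 16-bit PCM bytes. The helper name float_to_pcm16 and its error message are assumptions made for illustration; they are not part of webrtcvad or librosa. A float sample above 1.0 would not fit into the 16-bit range after scaling, so the helper rejects it.

```python
import numpy as np

def float_to_pcm16(samples):
    """Hypothetical helper: convert float samples in [-1.0, 1.0] to 16-bit PCM bytes."""
    samples = np.asarray(samples, dtype=np.float32)
    # Reject out-of-range input instead of letting it overflow int16.
    if samples.size > 0 and np.abs(samples).max() > 1.0:
        raise ValueError("float samples must satisfy -1.0 <= data <= 1.0")
    # Scale to the int16 range and serialize as little-endian bytes.
    return (samples * 32767).astype("<i2").tobytes()

# Samples inside the valid range convert without problems.
in_range = np.array([0.0, 0.5, -1.0], dtype=np.float32)
pcm_bytes = float_to_pcm16(in_range)
print(len(pcm_bytes))  # each sample takes 2 bytes

# A peak like the one in this tutorial is rejected.
too_loud = np.array([0.0, 1.1275722], dtype=np.float32)
try:
    float_to_pcm16(too_loud)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```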
How to fix this error?
We should scale the values so that they lie between -1.0 and 1.0. We can do as follows:
if np.abs(wav_data).max() > 1.0:
    # Scale the whole signal so its peak becomes 0.99
    wav_data *= (0.99 / np.abs(wav_data).max())
print(wav_data)
print(np.abs(wav_data).max())
Running this code, we will see:
[-2.1966598e-05 -8.1738353e-06 2.0998059e-06 ... 3.8144395e-02 1.8711982e-02 0.0000000e+00]
0.99
The maximum absolute value in wav_data is now 0.99, which is lower than 1.0.
With that, the error is fixed.
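The same fix can be wrapped in a small reusable function. This is only a sketch: the name peak_normalize and the peak parameter are assumptions, not part of librosa. It returns a new array instead of modifying the input, and it leaves signals that are already in range untouched.

```python
import numpy as np

def peak_normalize(samples, peak=0.99):
    """Hypothetical helper: rescale samples only if their peak exceeds 1.0."""
    samples = np.asarray(samples, dtype=np.float32)
    if samples.size == 0:
        return samples.copy()
    current_peak = np.abs(samples).max()
    # Signals already inside [-1.0, 1.0] are returned unchanged.
    if current_peak <= 1.0:
        return samples.copy()
    # Multiply every sample by the same factor so the waveform shape is kept.
    return samples * (peak / current_peak)

loud = np.array([0.1, -1.1275722, 0.5], dtype=np.float32)
quiet = np.array([0.1, -0.8, 0.5], dtype=np.float32)

loud_fixed = peak_normalize(loud)
quiet_fixed = peak_normalize(quiet)
print(np.abs(loud_fixed).max())
print(np.abs(quiet_fixed).max())
```

Scaling every sample by the same factor keeps the shape of the waveform. Clipping with np.clip(wav_data, -1.0, 1.0) would also bring the values into range, but it flattens the peaks and distorts the audio.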