Tutorial Example

Understand Why librosa.load() Returns Values Between -1.0 and 1.0 – Librosa Tutorial

When we use librosa.load() to read an audio file, we get a numpy ndarray whose values lie between -1.0 and 1.0. In this tutorial, we will explain why.

Read an audio file

We can also use scipy.io.wavfile.read() to read an audio file. For a PCM wav file, it returns an integer numpy array. To learn the difference between scipy.io.wavfile.read() and librosa.load(), you can read this tutorial:

The Difference Between scipy.io.wavfile.read() and librosa.load() in Python – Python Tutorial

librosa.load()

We can use the code below to read an audio file.

import librosa
import soundfile as sf

audio_file = r'F:\6.wav'  # a 16-bit PCM wav file

# read wav data as mono, at a sample rate of 8000 Hz
audio, sr = librosa.load(audio_file, sr=8000, mono=True)
print(audio.shape, sr)
print(audio)

Running this code, we get this result:

(101600,) 8000
[-0.00024414 -0.00024414  0.00024414 ... -0.00170898 -0.00219727
 -0.0012207 ]

Why the audio data is between -1.0 and 1.0

In the librosa version used in this tutorial, librosa.load() is defined as:

librosa.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')

Internally, librosa.load() uses the soundfile library to read the audio file, so we can find the answer by looking at how soundfile reads data.

Look at the example code below:

# read wav data with librosa.load()
audio, sr = librosa.load(audio_file, sr=8000, mono=True)
print("read by librosa.load()")
print(audio.shape, sr)
print(audio)

# read the same file directly with soundfile, as float32
print("read by soundfile.read()")
audio, sr = sf.read(audio_file, dtype="float32")
print(audio)


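We will find that both calls return the same data:

librosa.load(audio_file, sr=8000, mono=True) = sf.read(audio_file, dtype="float32")

This equality holds when the file is already mono and stored at 8000 Hz. Otherwise, librosa resamples the audio or mixes down the channels, and the values differ.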

soundfile.read() scales the returned audio data differently depending on dtype. Its documentation says:

dtype ({‘float64’, ‘float32’, ‘int32’, ‘int16’}, optional) –

Data type of the returned array, by default ‘float64’. Floating point audio data is typically in the range from -1.0 to 1.0. Integer data is in the range from -2**15 to 2**15-1 for ‘int16’ and from -2**31 to 2**31-1 for ‘int32’.
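To make these ranges concrete, here is a small sketch that uses only the Python standard library, assuming a 16-bit PCM file. It writes a few hypothetical samples to an in-memory wav file with the wave module, reads them back as signed 16-bit integers (like dtype='int16') and divides them by 2**15 (like the floating point dtypes). It illustrates the documented ranges; it is not soundfile's actual code.

```python
import array
import io
import sys
import wave

# Hypothetical 16-bit samples, including the int16 extremes (not taken from 6.wav).
samples = [-32768, -8, 0, 8, 32767]

# Write them to an in-memory mono 16-bit PCM wav file.
data = array.array("h", samples)
if sys.byteorder == "big":
    data.byteswap()  # wav stores samples little-endian
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = 16 bits
    w.setframerate(8000)
    w.writeframes(data.tobytes())

# Read the raw frames back as signed 16-bit integers (the 'int16' view).
buf.seek(0)
with wave.open(buf, "rb") as r:
    raw = r.readframes(r.getnframes())
ints = array.array("h")
ints.frombytes(raw)
if sys.byteorder == "big":
    ints.byteswap()

# Scale to floating point by dividing by 2**15 (the 'float32'/'float64' view).
floats = [s / 2**15 for s in ints]
print(list(ints))  # [-32768, -8, 0, 8, 32767]
print(floats)      # [-1.0, -0.000244140625, 0.0, 0.000244140625, 0.999969482421875]
```

-1.0 is reachable, but the largest float value is 32767/32768, slightly below 1.0. Also note that -8/32768 = -0.000244140625, which numpy prints as -0.00024414, the first value in the librosa.load() output above.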

For librosa.load(), the default dtype is numpy.float32, and this dtype is passed on to soundfile, so the file is read as float32 data. This is why the audio data lies between -1.0 and 1.0.
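Since the default result is float32, one may wonder whether 16-bit audio loses precision. A float32 has a 24-bit significand, while s / 2**15 for any int16 sample s needs at most 15 significant bits, so every value should be exact. The sketch below checks all 65,536 cases with the standard struct module; the helper to_float32 is hypothetical and only emulates a float32 round trip.

```python
import struct


def to_float32(x):
    # Round-trip a Python float (float64) through a 4-byte IEEE 754 float32.
    return struct.unpack("<f", struct.pack("<f", x))[0]


# For every possible 16-bit sample, scale to float32 and scale back.
exact = all(
    to_float32(s / 2**15) * 2**15 == s
    for s in range(-2**15, 2**15)
)
print(exact)  # True
```

So multiplying float32 data from a 16-bit file by 2**15 gives back the original integers exactly, as long as nothing else (such as resampling) has changed the samples.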

Our wav file is 16-bit PCM. You can check the data format of your own audio file by following this tutorial:

View Audio Sample Rate, Data Format PCM or ALAW Using ffprobe – Python Tutorial

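So the raw wav samples are integers in the range -2**15 to 2**15-1, that is, -32768 to 32767. Dividing them by 2**15 = 32768 maps them into the range -1.0 to 1.0.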
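We can check this against the output printed earlier. Assuming the file was not resampled, multiplying each sample by 2**15 should give back an integer in the 16-bit range. This sketch uses only the six values visible in that output; the part hidden behind "..." is unknown and is left out.

```python
# Sample values printed by librosa.load() earlier in this tutorial
# (numpy rounds them to 8 decimal places when printing).
printed = [-0.00024414, -0.00024414, 0.00024414,
           -0.00170898, -0.00219727, -0.0012207]

# Undo the 1/2**15 scaling and round to the nearest integer.
recovered = [round(x * 2**15) for x in printed]
print(recovered)  # [-8, -8, 8, -56, -72, -40]

# Every recovered value fits the 16-bit PCM range.
print(all(-2**15 <= v <= 2**15 - 1 for v in recovered))  # True
```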

How to read integer audio data using librosa.load()?

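We can set dtype to numpy.int16, since our file is 16-bit. In this case, we also pass sr=None and mono=False, so librosa returns the samples at their native sample rate and channel layout, without resampling or mixing channels. Here is an example: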

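import numpy as np

# read float data (default dtype=np.float32)
audio, sr = librosa.load(audio_file, sr=8000, mono=True)
print("read float data by librosa.load()")
print(audio.shape, sr)
print(audio)
print("convert to integer using 2**15")
print(audio * 32768.0)  # 2**15 = 32768

# read integer data: keep the native sample rate (sr=None) and channels (mono=False),
# because librosa's resampling and mono mixing expect floating-point audio
print("read integer data by librosa.load()")
audio, sr = librosa.load(audio_file, sr=None, mono=False, dtype=np.int16)
print(audio.shape, sr)
print(audio)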

You will see this result: