When we need to read an audio file in Python, we can use scipy.io.wavfile.read() or librosa.load(). In this tutorial, we will introduce the differences between them.
scipy.io.wavfile.read()
scipy.io.wavfile.read(filename, mmap=False)
This function opens a WAV file and returns its sample rate and its data as a NumPy array.
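The dtype of the returned array follows the sample format stored in the file. As a minimal sketch, the snippet below writes two tiny synthetic WAV files (the file names and sample values are made up for illustration) and reads them back to show the sample rate, dtype, and shape that scipy.io.wavfile.read() returns.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

rate = 8000  # sample rate written into the WAV header

# Hypothetical sample data in two common formats: 16-bit PCM and 32-bit float.
samples = {
    "int16": np.array([0, 1000, -1000, 32767], dtype=np.int16),
    "float32": np.array([0.0, 0.25, -0.25, 1.0], dtype=np.float32),
}

read_back = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, data in samples.items():
        path = os.path.join(tmp, name + ".wav")
        wavfile.write(path, rate, data)
        # read() returns the rate from the header and the samples as stored
        fs, wavdata = wavfile.read(path)
        read_back[name] = (fs, wavdata.dtype, wavdata.shape)
        print(name, fs, wavdata.dtype, wavdata.shape)
```

The array comes back in the same format it was written in, so whether you get integers or floats depends on the file rather than on the function.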
librosa.load()
librosa.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')
This function opens an audio file, resamples it to the sample rate sr (if sr is not None), and returns the audio data and the sample rate.
We will compare them using some examples.
scipy.io.wavfile.read() vs librosa.load()
scipy.io.wavfile.read() cannot open a WAV file at a custom sample rate; it always returns the rate stored in the file. librosa.load(), however, can resample the audio to a rate we choose.
For example:
from scipy.io import wavfile
import librosa
import numpy as np

np.set_printoptions(threshold=np.inf)

audio_file = './waihu/6eb2612c-fc23-4ead-b2dd-05009817f7e7.wav'

# scipy returns (sample rate, data)
fs, wavdata = wavfile.read(audio_file)
print(fs)
print(type(wavdata))

# librosa returns (data, sample rate); sr is not passed here
audio, fs = librosa.load(audio_file)
print(fs)

# librosa with a custom sample rate
audio, fs = librosa.load(audio_file, sr=4000)
print(fs)
Running this code prints:
8000
<class 'numpy.ndarray'>
22050
4000
It means:
- scipy.io.wavfile.read() can only read a WAV file at its original sample rate (8000 Hz here).
- If sr is not passed, librosa.load() resamples the audio to its default sample rate of 22050 Hz. Passing sr=None keeps the file's original sample rate instead.
- If we set sr, librosa.load() resamples the audio to that sample rate.
- If you have many WAV files with different sample rates, librosa.load() is a good choice because it can bring them all to one common sample rate.
Now look at the code below, which prints the same 100 samples as returned by each function:
from scipy.io import wavfile
import librosa
import numpy as np

np.set_printoptions(threshold=np.inf)

audio_file = './waihu/6eb2612c-fc23-4ead-b2dd-05009817f7e7.wav'

# raw samples as stored in the file
fs, wavdata = wavfile.read(audio_file)
print(wavdata[5000:5100])

# sr matches the file's original rate (8000), so the sample positions line up
audio, fs = librosa.load(audio_file, sr=8000)
print(audio[5000:5100])
Running this code prints:
[-4261 -1797 585 1701 2108 1668 928 191 294 1228 2165 2229 1134 -127 -664 -77 1101 2242 2704 2309 1328 442 371 914 1594 1855 1493 855 660 732 632 -1586 -4957 -7701 -7927 -4847 -367 2493 1150 -2137 -4518 -3791 -1486 492 1239 1453 1512 1122 563 344 1263 2205 2379 1207 -45 -426 277 1300 1835 1960 1740 1441 994 810 902 1335 1583 1363 733 598 988 1133 -457 -4040 -7262 -8377 -5986 -1513 2121 1995 -1100 -4103 -4409 -2127 287 1418 1419 1223 950 645 325 882 2011 2640 1896 261 -648 -225 1215 2075]
[-0.1300354 -0.05484009 0.01785278 0.0519104 0.06433105 0.05090332 0.02832031 0.00582886 0.00897217 0.03747559 0.06607056 0.06802368 0.03460693 -0.00387573 -0.02026367 -0.00234985 0.03359985 0.06842041 0.08251953 0.07046509 0.04052734 0.01348877 0.01132202 0.02789307 0.04864502 0.05661011 0.04556274 0.02609253 0.0201416 0.02233887 0.01928711 -0.04840088 -0.15127563 -0.23501587 -0.24191284 -0.1479187 -0.01119995 0.07608032 0.03509521 -0.06521606 -0.13787842 -0.11569214 -0.04534912 0.01501465 0.03781128 0.04434204 0.04614258 0.03424072 0.0171814 0.01049805 0.0385437 0.06729126 0.07260132 0.03683472 -0.00137329 -0.01300049 0.00845337 0.03967285 0.05599976 0.05981445 0.05310059 0.04397583 0.03033447 0.02471924 0.02752686 0.04074097 0.04830933 0.04159546 0.02236938 0.01824951 0.03015137 0.03457642 -0.01394653 -0.12329102 -0.22161865 -0.25564575 -0.18267822 -0.0461731 0.06472778 0.06088257 -0.03356934 -0.12521362 -0.134552 -0.06491089 0.00875854 0.04327393 0.04330444 0.037323 0.0289917 0.01968384 0.00991821 0.0269165 0.06137085 0.08056641 0.05786133 0.00796509 -0.01977539 -0.00686646 0.03707886 0.06332397]
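We can see that:
scipy.io.wavfile.read() returns the raw integer samples stored in the file, whereas librosa.load() returns floating-point values between -1 and +1. In this output, each librosa value is the corresponding integer divided by 32768; for example, -4261 / 32768 = -0.1300354.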
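The link between the two outputs can be checked with a few lines of NumPy. The sketch below assumes the file stores 16-bit PCM samples (the tutorial does not state the bit depth, but the printed values are consistent with it) and uses the first eight values from each printed array. The helper name int16_to_float is made up for this example.

```python
import numpy as np

# First eight integer samples from the scipy.io.wavfile.read() output above.
int_samples = np.array(
    [-4261, -1797, 585, 1701, 2108, 1668, 928, 191], dtype=np.int16
)

# Matching values from the librosa.load() output above.
librosa_samples = np.array(
    [-0.1300354, -0.05484009, 0.01785278, 0.0519104,
     0.06433105, 0.05090332, 0.02832031, 0.00582886],
    dtype=np.float32,
)

def int16_to_float(samples):
    """Scale 16-bit PCM integers to float32 values in [-1.0, 1.0)."""
    return samples.astype(np.float32) / 32768.0

converted = int16_to_float(int_samples)
print(converted)
print(np.allclose(converted, librosa_samples, atol=1e-7))
```

The same scaling can be applied to data read with scipy.io.wavfile.read() when a model or library expects float input, as long as the file really is 16-bit PCM.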