The mel-spectrogram is a classic audio feature for deep learning. In this tutorial, we will show how to compute and display it in Python.
librosa.feature.melspectrogram()
This function can compute a mel-scaled spectrogram.
It is defined as:
- librosa.feature.melspectrogram(*, y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='constant', power=2.0, **kwargs)
Here are some important parameters:
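y: the audio time series, a NumPy array of shape (n,) for mono audio.
sr: the sample rate of y.
hop_length: the number of samples between successive frames. It determines the number of frames (columns) in the result.
win_length: each frame of audio is windowed by window() of length win_length and then padded with zeros to match n_fft.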
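In the source code of librosa.stft(), which melspectrogram() calls internally, the defaults for win_length and hop_length are resolved as follows. Note that melspectrogram() itself already defaults hop_length to 512, which equals 2048 // 4.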
- # By default, use the entire frame
- if win_length is None:
-     win_length = n_fft
- # Set the default hop, if it's not already specified
- if hop_length is None:
-     hop_length = int(win_length // 4)
- fft_window = get_window(window, win_length, fftbins=True)
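The default rules quoted above can be tried in isolation. This is a small sketch, not librosa code: resolve_stft_defaults() is a hypothetical helper written only to mirror the two if statements.

```python
def resolve_stft_defaults(n_fft, win_length=None, hop_length=None):
    """Hypothetical helper mirroring the default rules quoted above."""
    # By default, the window spans the entire frame.
    if win_length is None:
        win_length = n_fft
    # By default, the hop is a quarter of the window length.
    if hop_length is None:
        hop_length = win_length // 4
    return win_length, hop_length


print(resolve_stft_defaults(2048))                   # (2048, 512)
print(resolve_stft_defaults(2048, win_length=1024))  # (1024, 256)
print(resolve_stft_defaults(2048, win_length=1024, hop_length=200))  # (1024, 200)
```

So hop_length only depends on win_length when it is left unset; an explicit hop_length is used as given.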
We will use an example to explain this function.
Read a wav file
- import librosa
- import numpy as np
- audio_file =r'D:\1481637021654134785_sep.wav'
- audio_data, sr = librosa.load(audio_file, sr= 8000, mono=True)
- print(audio_data.shape)
In this example, we use librosa.load() to read the audio data and resample it to 8000 Hz. For details, see:
Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial
Running this code, we will get:
- (182015,)
This means the loaded audio contains 182015 samples, which is about 22.75 seconds at a sample rate of 8000 Hz.
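The number of samples can be converted to a duration by dividing by the sample rate. A quick check, using the values from this example:

```python
n_samples = 182015  # length of audio_data printed above
sr = 8000           # sample rate passed to librosa.load()

# Duration in seconds = number of samples / samples per second.
duration = n_samples / sr
print(f"{duration:.2f} seconds")  # 22.75 seconds
```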
Compute Mel-spectrogram
We will use librosa.feature.melspectrogram() to compute the mel-spectrogram. Here is an example:
- melspectrum = librosa.feature.melspectrogram(y=audio_data, sr=sr, hop_length= 512, window='hann', n_mels=256)
- print(melspectrum.shape)
Running this code, we will get:
- (256, 356)
What happens to the result if we change the parameters hop_length and n_mels?
- melspectrum = librosa.feature.melspectrogram(y=audio_data, sr=sr, hop_length= 200, window='hann', n_mels=128)
- print(melspectrum.shape) #(128, 911)
The result has shape (128, 911).
From the above, we can see that the mel-spectrogram is a matrix with shape:
[n_mels, len(audio_data)//hop_length + 1]
This holds for the default center=True. For example, if n_mels = 128 and hop_length = 200,
len(audio_data)//hop_length + 1 = 182015//200 + 1 = 911.
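The shape rule can be checked against both results from this tutorial. In this sketch, expected_mel_shape() is a hypothetical helper (not part of librosa), and it assumes the default center=True.

```python
def expected_mel_shape(n_samples, hop_length, n_mels):
    """Hypothetical helper: mel-spectrogram shape, assuming center=True."""
    n_frames = n_samples // hop_length + 1
    return (n_mels, n_frames)


# The two configurations used in this tutorial.
print(expected_mel_shape(182015, hop_length=512, n_mels=256))  # (256, 356)
print(expected_mel_shape(182015, hop_length=200, n_mels=128))  # (128, 911)
```

Both values agree with the shapes printed by librosa.feature.melspectrogram() above.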
Display Mel-spectrogram
Once we have computed the mel-spectrogram, we can display it. Here is an example:
- import matplotlib.pyplot as plt
- import librosa.display
- fig, ax = plt.subplots()
- S_dB = librosa.power_to_db(melspectrum, ref=np.max)
- img = librosa.display.specshow(S_dB, x_axis='time',
- y_axis='mel', sr=sr,
- ax=ax)
- fig.colorbar(img, ax=ax, format='%+2.0f dB')
- ax.set(title='Mel-frequency spectrogram')
- plt.show()
The hop_length used by librosa.display.specshow() should be the same as the one used in librosa.feature.melspectrogram(); otherwise the time axis will be scaled incorrectly.
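To see why the two hop_length values should match, recall that a frame index is converted to seconds as frame_index * hop_length / sr. The sketch below uses a hypothetical helper, frame_to_seconds(), to compare a matching and a mismatched hop_length for the 911-frame result computed earlier with hop_length = 200.

```python
def frame_to_seconds(frame_index, sr, hop_length):
    """Hypothetical helper: start time of a frame in seconds."""
    return frame_index * hop_length / sr


sr = 8000
n_frames = 911            # frames obtained with hop_length=200
last_frame = n_frames - 1

# Matching hop_length: the last frame lands near the true duration.
correct = frame_to_seconds(last_frame, sr, hop_length=200)
# Mismatched hop_length (512): the time axis is stretched.
mismatched = frame_to_seconds(last_frame, sr, hop_length=512)

print(correct)     # 22.75
print(mismatched)  # 58.24
```

With the mismatched value, a recording of about 22.75 seconds would appear to last about 58 seconds on the plot.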
specshow() uses hop_length = 512 by default, so we compute the mel-spectrogram with hop_length = 512. Running the display code above, we will get an image as follows: