MelSpec, FBank and MFCC are all commonly used as audio features in deep learning. What is the difference between them? In this tutorial, we will introduce each one and show how they relate.
MelSpec
MelSpec refers to Mel filter-bank coefficients. It can be computed with several Python libraries.
In python librosa:
librosa.feature.melspectrogram()
In python python_speech_features:
fbank()
Note: although this function in python_speech_features is named fbank(), it does not compute the FBank feature. It returns Mel filter-bank energies without taking the log, so its output corresponds to MelSpec.
FBank
FBank refers to log Mel filter-bank coefficients. It is computed as log(MelSpec).
In python librosa, we can compute FBank by following this tutorial:
Compute Audio Log Mel Spectrogram Feature: A Step Guide – Python Audio Processing
In python python_speech_features:
The logfbank() function can be used.
MFCC
MFCC stands for Mel-frequency cepstral coefficients. It is obtained by applying a discrete cosine transform (DCT) to FBank.
In python librosa:
librosa.feature.mfcc()
In python python_speech_features:
mfcc()
The relationship among them is shown below:
This picture is from:
Comparison of Different Feature Types for Acoustic Event Detection System
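The chain from MelSpec to FBank to MFCC can also be sketched in plain Python, without any audio library. This is only an illustration under stated assumptions: the toy energy values are made up, the helper names log_mel, dct2 and mfcc_from_fbank are hypothetical, and the DCT-II here is unnormalized, whereas librosa and python_speech_features apply orthonormal scaling by default, so the absolute values will differ from theirs.

```python
import math

def log_mel(mel_spec, eps=1e-10):
    """FBank: element-wise natural log of Mel filter-bank energies."""
    return [[math.log(max(e, eps)) for e in frame] for frame in mel_spec]

def dct2(frame, n_coeffs):
    """Unnormalized DCT-II of one frame, keeping the first n_coeffs outputs."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(frame))
        for k in range(n_coeffs)
    ]

def mfcc_from_fbank(fbank, n_coeffs):
    """MFCC: DCT of each FBank frame, truncated to n_coeffs coefficients."""
    return [dct2(frame, n_coeffs) for frame in fbank]

# Hypothetical toy MelSpec: 2 frames x 4 Mel bands of filter-bank energies.
mel_spec = [[1.0, 2.0, 4.0, 8.0], [3.0, 3.0, 3.0, 3.0]]

fbank = log_mel(mel_spec)                   # FBank = log(MelSpec)
mfcc = mfcc_from_fbank(fbank, n_coeffs=3)   # MFCC = DCT(FBank)

for frame in mfcc:
    print([round(c, 4) for c in frame])
```

The second toy frame is flat across Mel bands, so only its first coefficient is non-zero (up to rounding): the DCT separates the overall level from the shape of the spectrum.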