Understand the Difference of MelSpec, FBank and MFCC in Audio Feature Extraction

MelSpec, FBank and MFCC are three audio features commonly used in deep learning. What is the difference among them? In this tutorial, we will explain it.

MelSpec

MelSpec is the Mel spectrogram, also called Mel-filter bank coefficients. It can be computed with Python libraries such as librosa and python_speech_features.

python librosa:

librosa.feature.melspectrogram()

python python_speech_features:

fbank()

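You should notice: although this function in python_speech_features is named fbank(), it does not compute the FBank feature. It returns Mel filter bank energies without taking the logarithm, which corresponds to MelSpec.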

FBank

FBank is called Log Mel-filter bank coefficients. It can be computed as log(MelSpec).

In python librosa, we can compute FBank by taking the logarithm of the Mel spectrogram. See this tutorial:

Compute Audio Log Mel Spectrogram Feature: A Step Guide – Python Audio Processing

In python python_speech_features:

The logfbank() function can be used.

MFCC

MFCC stands for Mel-frequency cepstral coefficients.

In python librosa:

librosa.feature.mfcc()

In python python_speech_features:

mfcc()

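The relation among them is shown below. In short, FBank is log(MelSpec), and MFCC is obtained by applying a discrete cosine transform (DCT) to FBank.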

This picture is from:

Comparison of Different Feature Types for Acoustic Event Detection System
