Understand Frame Rate of the Mel-spectrogram in Audio – Librosa Tutorial

By | March 5, 2022

In this tutorial, we will show how to compute the frame rate of a mel-spectrogram using Python and librosa.

You may find a description like this in some papers:

In our implementation, the frame rate of the mel-spectrogram is 62.5 Hz and the sampling rate of speech waveform is 16 kHz

This sentence raises two questions:

• 1. How to find the sampling rate of an audio file?
• 2. How to compute the frame rate of the mel-spectrogram?

Here we will answer these two questions one by one.

How to find the sampling rate of an audio file?

It is easy to get the sampling rate of an audio file. Here is a tutorial:

View Audio Sample Rate, Data Format PCM or ALAW Using ffprobe – Python Tutorial
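As an alternative to ffprobe for plain WAV files, the sampling rate can also be read from the file header with Python's standard wave module. The sketch below is only an illustration: the helper names (wav_sample_rate, write_silent_wav, demo) are made up for this example, and the approach assumes an uncompressed PCM WAV file, not other formats such as MP3 or ALAW.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate=16000, seconds=0.1):
    """Write a mono 16-bit silent WAV file (demo input only)."""
    num_samples = int(sample_rate * seconds)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_samples)


def demo():
    """Create a temporary 16 kHz file and read its sampling rate back."""
    path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silent_wav(path, sample_rate=16000)
    return wav_sample_rate(path)
```

Calling wav_sample_rate() on your own WAV file returns the rate in Hz, for example 16000 for 16 kHz speech.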

We can also use librosa.load() to read audio data at a chosen sampling rate: its sr parameter resamples the audio to that rate (22050 Hz by default), and sr=None keeps the file's native rate.

Understand librosa.load() is Between -1.0 and 1.0 – Librosa Tutorial

How to compute the frame rate of the mel-spectrogram?

To compute a mel-spectrogram, we can use librosa.feature.melspectrogram(). Here is a tutorial:

Compute and Display Audio Mel-spectrogram in Python – Python Tutorial

The key parameter is hop_length, the number of audio samples between the starts of consecutive frames.

We can use the formula below to compute the frame rate of the mel-spectrogram:

frame_rate = sample_rate/hop_length
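The formula, and its rearrangement for hop_length, can be written as two small Python helpers. This is a minimal sketch; the function names frame_rate and hop_length_for are chosen for this example and are not part of librosa.

```python
def frame_rate(sample_rate, hop_length):
    """Frames per second (Hz) of a spectrogram with the given hop length."""
    return sample_rate / hop_length


def hop_length_for(sample_rate, target_frame_rate):
    """Hop length (in samples) that gives target_frame_rate at sample_rate."""
    hop = sample_rate / target_frame_rate
    # hop_length must be a whole number of samples
    if hop != int(hop):
        raise ValueError("sample_rate / frame_rate is not a whole number of samples")
    return int(hop)


print(frame_rate(16000, 256))       # 62.5
print(hop_length_for(16000, 62.5))  # 256
```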

For example, if frame_rate = 62.5 Hz and sample_rate = 16 kHz, we can rearrange the formula to get hop_length:

hop_length = sample_rate / frame_rate = 16000 / 62.5 = 256

This means we should set hop_length = 256 when calling librosa.feature.melspectrogram() on 16 kHz audio to get a frame rate of 62.5 Hz.
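As a rough check, the sketch below computes a mel-spectrogram of a synthetic one-second tone and compares its number of frames with the expected count. The helper names (expected_frames, mel_frame_count), the 440 Hz test tone, n_fft=1024 and n_mels=80 are illustrative assumptions, not values taken from the paper quoted above. It relies on librosa's default center=True padding, under which the number of frames is 1 + len(y) // hop_length.

```python
def expected_frames(num_samples, hop_length):
    """Frame count for librosa's default center=True padding."""
    return 1 + num_samples // hop_length


def mel_frame_count(sample_rate=16000, hop_length=256, seconds=1.0):
    """Compute a mel-spectrogram of a synthetic tone and return its frame count."""
    # Imported here so expected_frames() can be used without librosa installed.
    import numpy as np
    import librosa

    # Synthetic 440 Hz tone (assumed test signal, not real speech)
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    y = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

    mel = librosa.feature.melspectrogram(
        y=y, sr=sample_rate, n_fft=1024, hop_length=hop_length, n_mels=80
    )
    # mel has shape (n_mels, n_frames)
    return mel.shape[1]
```

With librosa installed, mel_frame_count() should return the same value as expected_frames(16000, 256): one second of 16 kHz audio gives 16000 // 256 = 62 full hops plus one extra frame from the padding, which matches a frame rate of about 62.5 frames per second.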