python-3.x - Python Librosa: What is the default frame size used to compute the MFCC features?

Question

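I used the Librosa library to generate MFCC features for a 1319-second audio file, which gave me a 20 x 56829 matrix. The 20 here is the number of MFCC features (which I can adjust manually). But I don't know how the length of the audio was segmented into 56829 frames. What frame size does it use to process the audio?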

import numpy as np
import matplotlib.pyplot as plt
import librosa

def getPathToGroundtruth(episode):
    """Return path to groundtruth file for episode"""
    pathToGroundtruth = "../../../season01/Audio/" \
                        + "Season01.Episode%02d.en.wav" % episode
    return pathToGroundtruth

def getduration(episode):
    pathToAudioFile = getPathToGroundtruth(episode)
    y, sr = librosa.load(pathToAudioFile)
    duration = librosa.get_duration(y=y, sr=sr)
    return duration


def getMFCC(episode):
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate (22050 Hz by default)
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data


data = getMFCC(1)

score 21 · Accepted Answer

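Short answer

You can change the length of the output by changing the parameters used in the STFT computation. The following code doubles the size of the output (20 x 113658), because halving hop_length doubles the number of frames.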

data = librosa.feature.mfcc(y=y, sr=sr, n_fft=1012, hop_length=256, n_mfcc=20)

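Long answer

Librosa's librosa.feature.mfcc() function really just acts as a wrapper around librosa's librosa.feature.melspectrogram() function, which is itself a wrapper around the librosa.core.stft and librosa.filters.mel functions.

All of the parameters that govern the segmentation of the audio signal, namely the frame and overlap values, are consumed by the mel-scaled power spectrogram function (along with other tunable parameters for the nested core functions). You specify these parameters as keyword arguments to librosa.feature.mfcc().

All extra **kwargs parameters are passed to librosa.feature.melspectrogram() and subsequently to librosa.filters.mel().

By default, the window and hop length of the mel-scaled power spectrogram are: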

n_fft=2048

hop_length=512
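
As a rough illustration of what these defaults mean in time units, the sketch below converts the sample counts to milliseconds. It assumes the default sampling rate sr=22050; the helper name samples_to_ms is made up for this example and is not part of librosa.

```python
# Hypothetical helper (not a librosa function): convert a length in samples
# to milliseconds at a given sampling rate.
def samples_to_ms(n_samples, sr):
    return 1000.0 * n_samples / sr

sr = 22050         # default sampling rate used by librosa.load
n_fft = 2048       # default frame (FFT window) length, in samples
hop_length = 512   # default step between successive frames, in samples

frame_ms = samples_to_ms(n_fft, sr)
hop_ms = samples_to_ms(hop_length, sr)
overlap = 1.0 - hop_length / n_fft  # fraction shared by consecutive frames

print(f"frame length: {frame_ms:.1f} ms")
print(f"hop length:   {hop_ms:.1f} ms")
print(f"overlap:      {overlap:.0%}")
```

Under these assumptions each frame covers roughly 93 ms of audio, a new frame starts about every 23 ms, and consecutive frames overlap by 75%.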

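So, assuming you used the default sample rate (sr=22050), the output of your mfcc function makes sense:

output length = (seconds) * (sample rate) / (hop_length)

(1319) * (22050) / (512) ≈ 56804 frames

This is close to the observed 56829; the small difference is consistent with the duration being rounded to 1319 seconds.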
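
The estimate above can be refined a little. The sketch below assumes librosa's default centered framing (center=True in the STFT), under which the signal is padded and the frame count is 1 + n_samples // hop_length. The helper name n_frames_centered is invented for this example and is not a librosa function.

```python
# Hypothetical helper (not part of librosa): number of STFT frames for a
# signal of n_samples, assuming centered (padded) framing.
def n_frames_centered(n_samples, hop_length=512):
    return 1 + n_samples // hop_length

sr = 22050
n_samples = 1319 * sr  # assumes the clip lasts exactly 1319 s, which is only approximate

frames_default = n_frames_centered(n_samples)               # hop_length=512
frames_half_hop = n_frames_centered(n_samples, hop_length=256)
print(frames_default, frames_half_hop)  # halving the hop roughly doubles the frame count

# Working backwards: the shortest signal that yields the observed 56829 frames.
observed_frames = 56829
min_samples = (observed_frames - 1) * 512
print(min_samples / sr)  # implied minimum duration, in seconds
```

Under these assumptions, a clip of about 1319.5 seconds would produce the observed 56829 frames, which fits a reported duration of 1319 seconds.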

The parameters you can adjust are:

Melspectrogram Parameters
-------------------------
y : np.ndarray [shape=(n,)] or None
    audio time-series

sr : number > 0 [scalar]
    sampling rate of `y`

S : np.ndarray [shape=(d, t)]
    power spectrogram

n_fft : int > 0 [scalar]
    length of the FFT window

hop_length : int > 0 [scalar]
    number of samples between successive frames.
    See `librosa.core.stft`

kwargs : additional keyword arguments
  Mel filter bank parameters.
  See `librosa.filters.mel` for details.

If you want to further specify the characteristics of the mel filter bank used to define the mel-scaled power spectrogram, you can tune the following:

Mel Frequency Parameters
------------------------
sr        : number > 0 [scalar]
    sampling rate of the incoming signal

n_fft     : int > 0 [scalar]
    number of FFT components

n_mels    : int > 0 [scalar]
    number of Mel bands to generate

fmin      : float >= 0 [scalar]
    lowest frequency (in Hz)

fmax      : float >= 0 [scalar]
    highest frequency (in Hz).
    If `None`, use `fmax = sr / 2.0`

htk       : bool [scalar]
    use HTK formula instead of Slaney
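
To give a feel for what the htk flag switches between, here is a minimal sketch of the two Hz-to-mel conversions as they are commonly defined. The function names are made up for this example, and the constants are the usual textbook values rather than something taken from this answer, so treat it as an illustration and check the librosa source for the exact implementation.

```python
import math

def hz_to_mel_htk(f_hz):
    # HTK-style mel scale: logarithmic over the whole frequency range.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_mel_slaney(f_hz):
    # Slaney-style mel scale: linear below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # mel value at 1 kHz
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if f_hz < min_log_hz:
        return f_hz / f_sp
    return min_log_mel + math.log(f_hz / min_log_hz) / logstep

for f in (500.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel_htk(f), 2), round(hz_to_mel_slaney(f), 2))
```

The two scales use different units, so their raw values are not comparable; what matters for the filter bank is how each one spaces the n_mels bands between fmin and fmax.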

Librosa documentation:

librosa.feature.melspectrogram

librosa.filters.mel

librosa.core.stft
