From: Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Parameter            Values
Sampling frequency   16 kHz
Frame length         25 ms
Frame shift          10 ms
Feature space        25 dimensions with CMN (12 MFCCs + Δ + Δpower)
Acoustic model       GMMs with 128 diagonal covariance matrices
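
The settings in this table can be turned into concrete numbers with a little arithmetic. The following is a minimal sketch, not the authors' implementation. It assumes that the 25 dimensions break down as 12 MFCCs, 12 Δ coefficients, and 1 Δpower term (the only split consistent with the stated total), that frames are taken without padding, and that CMN subtracts the per-utterance mean of each dimension. All function and constant names are hypothetical.

```python
# Values taken from the table; everything else is an illustrative assumption.
SAMPLE_RATE_HZ = 16_000
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10
NUM_MFCC = 12


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate_hz * duration_ms // 1000


FRAME_LENGTH = ms_to_samples(FRAME_LENGTH_MS)  # samples per frame
FRAME_SHIFT = ms_to_samples(FRAME_SHIFT_MS)    # samples between frame starts

# Assumed breakdown: 12 MFCCs + 12 delta MFCCs + 1 delta power.
FEATURE_DIM = NUM_MFCC + NUM_MFCC + 1


def num_frames(num_samples):
    """Count complete frames in a signal, assuming no padding at the edges."""
    if num_samples < FRAME_LENGTH:
        return 0
    return 1 + (num_samples - FRAME_LENGTH) // FRAME_SHIFT


def cepstral_mean_normalize(frames):
    """Subtract the mean of each dimension, computed over all given frames.

    `frames` is a list of equal-length lists of floats (one list per frame).
    Per-utterance normalization is an assumption; the table only says "CMN".
    """
    if not frames:
        return []
    count = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


if __name__ == "__main__":
    print("frame length (samples):", FRAME_LENGTH)
    print("frame shift (samples):", FRAME_SHIFT)
    print("feature dimension:", FEATURE_DIM)
    print("frames in 1 s of audio:", num_frames(SAMPLE_RATE_HZ))
```

Under these assumptions, a 25 ms frame at 16 kHz spans 400 samples, a 10 ms shift is 160 samples, and consecutive frames therefore overlap by 240 samples.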