Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Accuracies averaged over all noise types in test set A

System	Clean	20 dB	15 dB	10 dB	5 dB	0 dB	−5 dB
Modulation features, sparse coding, 1-frame exemplar (Sys1)	90.62	90.87	89.90	88.17	84.46	76.83	59.65
Modulation features, MLP 135 input nodes, multi-condition	96.93	96.66	95.84	94.07	87.14	68.05	35.46
Modulation features + Δ + ΔΔ, MLP 405 input nodes, multi-condition	97.71	97.36	96.74	95.08	89.79	70.58	34.55
PLP + Δ and ΔΔ, MLP 351 input nodes [35], multi-condition	99.08	98.89	98.45	96.89	91.80	72.80	35.67
Mel features, sparse coding [24], 5-frame exemplars	93.43	90.94	89.06	84.57	75.91	58.20	32.57
Mel features, sparse coding [24], 30-frame exemplars	93.68	92.53	92.02	90.78	88.01	78.93	57.11

Accuracies (averaged over all noise types in test set A) obtained with Sys1 (the SC system operating on 135-D modulation spectrum features), MLP classifiers on the same features without and with Δs and ΔΔs, an MLP classifier on PLP features with Δs and ΔΔs [35], and an SC classifier on Mel spectra [24] using 5- and 30-frame windows, respectively.