Noise-robust speech feature processing with empirical mode decomposition

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 The word accuracy rates of clean-train tasks

Feature	20 dB		15 dB		10 dB		5 dB		0 dB		-5 dB
	RwC	EMD	RwC	EMD	RwC	EMD	RwC	EMD	RwC	EMD	RwC	EMD
L.E.	98.0	96.5	95.8	93.3	89.8	85.7	76.2	68.3	51.9	37.5	30.2	13.8
C₁	95.8	95.8	91.2	91.0	81.0	79.8	63.4	57.2	37.7	27.4	16.3	11.3
C₂	95.1	96.0	88.1	91.2	72.7	79.7	49.3	56.9	25.1	27.0	10.1	10.8
C₃	95.3	95.8	88.1	90.9	72.2	79.0	45.5	55.6	21.7	25.3	10.8	10.3
C₄	94.0	95.8	85.7	90.8	69.2	79.0	46.1	55.8	22.9	24.9	10.4	10.1
C₅	94.5	95.7	86.8	90.6	70.5	78.6	45.9	55.4	22.0	25.8	10.1	10.7
C₆	94.3	95.7	86.2	90.6	68.5	78.4	42.9	54.9	20.0	25.2	9.3	10.6
C₇	94.6	95.7	86.3	90.6	68.5	78.4	42.6	54.8	19.6	24.8	9.7	10.3
C₈	94.3	95.8	86.0	90.8	67.8	78.5	41.9	55.1	19.3	25.5	9.7	10.9
C₉	94.4	96.0	88.8	91.0	71.1	79.1	42.7	55.7	18.5	26.0	9.9	10.8
C₁₀	94.4	95.9	86.1	90.7	68.0	78.6	42.3	55.0	19.5	24.9	9.5	10.1
C₁₁	94.4	95.6	86.1	90.3	68.4	78.3	42.2	54.6	19.0	24.8	9.2	10.3
C₁₂	94.3	95.9	85.9	90.9	67.7	78.6	41.7	54.9	19.0	25.2	9.2	10.5
All		96.5		93.1		85.6		68.2		37.2		13.7
None	94.1		85.5		67.0		40.6		18.3		9.0

In each row, the noisy sequence of the named feature is either replaced with its clean counterpart (RwC) or processed by the proposed EMD-based method (EMD). Each number is the word accuracy (%) averaged over the 10 test subsets at that SNR: 4 subsets from Set A, 4 from Set B, and 2 from Set C. RwC: replaced with clean; L.E.: the log-energy sequence; C_i: the i-th MFCC sequence; All: the entire feature vector; None: no replacement or post-processing (baseline).
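
As a reading aid, the short Python sketch below computes the absolute gain in word accuracy of the "All" row (EMD applied to the entire feature vector) over the "None" row (baseline) at each SNR. The numbers are copied from Table 1. The variable and function names (`snr_db`, `emd_all`, `baseline`, `absolute_gain`) are illustrative assumptions and do not come from the paper.

```python
# Word accuracies (%) copied from the "All" (EMD column) and "None" (baseline)
# rows of Table 1. All names here are illustrative, not from the paper.
snr_db = [20, 15, 10, 5, 0, -5]
emd_all = [96.5, 93.1, 85.6, 68.2, 37.2, 13.7]
baseline = [94.1, 85.5, 67.0, 40.6, 18.3, 9.0]


def absolute_gain(processed, reference):
    """Per-SNR difference in word accuracy, in percentage points."""
    # Round to one decimal, matching the precision of the table.
    return [round(p - r, 1) for p, r in zip(processed, reference)]


gains = absolute_gain(emd_all, baseline)

for snr, gain in zip(snr_db, gains):
    print(f"{snr:>3} dB: {gain:+.1f} points")

# SNR at which EMD post-processing helps most in this comparison.
best_snr = snr_db[gains.index(max(gains))]
print(f"Largest gain at {best_snr} dB")
```

The same function can be applied to any pair of columns, for example the RwC and EMD entries of a single row, to compare the two treatments for one feature.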