On the Impact of Children's Emotional Speech on Acoustic and Language Models

EURASIP Journal on Audio, Speech, and Music Processing

Table 5 Scenario 1: adaptation of the acoustic and linguistic models; results are given in terms of word accuracy (%). The baseline system (acoustic and linguistic models both trained on Ohm_N) is given in column "Ohm_N" and is identical in all three tables. The row without a label below the four subsets gives their arithmetic (unweighted) mean. The average of the four subsets weighted by the prior probabilities of the four classes is given in row "Mont".

	Acoustic models trained on
Test set	Ohm_M	Ohm_N	Ohm_E	Ohm_A
Mont_M	43.1	43.6	34.2	32.8
Mont_N	44.9	60.3	54.0	55.8
Mont_E	42.8	61.3	74.8	67.2
Mont_A	51.2	64.9	75.5	73.5
	45.5	57.5	59.6	57.3
Mont	45.0	60.3	56.2	57.1
	Linguistic bigrams trained on
Test set	Ohm_M	Ohm_N	Ohm_E	Ohm_A
Mont_M	49.3	43.6	37.4	38.8
Mont_N	56.0	60.3	58.0	59.9
Mont_E	56.3	61.3	67.0	67.0
Mont_A	60.1	64.9	68.0	68.5
	55.4	57.5	57.6	58.6
Mont	56.1	60.3	58.8	60.5
	Acoustic and linguistic models trained on
Test set	Ohm_M	Ohm_N	Ohm_E	Ohm_A
Mont_M	47.4	43.6	32.0	30.6
Mont_N	40.8	60.3	52.6	54.7
Mont_E	35.7	61.3	76.5	70.2
Mont_A	46.0	64.9	75.3	75.3
	42.5	57.5	59.6	57.7
Mont	40.8	60.3	55.1	56.4
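
The two summary rows in each block differ only in how the four subset accuracies are combined. The following minimal sketch recomputes the unweighted mean for the acoustic-model block and shows how a prior-weighted average of the kind reported in row "Mont" could be formed. The function names and the values in `EXAMPLE_PRIORS` are assumptions made for illustration; the class priors actually used are not stated in this table, so the weighted figures printed here are not expected to reproduce row "Mont".

```python
# Word accuracies (%) from the "Acoustic models trained on" block of Table 5.
# Each list holds the results for the test sets Mont_M, Mont_N, Mont_E, Mont_A.
ACOUSTIC = {
    "Ohm_M": [43.1, 44.9, 42.8, 51.2],
    "Ohm_N": [43.6, 60.3, 61.3, 64.9],
    "Ohm_E": [34.2, 54.0, 74.8, 75.5],
    "Ohm_A": [32.8, 55.8, 67.2, 73.5],
}


def unweighted_mean(values):
    """Arithmetic mean: every subset counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, priors):
    """Average of the subsets weighted by class prior probabilities."""
    if len(values) != len(priors):
        raise ValueError("values and priors must have the same length")
    total = sum(priors)
    if total <= 0:
        raise ValueError("priors must sum to a positive number")
    # Dividing by the total lets callers pass unnormalised weights.
    return sum(v * p for v, p in zip(values, priors)) / total


# HYPOTHETICAL priors for (M, N, E, A), chosen only to illustrate the
# computation. They are NOT taken from the paper.
EXAMPLE_PRIORS = [0.05, 0.80, 0.10, 0.05]

if __name__ == "__main__":
    for train_set, accuracies in ACOUSTIC.items():
        plain = unweighted_mean(accuracies)
        weighted = weighted_mean(accuracies, EXAMPLE_PRIORS)
        print(f"{train_set}: unweighted {plain:.1f}, "
              f"weighted (assumed priors) {weighted:.1f}")
```

With equal priors the weighted average reduces to the unweighted mean, so any gap between the two summary rows reflects how unevenly the four classes are distributed in the test data.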