Table 4. WER (%) for a 5×512 network trained with soft targets from a 6×2048 sequence-trained teacher with fMLLR inputs

From: Wise teachers train better DNN acoustic models

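| Input features | Targets            | Data                | Hub5’00-SWB | RT03S-FSH |
|----------------|--------------------|---------------------|-------------|-----------|
| FMLLR          | Hard alignment     | 110 h transcribed   | 15.1 %      | 18.2 %    |
| FBANK          | Hard alignment     | 110 h transcribed   | 19.6 %      | 24.1 %    |
| FBANK          | FMLLR-sMBR outputs | 110 h untranscribed | 20.0 %      | 24.7 %    |
| FBANK          | FMLLR-sMBR outputs | 300 h untranscribed | 18.7 %      | 23.3 %    |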
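
The rows are easier to compare as relative WER changes. The following is a minimal sketch of that arithmetic, not part of the original table: the WER values are copied from Table 4, while the dictionary keys and the helper name `relative_reduction` are illustrative assumptions.

```python
# WER values (%) copied from Table 4 as (Hub5'00-SWB, RT03S-FSH).
# The short row labels are hypothetical names chosen for this sketch.
wer = {
    "fmllr_hard_110h": (15.1, 18.2),
    "fbank_hard_110h": (19.6, 24.1),
    "fbank_soft_110h": (20.0, 24.7),
    "fbank_soft_300h": (18.7, 23.3),
}


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent; positive means `new` is lower."""
    return 100.0 * (baseline - new) / baseline


test_sets = ("Hub5'00-SWB", "RT03S-FSH")

# Effect of growing the untranscribed soft-target data from 110 h to 300 h.
for name, base, new in zip(test_sets, wer["fbank_soft_110h"], wer["fbank_soft_300h"]):
    print(f"{name}: 110 h -> 300 h soft targets: {relative_reduction(base, new):.1f} % relative")

# 300 h of soft targets versus 110 h of hard alignments, both with FBANK inputs.
for name, base, new in zip(test_sets, wer["fbank_hard_110h"], wer["fbank_soft_300h"]):
    print(f"{name}: hard 110 h -> soft 300 h: {relative_reduction(base, new):.1f} % relative")
```

A positive value means the second system has the lower WER. Such a comparison covers only the two test sets listed in the table.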