Table 4 WER (%) for a 5×512 network with soft targets from a 6×2048 sequence-trained teacher with fMLLR inputs

From: Wise teachers train better DNN acoustic models

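| Input features | Targets            | Data                 | Hub5’00-SWB | RT03S-FSH |
|----------------|--------------------|----------------------|-------------|-----------|
| FMLLR          | Hard alignment     | 110 h transcribed    | 15.1 %      | 18.2 %    |
| FBANK          | Hard alignment     | 110 h transcribed    | 19.6 %      | 24.1 %    |
| FBANK          | FMLLR-sMBR outputs | 110 h untranscribed  | 20.0 %      | 24.7 %    |
| FBANK          | FMLLR-sMBR outputs | 300 h untranscribed  | 18.7 %      | 23.3 %    |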