Figure 6
From: Phoneme and Sentence-Level Ensembles for Speech Recognition

In the experiments reported in Section 5.2, the number of states and number of Gaussian mixtures per state were tuned on a hold-out set prior to the analysis. (a) displays the word error rate performance of an HMM with 10 Gaussians per state when the number of emitting states per phoneme is varied, with rather dramatic effects. (b) displays the word error rate performance of an HMM with 3 emitting states as the number of Gaussians per state varies. In this case, the effect on generalisation is markedly lower.

Panel titles: (a) Hold-out set, 10 Gaussians/state; (b) Hold-out set, 3 states/phoneme.