
Table 1 Test set performance comparison of models selected on a validation set. The second column indicates the number of Gaussians per phoneme. For ensemble methods, an entry such as 16 × 20 denotes 16 models, each having 20 Gaussian components per state. GMM indicates a model consisting of a single Gaussian mixture for each phoneme. HMM indicates a model consisting of three Gaussian mixtures per phoneme. Thus, for HMMs, the total number of Gaussians is three times that of a GMM with an equal number of components per state. Boost and Bag indicate models trained on the phoneme classification task using the standard boosting and bagging algorithms, respectively, while E-Boost indicates the expectation boosting algorithm for word error rate minimisation. Finally, embed indicates that embedded training was performed after the model was initialised.

From: Phoneme and Sentence-Level Ensembles for Speech Recognition

Model          Gaussians    Word error rate (%)
GMM            30           8.31
GMM embed      40           8.12
Boost GMM                   7.41
HMM            10           7.52
HMM embed      10           7.04
Boost HMM                   6.81
E-Boost HMM    7 × 10 ()    6.75
Bag HMM        16 × 20      5.97
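The caption's bookkeeping (an HMM uses three mixtures per phoneme, so its Gaussian count is three times that of a GMM with the same number of components per state) can be made concrete with a short Python sketch. It assumes, as a reading of the caption rather than a statement from the paper, that an ensemble entry such as 16 × 20 means 16 models with 20 components per state and that an ensemble's count is the sum over all its members. The function name gaussians_per_phoneme and the STATES_PER_MODEL table are hypothetical.

```python
# Minimal sketch of the Gaussian-count arithmetic described in the Table 1 caption.
# Assumptions (not from the paper): a GMM has one mixture per phoneme, an HMM has
# three mixtures (states) per phoneme, and an ensemble "A x B" has A member models,
# each with B components per state; ensemble counts are summed over members.

STATES_PER_MODEL = {"GMM": 1, "HMM": 3}  # hypothetical lookup: mixtures per phoneme


def gaussians_per_phoneme(kind: str, components_per_state: int, n_models: int = 1) -> int:
    """Return the total number of Gaussian components used to model one phoneme."""
    return n_models * STATES_PER_MODEL[kind] * components_per_state


print(gaussians_per_phoneme("GMM", 30))      # single GMM, 30 components -> 30
print(gaussians_per_phoneme("HMM", 10))      # 3 states x 10 components -> 30
print(gaussians_per_phoneme("HMM", 10, 7))   # 7 x 10 ensemble -> 210
print(gaussians_per_phoneme("HMM", 20, 16))  # 16 x 20 ensemble -> 960
```

Under these assumptions, the single GMM with 30 components and the single HMM with 10 components per state use the same number of Gaussians per phoneme (30), whereas the 16 × 20 bagged ensemble uses 960.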