Table 6 Word error rate (%) after unsupervised speaker adaptation on the WSJ task

From: Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

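Columns give the number of adaptation sentences.

| Method | 2 | 4 | 6 | 8 | 10 | 20 |
|---|---|---|---|---|---|---|
| EV | 13.88 | 13.82 | 13.76 | 13.68 | 13.64 | 13.58 |
| EV (best K) | K=100 | K=120 | K=150 | K=150 | K=150 | K=150 |
| MLLR | 14.44 | 13.86 | 13.70 | 13.56 | 13.43 | 13.22 |
| SAT + MLLR | 13.96 | 13.41 | 13.37 | 13.35 | 13.26 | 13.06 |
| ML-EP | 16.28 | 14.24 | 13.75 | 13.47 | 13.41 | 13.06 |
| SAT + ML-EP | 16.80 | 14.24 | 13.51 | 13.17 | 13.12 | 12.70 |
| SGL-EP | 14.05 | 13.72 | 13.52 | 13.41 | 13.37 | 13.00 |
| SAT + SGL-EP | 13.92 | 13.36 | 13.29 | 13.11 | 13.03 | 12.70 |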

  1. The WER of the speaker-independent (SI) model is 14.71%. For brevity, only the best result of each adaptation method is shown in the table. For MLLR, the best results were obtained at a prior weighting factor of 10 (for MAP) and 32 regression classes with a three-block-diagonal transformation matrix (for MLLR). For the eigenphone methods, the number of eigenphones (N) was fixed at 100. The weighting factors of the SGL regularization method were set to λ1 = 10 and λ3 = 30.
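To make the comparison against the 14.71% SI baseline concrete, the short sketch below recomputes, for each column of Table 6, the lowest WER and the relative WER reduction, defined here as (SI WER − adapted WER) / SI WER. The WER values are copied from the table; the helper names (`relative_reduction`, `best_methods`) are illustrative and are not taken from the paper.

```python
# WER (%) after unsupervised adaptation on WSJ, copied from Table 6.
SI_WER = 14.71  # speaker-independent baseline, from the table footnote

SENTENCES = [2, 4, 6, 8, 10, 20]  # number of adaptation sentences per column
WER = {
    "EV": [13.88, 13.82, 13.76, 13.68, 13.64, 13.58],
    "MLLR": [14.44, 13.86, 13.70, 13.56, 13.43, 13.22],
    "SAT + MLLR": [13.96, 13.41, 13.37, 13.35, 13.26, 13.06],
    "ML-EP": [16.28, 14.24, 13.75, 13.47, 13.41, 13.06],
    "SAT + ML-EP": [16.80, 14.24, 13.51, 13.17, 13.12, 12.70],
    "SGL-EP": [14.05, 13.72, 13.52, 13.41, 13.37, 13.00],
    "SAT + SGL-EP": [13.92, 13.36, 13.29, 13.11, 13.03, 12.70],
}


def relative_reduction(wer, baseline=SI_WER):
    """Relative WER reduction in percent; a negative value means worse than SI."""
    return 100.0 * (baseline - wer) / baseline


def best_methods(column):
    """Return (lowest WER, methods achieving it) for one column; ties are kept."""
    lowest = min(values[column] for values in WER.values())
    return lowest, [m for m, values in WER.items() if values[column] == lowest]


for i, n in enumerate(SENTENCES):
    lowest, methods = best_methods(i)
    print(f"{n:>2} sentences: {lowest:.2f}% ({', '.join(methods)}), "
          f"{relative_reduction(lowest):.1f}% relative reduction vs. SI")
```

With 20 adaptation sentences, SAT + ML-EP and SAT + SGL-EP tie at 12.70%, roughly a 13.7% relative reduction; with only 2 sentences, ML-EP (16.28%) is worse than the SI model, which shows up as a negative relative reduction.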