Table 6 Word error rate (%) after unsupervised speaker adaptation on the WSJ task

From: Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

Methods         Number of adaptation sentences
                2       4       6       8       10      20
EV              13.88   13.82   13.76   13.68   13.64   13.58
                K=100   K=120   K=150   K=150   K=150   K=150
MLLR            14.44   13.86   13.70   13.56   13.43   13.22
SAT + MLLR      13.96   13.41   13.37   13.35   13.26   13.06
ML-EP           16.28   14.24   13.75   13.47   13.41   13.06
SAT + ML-EP     16.80   14.24   13.51   13.17   13.12   12.70
SGL-EP          14.05   13.72   13.52   13.41   13.37   13.00
SAT + SGL-EP    13.92   13.36   13.29   13.11   13.03   12.70
  1. The WER of the speaker-independent (SI) model is 14.71%. For brevity, the table shows only the best results of each adaptation method. For MLLR, the best results were obtained at a prior weighting factor of 10 (for MAP) and 32 regression classes with a three-block-diagonal transformation matrix (for MLLR). For the eigenphone method, the number of eigenphones (N) was fixed to 100. The weighting factors of the SGL regularization method were set to λ1 = 10 and λ3 = 30.
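
One way to read the table is as relative WER reduction over the SI baseline of 14.71%. The sketch below is only an illustration: the numbers are copied from the table, while the names `relative_reduction` and `best_method` are hypothetical helpers and not part of the paper.

```python
# Values copied from Table 6; helper names are illustrative assumptions.
SI_WER = 14.71  # WER (%) of the SI model, from the table footnote

SENTENCES = [2, 4, 6, 8, 10, 20]  # number of adaptation sentences

WER = {
    "EV":           [13.88, 13.82, 13.76, 13.68, 13.64, 13.58],
    "MLLR":         [14.44, 13.86, 13.70, 13.56, 13.43, 13.22],
    "SAT + MLLR":   [13.96, 13.41, 13.37, 13.35, 13.26, 13.06],
    "ML-EP":        [16.28, 14.24, 13.75, 13.47, 13.41, 13.06],
    "SAT + ML-EP":  [16.80, 14.24, 13.51, 13.17, 13.12, 12.70],
    "SGL-EP":       [14.05, 13.72, 13.52, 13.41, 13.37, 13.00],
    "SAT + SGL-EP": [13.92, 13.36, 13.29, 13.11, 13.03, 12.70],
}


def relative_reduction(baseline, adapted):
    """Relative WER reduction in percent; negative means the WER got worse."""
    return 100.0 * (baseline - adapted) / baseline


def best_method(n_sentences):
    """Method with the lowest WER for a given amount of adaptation data.

    On a tie, the method listed first in WER is returned.
    """
    idx = SENTENCES.index(n_sentences)
    return min(WER, key=lambda method: WER[method][idx])


if __name__ == "__main__":
    for method, values in WER.items():
        reductions = [relative_reduction(SI_WER, v) for v in values]
        print(method, " ".join(f"{r:6.2f}" for r in reductions))
```

A negative value marks a configuration that is worse than the SI model, as happens for ML-EP and SAT + ML-EP with two adaptation sentences.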