Table 1 Average tonal syllable recognition rate (%) after speaker adaptation using conventional methods

From: Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

| Method / number of adaptation sentences | 1 | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|---|
| MAP + MLLR | 53.32 | 54.93 | 57.83 | 58.50 | 59.65 | 60.16 |
| Eigenvoice | | | | | | |
| K = 20 | 55.32 | 56.38 | 56.61 | 56.90 | 57.11 | 57.05 |
| K = 40 | 55.67 | 56.59 | 57.03 | 57.26 | 57.62 | 57.45 |
| K = 60 | 55.72 | 57.01 | 57.15 | 57.36 | 57.87 | 57.95 |
| K = 80 | 55.37 | 56.97 | 57.39 | 57.45 | 58.14 | 58.18 |
| K = 100 | 55.20 | 57.11 | 57.24 | 57.53 | 57.91 | 58.39 |
| ML eigenphone | | | | | | |
| N = 10 | 51.45 | 56.71 | 56.95 | 57.41 | 57.87 | 58.12 |
| N = 25 | 47.25 | 55.73 | 57.99 | 59.36 | 59.34 | 59.57 |
| N = 50 | 33.74 | 51.38 | 58.16 | 59.00 | 59.84 | 60.62 |
| N = 100 | 19.14 | 41.46 | 54.30 | 57.91 | 59.44 | 60.13 |
| MAP eigenphone, N = 50 | | | | | | |
| σ⁻² = 10 | 43.26 | 53.67 | 58.43 | 59.11 | 59.78 | 60.45 |
| σ⁻² = 100 | 50.08 | 53.69 | 56.71 | 58.35 | 59.21 | 59.80 |
| σ⁻² = 1,000 | 53.69 | 54.28 | 55.35 | 56.13 | 56.95 | 57.41 |
| σ⁻² = 2,000 | 53.63 | 54.13 | 54.80 | 55.43 | 56.27 | 56.69 |
| MAP eigenphone, N = 100 | | | | | | |
| σ⁻² = 10 | 27.91 | 44.63 | 53.78 | 57.39 | 59.61 | 60.70 |
| σ⁻² = 100 | 45.24 | 50.31 | 55.77 | 57.55 | 59.34 | 60.30 |
| σ⁻² = 1,000 | 53.29 | 54.22 | 55.75 | 56.78 | 57.41 | 58.29 |
| σ⁻² = 2,000 | 53.92 | 54.28 | 55.52 | 56.34 | 56.55 | 57.74 |

  1. For MAP + MLLR adaptation, we show only the best results, which were obtained with a prior weighting factor of 10 (for MAP) and 32 regression classes with a three-block-diagonal transformation matrix (for MLLR). For eigenvoice adaptation, K denotes the number of eigenvoices. For the eigenphone-based methods, N denotes the number of eigenphones. For the MAP eigenphone method, σ⁻² denotes the inverse prior variance of the eigenphones, i.e., the weighting factor λ₂ of the squared ℓ₂-norm term.