Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 Average tonal syllable recognition rate (%) after speaker adaptation using conventional methods

Methods	Number of adaptation sentences
	1	2	4	6	8	10
MAP + MLLR	53.32	54.93	57.83	58.50	59.65	60.16
Eigenvoice
K = 20	55.32	56.38	56.61	56.90	57.11	57.05
K = 40	55.67	56.59	57.03	57.26	57.62	57.45
K = 60	55.72	57.01	57.15	57.36	57.87	57.95
K = 80	55.37	56.97	57.39	57.45	58.14	58.18
K = 100	55.20	57.11	57.24	57.53	57.91	58.39
ML eigenphone
N = 10	51.45	56.71	56.95	57.41	57.87	58.12
N = 25	47.25	55.73	57.99	59.36	59.34	59.57
N = 50	33.74	51.38	58.16	59.00	59.84	60.62
N = 100	19.14	41.46	54.30	57.91	59.44	60.13
MAP eigenphone, N = 50
σ^(-2) = 10	43.26	53.67	58.43	59.11	59.78	60.45
σ^(-2) = 100	50.08	53.69	56.71	58.35	59.21	59.80
σ^(-2) = 1,000	53.69	54.28	55.35	56.13	56.95	57.41
σ^(-2) = 2,000	53.63	54.13	54.80	55.43	56.27	56.69
MAP eigenphone, N = 100
σ^(-2) = 10	27.91	44.63	53.78	57.39	59.61	60.70
σ^(-2) = 100	45.24	50.31	55.77	57.55	59.34	60.30
σ^(-2) = 1,000	53.29	54.22	55.75	56.78	57.41	58.29
σ^(-2) = 2,000	53.92	54.28	55.52	56.34	56.55	57.74

For MAP + MLLR adaptation, only the best results are shown; they were obtained with a prior weighting factor of 10 (for MAP) and 32 regression classes with a three-block-diagonal transformation matrix (for MLLR). For eigenvoice adaptation, K denotes the number of eigenvoices. For the eigenphone-based methods, N denotes the number of eigenphones. For the MAP eigenphone method, σ^(-2) denotes the inverse prior variance of the eigenphones, i.e., the weighting factor λ₂ of the squared l₂-norm term.
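
The role of σ^(-2) as an l₂ weighting factor can be illustrated with a minimal sketch. It assumes a generic ridge-regularized least-squares problem, not the paper's actual eigenphone objective: the matrix `A`, the vector `b`, and the function `ridge_solve` are hypothetical stand-ins, and the data are random. Under a zero-mean Gaussian prior, the negative log-prior is proportional to σ^(-2) times the squared l₂ norm, so a larger σ^(-2) pulls the estimate more strongly toward zero.

```python
import numpy as np

def ridge_solve(A, b, lam):
    """Closed-form minimizer of ||A v - b||^2 + lam * ||v||^2 (hypothetical helper)."""
    n = A.shape[1]
    # The lam * I term makes the system positive definite even when A has
    # fewer rows than columns (little adaptation data, many parameters).
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 50))   # assumed toy data: 8 observations, 50 unknowns
b = rng.normal(size=8)

# Same weighting factors as the sigma^(-2) values listed in Table 1.
lams = (10.0, 100.0, 1000.0, 2000.0)
norms = [float(np.linalg.norm(ridge_solve(A, b, lam))) for lam in lams]

for lam, nrm in zip(lams, norms):
    print(f"lambda_2 = {lam:7.1f}  ||v||_2 = {nrm:.4f}")
```

The printed norms shrink as the weighting factor grows. This is in line with the pattern in Table 1, where large σ^(-2) values help most when only one or two adaptation sentences are available and become less favorable as more data are added.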