Table 1 Average tonal syllable recognition rate (%) after speaker adaptation using conventional methods

From: Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

| Methods (columns: number of adaptation sentences) | 1 | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|---|
| MAP + MLLR | 53.32 | 54.93 | 57.83 | 58.50 | 59.65 | 60.16 |
| Eigenvoice | | | | | | |
| K = 20 | 55.32 | 56.38 | 56.61 | 56.90 | 57.11 | 57.05 |
| K = 40 | 55.67 | 56.59 | 57.03 | 57.26 | 57.62 | 57.45 |
| K = 60 | 55.72 | 57.01 | 57.15 | 57.36 | 57.87 | 57.95 |
| K = 80 | 55.37 | 56.97 | 57.39 | 57.45 | 58.14 | 58.18 |
| K = 100 | 55.20 | 57.11 | 57.24 | 57.53 | 57.91 | 58.39 |
| ML eigenphone | | | | | | |
| N = 10 | 51.45 | 56.71 | 56.95 | 57.41 | 57.87 | 58.12 |
| N = 25 | 47.25 | 55.73 | 57.99 | 59.36 | 59.34 | 59.57 |
| N = 50 | 33.74 | 51.38 | 58.16 | 59.00 | 59.84 | 60.62 |
| N = 100 | 19.14 | 41.46 | 54.30 | 57.91 | 59.44 | 60.13 |
| MAP eigenphone, N = 50 | | | | | | |
| σ⁻² = 10 | 43.26 | 53.67 | 58.43 | 59.11 | 59.78 | 60.45 |
| σ⁻² = 100 | 50.08 | 53.69 | 56.71 | 58.35 | 59.21 | 59.80 |
| σ⁻² = 1,000 | 53.69 | 54.28 | 55.35 | 56.13 | 56.95 | 57.41 |
| σ⁻² = 2,000 | 53.63 | 54.13 | 54.80 | 55.43 | 56.27 | 56.69 |
| MAP eigenphone, N = 100 | | | | | | |
| σ⁻² = 10 | 27.91 | 44.63 | 53.78 | 57.39 | 59.61 | 60.70 |
| σ⁻² = 100 | 45.24 | 50.31 | 55.77 | 57.55 | 59.34 | 60.30 |
| σ⁻² = 1,000 | 53.29 | 54.22 | 55.75 | 56.78 | 57.41 | 58.29 |
| σ⁻² = 2,000 | 53.92 | 54.28 | 55.52 | 56.34 | 56.55 | 57.74 |

  1. For MAP + MLLR adaptation, we show only the best results, which were obtained with a prior weighting factor of 10 (for MAP) and 32 regression classes with a three-block-diagonal transformation matrix (for MLLR). For eigenvoice adaptation, K denotes the number of eigenvoices. For the eigenphone-based methods, N denotes the number of eigenphones. For the MAP eigenphone method, σ⁻² denotes the inverse prior variance for the eigenphones, i.e., the weighting factor λ₂ of the squared ℓ2-norm term.
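
The link between an inverse prior variance and an ℓ2 weighting factor can be illustrated with a toy scalar problem. The sketch below is not the eigenphone estimator from the paper; it assumes a one-parameter linear model with unit noise variance and a zero-mean Gaussian prior, for which the MAP estimate coincides with a ridge (squared ℓ2-penalized) least-squares estimate whose weight equals the inverse prior variance. The function name `ridge_estimate` and the data are hypothetical.

```python
def ridge_estimate(xs, ys, lam):
    """Scalar v minimizing sum((y - x*v)**2) + lam * v**2.

    Under the assumptions stated above (unit noise variance, prior
    v ~ N(0, sigma^2)), this is the MAP estimate with lam = 1 / sigma^2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical toy data generated by y = 2 * x with no noise.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# lam = 0 is the plain maximum-likelihood fit; larger lam (a tighter
# prior) pulls the estimate toward the prior mean of zero.
for lam in (0.0, 10.0, 1000.0):
    print(f"lam = {lam:7.1f} -> v = {ridge_estimate(xs, ys, lam):.4f}")
```

In this toy setting a large weight dominates when the data term is small and matters less as more data accumulates. That is one possible reading of why, in the table, large σ⁻² values help most with one adaptation sentence and small values do better with ten.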