Table 2 FN, FP, TN, and TP rates of detecting voiced/unvoiced regions through HMEM2 and the HSMM-based method

From: Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

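| # training data | Implemented systems | Detected as | Really voiced (%) | Really unvoiced (%) |
|---|---|---|---|---|
| 50 | HMEM2 | Voiced | 77.00 | 3.70 |
| | | Unvoiced | 7.09 | 12.21 |
| | HSMM | Voiced | 78.23 | 5.79 |
| | | Unvoiced | 5.86 | 10.12 |
| 100 | HMEM2 | Voiced | 75.78 | 2.75 |
| | | Unvoiced | 8.31 | 13.16 |
| | HSMM | Voiced | 78.07 | 5.34 |
| | | Unvoiced | 6.02 | 10.57 |
| 200 | HMEM2 | Voiced | 77.25 | 1.54 |
| | | Unvoiced | 6.84 | 14.37 |
| | HSMM | Voiced | 78.43 | 4.43 |
| | | Unvoiced | 5.66 | 11.48 |
| 400 | HMEM2 | Voiced | 77.18 | 1.34 |
| | | Unvoiced | 6.91 | 14.57 |
| | HSMM | Voiced | 76.09 | 2.70 |
| | | Unvoiced | 8.00 | 13.21 |
| 800 | HMEM2 | Voiced | 77.10 | 0.83 |
| | | Unvoiced | 6.99 | 15.08 |
| | HSMM | Voiced | 77.17 | 2.66 |
| | | Unvoiced | 6.92 | 13.25 |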
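
The four rates reported for each system and training-set size can be combined into an overall voicing-decision accuracy and error rate. The sketch below is only an illustration under stated assumptions: it treats "voiced" as the positive class (so TP is detected voiced and really voiced, FP is detected voiced but really unvoiced, and so on), which the table does not state explicitly, and the names `Confusion`, `TABLE`, `accuracy`, and `error` are hypothetical helpers, not part of the original work. The numeric values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Voiced/unvoiced detection rates in percent.

    Assumption: 'voiced' is taken as the positive class.
    """
    tp: float  # detected voiced, really voiced
    fp: float  # detected voiced, really unvoiced
    fn: float  # detected unvoiced, really voiced
    tn: float  # detected unvoiced, really unvoiced

    def accuracy(self) -> float:
        # Share of regions whose voicing decision is correct.
        return self.tp + self.tn

    def error(self) -> float:
        # Share of regions whose voicing decision is wrong.
        return self.fp + self.fn


# Keys are (# training data, system); values are copied from the table.
TABLE = {
    (50, "HMEM2"): Confusion(77.00, 3.70, 7.09, 12.21),
    (50, "HSMM"): Confusion(78.23, 5.79, 5.86, 10.12),
    (100, "HMEM2"): Confusion(75.78, 2.75, 8.31, 13.16),
    (100, "HSMM"): Confusion(78.07, 5.34, 6.02, 10.57),
    (200, "HMEM2"): Confusion(77.25, 1.54, 6.84, 14.37),
    (200, "HSMM"): Confusion(78.43, 4.43, 5.66, 11.48),
    (400, "HMEM2"): Confusion(77.18, 1.34, 6.91, 14.57),
    (400, "HSMM"): Confusion(76.09, 2.70, 8.00, 13.21),
    (800, "HMEM2"): Confusion(77.10, 0.83, 6.99, 15.08),
    (800, "HSMM"): Confusion(77.17, 2.66, 6.92, 13.25),
}

if __name__ == "__main__":
    for (size, system), c in sorted(TABLE.items()):
        print(f"{size:>4} {system:<6} accuracy={c.accuracy():6.2f}%  "
              f"error={c.error():5.2f}%")
```

Because the four cells of each block sum to 100%, accuracy and error are complementary, so either one is enough to compare the two systems at a given training-set size.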