Table 2. False negative (FN), false positive (FP), true negative (TN), and true positive (TP) rates for detecting voiced/unvoiced regions with HMEM2 and the HSMM-based method

From: Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

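| # training data | Implemented system | Detected as | Really voiced (%) | Really unvoiced (%) |
|---|---|---|---|---|
| 50 | HMEM2 | Voiced | 77.00 | 3.70 |
| | | Unvoiced | 7.09 | 12.21 |
| | HSMM | Voiced | 78.23 | 5.79 |
| | | Unvoiced | 5.86 | 10.12 |
| 100 | HMEM2 | Voiced | 75.78 | 2.75 |
| | | Unvoiced | 8.31 | 13.16 |
| | HSMM | Voiced | 78.07 | 5.34 |
| | | Unvoiced | 6.02 | 10.57 |
| 200 | HMEM2 | Voiced | 77.25 | 1.54 |
| | | Unvoiced | 6.84 | 14.37 |
| | HSMM | Voiced | 78.43 | 4.43 |
| | | Unvoiced | 5.66 | 11.48 |
| 400 | HMEM2 | Voiced | 77.18 | 1.34 |
| | | Unvoiced | 6.91 | 14.57 |
| | HSMM | Voiced | 76.09 | 2.70 |
| | | Unvoiced | 8.00 | 13.21 |
| 800 | HMEM2 | Voiced | 77.10 | 0.83 |
| | | Unvoiced | 6.99 | 15.08 |
| | HSMM | Voiced | 77.17 | 2.66 |
| | | Unvoiced | 6.92 | 13.25 |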
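
The four rates in each block sum to 100%, so a single overall voicing error can be read off as FP + FN. The sketch below does this for the 800-item rows only. It assumes that "voiced" is the positive class (the caption does not say which class is positive), and the names `rates_800` and `voicing_error` are hypothetical, introduced only for illustration.

```python
# Rates (%) copied from the 800-training-data rows of Table 2.
# Tuple order (ASSUMPTION: "voiced" is the positive class):
#   tp = detected voiced,   really voiced
#   fp = detected voiced,   really unvoiced
#   fn = detected unvoiced, really voiced
#   tn = detected unvoiced, really unvoiced
rates_800 = {
    "HMEM2": (77.10, 0.83, 6.99, 15.08),
    "HSMM": (77.17, 2.66, 6.92, 13.25),
}


def voicing_error(tp, fp, fn, tn):
    """Return the overall voiced/unvoiced error (%) as FP + FN."""
    return round(fp + fn, 2)


# Overall error per system, derived only from the table values above.
errors = {name: voicing_error(*r) for name, r in rates_800.items()}
print(errors)
```

Under this reading, HMEM2 has the lower combined error at 800 training items, mainly because far fewer really unvoiced regions are detected as voiced.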