Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 FN, FP, TN, and TP rates of detecting voiced/unvoiced regions through HMEM2 and the HSMM-based method

# training data	Implemented systems	Detected as	Really voiced (%)	Really unvoiced (%)
50	HMEM2	Voiced	77.00	3.70
	HMEM2	Unvoiced	7.09	12.21
	HSMM	Voiced	78.23	5.79
	HSMM	Unvoiced	5.86	10.12
100	HMEM2	Voiced	75.78	2.75
	HMEM2	Unvoiced	8.31	13.16
	HSMM	Voiced	78.07	5.34
	HSMM	Unvoiced	6.02	10.57
200	HMEM2	Voiced	77.25	1.54
	HMEM2	Unvoiced	6.84	14.37
	HSMM	Voiced	78.43	4.43
	HSMM	Unvoiced	5.66	11.48
400	HMEM2	Voiced	77.18	1.34
	HMEM2	Unvoiced	6.91	14.57
	HSMM	Voiced	76.09	2.70
	HSMM	Unvoiced	8.00	13.21
800	HMEM2	Voiced	77.10	0.83
	HMEM2	Unvoiced	6.99	15.08
	HSMM	Voiced	77.17	2.66
	HSMM	Unvoiced	6.92	13.25
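
The four cells of each system block form a 2x2 confusion matrix, so a single voiced/unvoiced error rate per system can be read off by adding the two off-diagonal cells. The following is a minimal sketch of that arithmetic, not part of the original paper. It assumes that "voiced" is treated as the positive class (so a region detected as voiced but really unvoiced counts as FP, and one detected as unvoiced but really voiced counts as FN); the names `TABLE2`, `vuv_error`, and `vuv_accuracy` are hypothetical and introduced only for illustration.

```python
# Values copied from Table 2, in percent.
# Assumption (not stated in the table itself): "voiced" is the positive class, so
#   TP = detected voiced,   really voiced
#   FP = detected voiced,   really unvoiced
#   FN = detected unvoiced, really voiced
#   TN = detected unvoiced, really unvoiced
# Key: (amount of training data, system) -> (TP, FP, FN, TN)
TABLE2 = {
    (50, "HMEM2"): (77.00, 3.70, 7.09, 12.21),
    (50, "HSMM"): (78.23, 5.79, 5.86, 10.12),
    (100, "HMEM2"): (75.78, 2.75, 8.31, 13.16),
    (100, "HSMM"): (78.07, 5.34, 6.02, 10.57),
    (200, "HMEM2"): (77.25, 1.54, 6.84, 14.37),
    (200, "HSMM"): (78.43, 4.43, 5.66, 11.48),
    (400, "HMEM2"): (77.18, 1.34, 6.91, 14.57),
    (400, "HSMM"): (76.09, 2.70, 8.00, 13.21),
    (800, "HMEM2"): (77.10, 0.83, 6.99, 15.08),
    (800, "HSMM"): (77.17, 2.66, 6.92, 13.25),
}


def vuv_error(tp, fp, fn, tn):
    """Share of wrong voicing decisions (FP + FN), in percent."""
    return round(fp + fn, 2)


def vuv_accuracy(tp, fp, fn, tn):
    """Share of correct voicing decisions (TP + TN), in percent."""
    return round(tp + tn, 2)


if __name__ == "__main__":
    for (size, system), cells in sorted(TABLE2.items()):
        # Each block of four cells should account for all decisions.
        assert abs(sum(cells) - 100.0) < 0.05
        print(f"{size:>4} {system:<6} error={vuv_error(*cells):.2f}% "
              f"accuracy={vuv_accuracy(*cells):.2f}%")
```

Under this reading, the 800 row gives a combined error of 7.82% for HMEM2 and 9.58% for HSMM, and the HMEM2 error is the lower of the two at every training-set size listed in the table.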