EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Pitch-scale factor (∣α_st∣) percentages and good concatenation percentages

From: A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Range	Configuration	∣α_st∣ in [0–4] (%)	∣α_st∣ in (4–7] (%)	∣α_st∣ in (7–12] (%)	∣α_st∣ > 12 (%)	Good concat. (%)
S7	S7p	94.2	4.5	1.2	0.1	33.1
S7	S7pdC (100 bpm)	48.0	24.0	22.9	5.1	67.5
S7	S7pdC (50 bpm)	47.1	24.0	23.7	5.1	68.1
S7	S7pdLC (100 bpm)	36.2	24.6	30.8	8.3	70.5
S7	S7pdLC (50 bpm)	36.2	24.1	31.2	8.4	70.4
S7	MLC	14.3	24.2	46.9	14.6	72.3
S4	S4p	98.6	1.2	0.3	0.0	44.2
S4	S4pdC (100 bpm)	69.2	19.0	11.1	0.7	70.4
S4	S4pdC (50 bpm)	68.7	18.9	11.7	0.7	71.2
S4	S4pdLC (100 bpm)	60.4	22.6	15.7	1.3	72.1*
S4	S4pdLC (50 bpm)	59.8	22.8	16.2	1.3	71.7
S4	MLC	37.7	31.4	28.6	2.3	72.3
S0	S0p	99.8	0.2	0.0	0.0	52.9
S0	S0pdC (100 bpm)	88.1	10.3	1.5	0.0	78.4
S0	S0pdC (50 bpm)	87.5	10.9	1.6	0.0	77.8
S0	S0pdLC (100 bpm)	82.5	14.2	3.3	0.0	76.7
S0	S0pdLC (50 bpm)	82.1	14.6	3.3	0.0	76.5
S0	MLC	68.1	25.6	6.3	0.0	72.3

Each row gives the percentages for one vocal range (S0, S4, or S7) and one unit selection (US) configuration. Differences with respect to MLC are statistically significant (p < 0.01) for all configurations except the one marked with an asterisk (*).
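The four ∣α_st∣ columns bin the absolute pitch-scale factor into intervals that are closed on the right: [0–4], (4–7], (7–12], and > 12. The minimal Python sketch below shows how one row of such percentages could be computed, assuming the pitch-scale factors applied to the selected units are available as a plain list of numbers. The function name `pitch_scale_bin_percentages` and the sample values are hypothetical and illustrative only; they are not data or code from the paper.

```python
def pitch_scale_bin_percentages(alphas):
    """Percentage of |alpha| values in each bin: [0-4], (4-7], (7-12], >12.

    Hypothetical helper: `alphas` is assumed to be a list of per-unit
    pitch-scale factors; the bin edges mirror the table columns.
    """
    if not alphas:
        raise ValueError("need at least one pitch-scale factor")
    labels = ["[0-4]", "(4-7]", "(7-12]", ">12"]
    counts = dict.fromkeys(labels, 0)
    for a in alphas:
        m = abs(a)  # the table reports the absolute value |alpha_st|
        if m <= 4:
            counts["[0-4]"] += 1   # closed at 4
        elif m <= 7:
            counts["(4-7]"] += 1   # open at 4, closed at 7
        elif m <= 12:
            counts["(7-12]"] += 1  # open at 7, closed at 12
        else:
            counts[">12"] += 1     # strictly greater than 12
    n = len(alphas)
    return {k: 100.0 * v / n for k, v in counts.items()}


# Illustrative (made-up) factors, not taken from the paper
sample = [0.5, -3.0, 4.0, 5.5, -7.0, 9.0, 12.0, -13.5]
print(pitch_scale_bin_percentages(sample))
```

Because every value falls in exactly one bin, each row sums to 100%; in the table the rows sum to 100.0 ± 0.1 because each entry is rounded to one decimal.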
