
Table 2 Pitch-scale factor (αst) percentages and good concatenation percentages

From: A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

  

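| Vocal range | Configuration    | αst [0–4] (%) | αst (4–7] (%) | αst (7–12] (%) | αst > 12 (%) | Good concat. (%) |
|-------------|------------------|---------------|---------------|----------------|--------------|------------------|
| S7          | S7p              | 94.2          | 4.5           | 1.2            | 0.1          | 33.1             |
| S7          | S7pdC (100 bpm)  | 48.0          | 24.0          | 22.9           | 5.1          | 67.5             |
| S7          | S7pdC (50 bpm)   | 47.1          | 24.0          | 23.7           | 5.1          | 68.1             |
| S7          | S7pdLC (100 bpm) | 36.2          | 24.6          | 30.8           | 8.3          | 70.5             |
| S7          | S7pdLC (50 bpm)  | 36.2          | 24.1          | 31.2           | 8.4          | 70.4             |
| S7          | MLC              | 14.3          | 24.2          | 46.9           | 14.6         | 72.3             |
| S4          | S4p              | 98.6          | 1.2           | 0.3            | 0.0          | 44.2             |
| S4          | S4pdC (100 bpm)  | 69.2          | 19.0          | 11.1           | 0.7          | 70.4             |
| S4          | S4pdC (50 bpm)   | 68.7          | 18.9          | 11.7           | 0.7          | 71.2             |
| S4          | S4pdLC (100 bpm) | 60.4          | 22.6          | 15.7           | 1.3          | 72.1*            |
| S4          | S4pdLC (50 bpm)  | 59.8          | 22.8          | 16.2           | 1.3          | 71.7             |
| S4          | MLC              | 37.7          | 31.4          | 28.6           | 2.3          | 72.3             |
| S0          | S0p              | 99.8          | 0.2           | 0.0            | 0.0          | 52.9             |
| S0          | S0pdC (100 bpm)  | 88.1          | 10.3          | 1.5            | 0.0          | 78.4             |
| S0          | S0pdC (50 bpm)   | 87.5          | 10.9          | 1.6            | 0.0          | 77.8             |
| S0          | S0pdLC (100 bpm) | 82.5          | 14.2          | 3.3            | 0.0          | 76.7             |
| S0          | S0pdLC (50 bpm)  | 82.1          | 14.6          | 3.3            | 0.0          | 76.5             |
| S0          | MLC              | 68.1          | 25.6          | 6.3            | 0.0          | 72.3             |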

Each row shows the percentages for a particular vocal range (S0, S4, or S7) and unit selection (US) configuration. Differences with respect to MLC are statistically significant (p < 0.01) for all configurations except the one marked with an asterisk (*).