Table 2 MCD and F0-RMSE results for different emotions

From: Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform

 

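| Method | MCD (N2A) | MCD (N2S) | MCD (N2H) | F0-RMSE (N2A) | F0-RMSE (N2S) | F0-RMSE (N2H) |
|---|---|---|---|---|---|---|
| Source | 6.03 | 5.18 | 6.30 | 76.8 | 73.7 | 100.4 |
| DBNs+LG | 5.67 | 4.88 | 5.55 | 76.3 | 72.0 | 99.3 |
| DBNs+NMF | 5.67 | 4.88 | 5.54 | 70.4 | 62.3 | 75.2 |
| DBNs+CWT(30) | 5.68 | 4.88 | 5.55 | 39.5 | 40.1 | 64.5 |
| DBNs+CWT(40) | 5.68 | 4.88 | 5.55 | 41.6 | 40.5 | 67.5 |
| DBNs+AS-CWT | 5.68 | 4.88 | 5.55 | 41.5 | 39.4 | 63.2 |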

  1. N2A, N2S, and N2H denote the neutral-to-angry, neutral-to-sad, and neutral-to-happy voice conversion datasets, respectively.
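
The table reports the two metrics without stating how they were computed. As a rough guide for reading the numbers, the sketch below uses the definitions commonly found in the voice conversion literature: mel-cepstral distortion (MCD) averaged over frames, and root-mean-square error between F0 contours. The function names, the choice of coefficients, and the assumption that the target and converted sequences are already time-aligned and of equal length are all illustrative assumptions, not details taken from this page.

```python
import math


def mel_cepstral_distortion(target, converted):
    """Mean MCD over frames, using the common (10 / ln 10) * sqrt(2 * sum) form.

    Assumption: `target` and `converted` are equal-length lists of frames,
    already time-aligned, each frame a list of mel-cepstral coefficients.
    """
    if not target or len(target) != len(converted):
        raise ValueError("frame sequences must be non-empty and equal length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for t_frame, c_frame in zip(target, converted):
        squared = sum((t - c) ** 2 for t, c in zip(t_frame, c_frame))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(target)


def f0_rmse(target_f0, converted_f0):
    """Root-mean-square error between two equal-length F0 contours.

    Assumption: both contours cover the same frames; how unvoiced frames
    are handled is left to the caller.
    """
    if not target_f0 or len(target_f0) != len(converted_f0):
        raise ValueError("F0 contours must be non-empty and equal length")
    squared = sum((t - c) ** 2 for t, c in zip(target_f0, converted_f0))
    return math.sqrt(squared / len(target_f0))
```

Under these definitions lower is better for both metrics, which matches the table: the "Source" row (unconverted speech) has the highest values in every column.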