Table 2 MCD and F0-RMSE results for different emotions

From: Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform

               MCD                  F0-RMSE
               N2A    N2S    N2H    N2A    N2S    N2H
Source         6.03   5.18   6.30   76.8   73.7   100.4
DBNs+LG        5.67   4.88   5.55   76.3   72.0   99.3
DBNs+NMF       5.67   4.88   5.54   70.4   62.3   75.2
DBNs+CWT(30)   5.68   4.88   5.55   39.5   40.1   64.5
DBNs+CWT(40)   5.68   4.88   5.55   41.6   40.5   67.5
DBNs+AS-CWT    5.68   4.88   5.55   41.5   39.4   63.2

Note: N2A, N2S, and N2H denote the neutral-to-angry, neutral-to-sad, and neutral-to-happy voice conversion datasets, respectively.