Fig. 2
From: Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
Example of performing segmentation in the training data. Here, X_s, X_p, and X_w represent the durations of the sentence, phrase, and word, respectively.