Fig. 2
From: Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform

Example of performing segmentation in the training data. Here, X_s, X_p, and X_w represent the durations of the sentence, phrase, and word, respectively.