Table 3 Summary of unsupervised approaches. “RE” refers to direct reference encoding, “VAE” refers to approaches based on VAEs, “GST” refers to approaches based on GSTs and “ICL” refers to approaches based on in-context learning. Prosody level “U” stands for utterance, “Se” stands for sentence, “Pr” stands for phrase, “W” stands for word, “Sy” stands for syllable, “Pn” stands for phoneme, “C” stands for character and “F” stands for frame

From: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

| Ref No | Group | TTS Model | Prosody Level |
|---|---|---|---|
| [58, 75, 91, 92, 93, 94] | GST | Tacotron | U |
| [61, 95, 96, 97, 98] | RE | FastSpeech2 | Pn |
| [60, 62, 99, 100, 101] | RE | Tacotron2 | U |
| [74, 102, 103, 104] | RE | Tacotron | U |
| [57, 105, 106] | GST | Tacotron2 | U |
| [53, 77, 107] | VAE | Tacotron2 | U |
| [35, 47] | VAE | FastSpeech | Pn |
| [48, 108] | OTHER | Tacotron2 | C |
| [109, 110] | VAE | CHiVE | Se, W, Sy |
| [49, 111] | RE | Tacotron2 | U, Pn |
| [112, 113] | RE | Tacotron | Pn |
| [59] | GST | Tacotron2 | Se |
| [114] | GST | Tacotron2 | Pn |
| [45] | OTHER | Tacotron2 | Se |
| [115] | OTHER | Tacotron-like | Pn |
| [31] | RE | Tacotron | U, Pn |
| [55] | RE | Tacotron | Pn, F |
| [29] | OTHER | Tacotron2 | U |
| [116] | RE | FastSpeech2 | W |
| [68] | VAE | DL-SPSS | U, Pr, W |
| [38] | OTHER | FastSpeech | U, F |
| [117] | GST | Transformer TTS | U |
| [67] | VAE | DL-SPSS | Pr |
| [37] | RE | Tacotron2 | U, Se |
| [118] | RE | FastSpeech2 | U |
| [19] | RE | Tacotron2 | U |
| [76] | VAE | Voice-loop | U |
| [119] | VAE | NTTS | Pn |
| [120] | RE | Tacotron2 | U, F |
| [17] | GST | FastSpeech2 | Se |
| [121] | OTHER | Tacotron | U |
| [122] | RE | Tacotron2 | Sy |
| [123] | VAE | Tacotron2 | W, Pn |
| [71] | OTHER | Tacotron2 | Pn |
| [124] | OTHER | Tacotron-like | Pr, W |
| [33] | RE | Tacotron/2 | U, Sy |
| [51] | RE | Tacotron/2 | U |
| [125] | OTHER | Tacotron | Pn |
| [126] | VAE | Tacotron | Pn |
| [52] | RE | FastSpeech | U |
| [72] | OTHER | Prosody-TTS | Pn |
| [70] | RE | FastSpeech2 | C |
| [18] | ICL | NaturalSpeech2 | F |
| [127] | RE | CopyCat, Tacotron2 | W |
| [79] | RE | FastSpeech | U, Pn |
| [30] | OTHER | GraphPB | U, Pr, W |
| [128] | OTHER | FastSpeech2 | U, Pn |
| [129] | VAE | DurIAN | Se |
| [130] | VAE | Tacotron-like | U, Pn |
| [44] | RE | AlignTTS | Pn |
| [131] | VAE | Tacotron-like | Se |
| [132] | OTHER | AdaSpeech 3 | Pn |
| [63] | RE | Transformer TTS | U, Pn |
| [40] | OTHER | Transformer TTS | U, W |
| [54] | OTHER | Tacotron2 | W |
| [20] | OTHER | Tacotron2 | Pn |
| [21] | RE | InstructTTS | Se |
| [22] | ICL | VALL-E | F |
| [23] | RE | VITS | U, F |
| [24] | VAE | FastSpeech2 | U, W |
| [25] | ICL | Voicebox | F |
| [46] | GST | Tacotron2 | W |
| [73] | GST | Tacotron | Sy |
| [133] | GST | Tacotron | Pn |
| [134] | RE | FastSpeech2 | U, Pn |
| [135] | OTHER | DL-SPSS | Se, W, Sy, Pn |
| [136] | GST | FastSpeech2 | U |
| [50] | GST | FastSpeech2 | U, Se, Sy |
| [137] | GST | Tacotron | Pn |
| [138] | OTHER | DL-SPSS | F |
| [56] | RE | DL-SPSS | U |