Metric | Description |
---|---|
Mean Opinion Score (MOS) | Listeners rate the quality (naturalness and intelligibility) of synthesized speech on a five-point scale. |
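Comparison Mean Opinion Score (CMOS) | Listeners hear paired samples of the same sentence, one from the model under test and one from a baseline (another model or ground-truth speech), and rate the relative quality of the pair; CMOS is the average of these relative ratings. |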
Differential Mean Opinion Score (DMOS) | Listeners score samples from one to five based on their similarity to a specific emotion or style. |
AB preference test | Listeners hear the same sentence synthesized by two models and select the one that better fulfills the given condition. |
ABX preference test | Listeners hear three samples, A, B, and X, where X is the target speech, and select whichever of A and B is closer to X. |
MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) | Listeners are presented with a set of samples that includes synthesized speech, natural speech (the hidden reference), and a heavily degraded sample (the anchor), and score each sample from 0 to 100 in a double-blind listening test. |
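The table describes how ratings are collected but not how they are summarized. As a minimal sketch, assuming ratings are stored as plain Python lists, the helpers below show common ways to aggregate them: a mean with a normal-approximation 95% confidence interval (for MOS, DMOS, or MUSHRA scores), an average of paired relative ratings for CMOS (assuming a -3 to +3 scale), and preference rates with an exact two-sided sign test for AB and ABX tests. The function names (`mean_with_ci`, `cmos`, `preference_rates`, `sign_test_p`) and the choice of statistics are illustrative assumptions, not a prescribed protocol.

```python
import math
from statistics import mean, stdev


def mean_with_ci(scores, z=1.96):
    """Return (mean, half-width of an approximate 95% confidence interval).

    Works for MOS/DMOS ratings (1 to 5) or MUSHRA ratings (0 to 100).
    Assumption: a normal approximation, and at least two ratings.
    """
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width


def cmos(relative_scores):
    """Average of per-pair ratings of the model under test vs. the baseline.

    Assumption: a -3 to +3 scale where positive values favor the model under test.
    """
    return mean(relative_scores)


def preference_rates(choices):
    """Share of AB (or ABX) trials won by each option.

    `choices` holds "A", "B", or "none" (no preference) for each trial.
    """
    n = len(choices)
    return {option: choices.count(option) / n for option in ("A", "B", "none")}


def sign_test_p(wins_a, n_decided):
    """Exact two-sided sign-test p-value for A vs. B, ignoring "none" trials.

    Under the null hypothesis, each decided trial favors A with probability 0.5.
    """
    k = min(wins_a, n_decided - wins_a)
    # Probability of a result at least as extreme in one tail, then doubled.
    tail = sum(math.comb(n_decided, i) for i in range(k + 1)) / 2 ** n_decided
    return min(1.0, 2 * tail)
```

For example, if listeners preferred model A in 8 of 10 decided AB trials, `sign_test_p(8, 10)` gives about 0.11, so that preference would not be significant at the 5% level. MUSHRA studies usually also screen out listeners who fail to rate the hidden reference highly; that step is omitted from this sketch.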