Table 2 Performance of the three models under the Vocal Tract Length Normalization attack (UA/WA/ACC)

From: Black-box adversarial attacks through speech distortion for speech emotion recognition

| α_vtln | CNN-LSTM (%) | GCN (%) | CNN-MAA (%) |
|---|---|---|---|
| 0.20 | 9.25/10.36/9.80 | 9.54/9.61/9.58 | 8.85/9.12/8.99 |
| 0.15 | 11.56/11.84/11.70 | 10.22/11.50/10.86 | 10.75/10.94/10.85 |
| 0.10 | 15.62/17.45/16.54 | 14.12/17.21/15.67 | 14.55/14.85/14.70 |
| 0.05 | 20.55/24.63/22.59 | 19.31/20.56/19.94 | 19.42/19.78/19.60 |
| 0.00 | 62.54/64.27/63.41 | 75.23/72.32/73.78 | 76.24/73.32/74.78 |
| −0.05 | 24.77/25.61/25.19 | 25.49/27.38/26.44 | 20.55/21.64/21.10 |
| −0.10 | 16.51/16.35/16.43 | 14.85/15.64/15.25 | 17.43/17.87/17.65 |
| −0.15 | 12.43/14.62/13.53 | 11.26/12.43/11.85 | 12.65/13.44/13.05 |
| −0.20 | 10.45/11.24/10.85 | 10.46/10.65/10.56 | 10.32/10.55/10.44 |

  1. Bold fonts indicate the best attack performance under the current modes
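
To make the UA/WA/ACC triples easier to compare, the following minimal sketch computes the drop in ACC relative to the unwarped row (0.00) for a few CNN-LSTM entries copied from Table 2. The names `cnn_lstm`, `mean_ua_wa`, and `acc_drop` are hypothetical helpers introduced for illustration. The sketch also checks an observation that the table itself does not state: the reported ACC values appear to coincide, up to rounding, with the mean of UA and WA. Treat that as an assumption, not as the authors' definition.

```python
# Rows copied from Table 2, CNN-LSTM column: warp factor -> (UA, WA, ACC) in percent.
cnn_lstm = {
    0.20: (9.25, 10.36, 9.80),
    0.00: (62.54, 64.27, 63.41),   # unwarped baseline row
    -0.20: (10.45, 11.24, 10.85),
}


def mean_ua_wa(ua, wa):
    """Mean of UA and WA; an assumed reading of ACC, not stated in the table."""
    return (ua + wa) / 2


def acc_drop(rows, alpha, baseline=0.0):
    """Absolute ACC drop (percentage points) relative to the baseline row."""
    return rows[baseline][2] - rows[alpha][2]


for alpha, (ua, wa, acc) in cnn_lstm.items():
    # Print the warp factor, the recomputed mean, the reported ACC, and the drop.
    print(alpha, round(mean_ua_wa(ua, wa), 3), acc, round(acc_drop(cnn_lstm, alpha), 2))
```

The same dictionary layout can be filled in for the GCN and CNN-MAA columns to compare how sharply each model degrades as the warp factor moves away from zero.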