Black-box adversarial attacks through speech distortion for speech emotion recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Performance of the three models under the Vocal Tract Length Normalization attack (UA/WA/ACC)

α_vtln	CNN-LSTM (%)	GCN (%)	CNN-MAA (%)
0.20	9.25/10.36/9.80	9.54/9.61/9.58	8.85/9.12/8.99
0.15	11.56/11.84/11.70	10.22/11.50/10.86	10.75/10.94/10.85
0.10	15.62/17.45/16.54	14.12/17.21/15.67	14.55/14.85/14.70
0.05	20.55/24.63/22.59	19.31/20.56/19.94	19.42/19.78/19.60
0.00	62.54/64.27/63.41	75.23/72.32/73.78	76.24/73.32/74.78
−0.05	24.77/25.61/25.19	25.49/27.38/26.44	20.55/21.64/21.10
−0.10	16.51/16.35/16.43	14.85/15.64/15.25	17.43/17.87/17.65
−0.15	12.43/14.62/13.53	11.26/12.43/11.85	12.65/13.44/13.05
−0.20	10.45/11.24/10.85	10.46/10.65/10.56	10.32/10.55/10.44
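
To make the size of the degradation in Table 2 easier to read, the following minimal sketch computes the absolute ACC drop, in percentage points, of each model relative to the α_vtln = 0.00 row. It assumes that row is the unattacked baseline. The ACC values are copied from a subset of the table rows; the names `MODELS`, `ACC`, and `acc_drop` are hypothetical helpers introduced only for illustration and are not part of the paper.

```python
# ACC values (%) copied from Table 2 (third number of each UA/WA/ACC cell).
# Keys are alpha_vtln; tuple order follows MODELS. Only a subset of rows is listed.
MODELS = ("CNN-LSTM", "GCN", "CNN-MAA")
ACC = {
    0.20: (9.80, 9.58, 8.99),
    0.05: (22.59, 19.94, 19.60),
    0.00: (63.41, 73.78, 74.78),  # assumed to be the unattacked baseline
    -0.05: (25.19, 26.44, 21.10),
    -0.20: (10.85, 10.56, 10.44),
}


def acc_drop(alpha):
    """Absolute ACC drop (percentage points) relative to the alpha = 0.00 row."""
    baseline = ACC[0.0]
    return {
        model: round(base - attacked, 2)
        for model, base, attacked in zip(MODELS, baseline, ACC[alpha])
    }


if __name__ == "__main__":
    for alpha in (0.05, -0.05, 0.20, -0.20):
        print(alpha, acc_drop(alpha))
```

The same helper extends to the remaining rows, or to the UA and WA columns, by adding the corresponding values from the table.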