Table 4 The role of self-attention mechanisms on the recognition performance

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

| Models | VoxCeleb-E EER | VoxCeleb-E DCF08 | VoxCeleb-E DCF10 | VoxCeleb-H EER | VoxCeleb-H DCF08 | VoxCeleb-H DCF10 | CN-Celeb EER | CN-Celeb DCF08 | CN-Celeb DCF10 |
|---|---|---|---|---|---|---|---|---|---|
| RawNet2 | 2.57% | 0.14 | 0.52 | 4.89% | 0.24 | 0.64 | 24.27% | 0.78 | 0.97 |
| RawNet2* | 2.43% | 0.13 | 0.50 | 4.60% | 0.23 | 0.64 | 24.23% | 0.78 | 0.96 |
| RawNet2 w/out SA* | 2.44% | 0.14 | 0.48 | 4.69% | 0.23 | 0.64 | 23.55% | 0.77 | 0.94 |
| RawNet-MHSA | 2.75% | 0.15 | 0.53 | 4.91% | 0.24 | 0.65 | 22.16% | 0.75 | 0.93 |
| RawNet-all-SA | 3.69% | 0.20 | 0.61 | 6.61% | 0.32 | 0.73 | 22.51% | 0.77 | 0.96 |
| RawNet-origin-SA* | 2.37% | 0.13 | 0.50 | 4.54% | 0.22 | 0.63 | 23.49% | 0.78 | 0.94 |
| RawNet-SA | 2.54% | 0.14 | 0.47 | 4.52% | 0.22 | 0.65 | 22.24% | 0.76 | 0.94 |

“*” denotes that the network is initialized with the trained RawNet2 parameters.
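The EER (equal error rate) columns above report the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR) on verification trials. As a hypothetical illustration (not code from the paper), a minimal threshold-sweep computation of EER from target and non-target trial scores might look like this:

```python
def eer(target_scores, nontarget_scores):
    """Return the equal error rate: the point where the false-acceptance
    rate (FAR) on non-target trials equals the false-rejection rate (FRR)
    on target trials, found by sweeping every observed score threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        # FRR: fraction of genuine (target) trials rejected at threshold t
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # FAR: fraction of impostor (non-target) trials accepted at threshold t
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            # keep the threshold where the two rates are closest;
            # report their midpoint as the EER
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy scores for illustration only (not the paper's trial lists)
print(eer([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1]))  # → 0.25
```

The DCF08 and DCF10 columns are detection cost functions in the style of the NIST SRE 2008 and 2010 evaluations, which weight miss and false-alarm rates by assumed priors and costs rather than forcing them to be equal as the EER does.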