
Table 4 The role of self-attention mechanisms on the recognition performance

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

Models               VoxCeleb-E              VoxCeleb-H              CN-Celeb
                     EER     DCF08  DCF10    EER     DCF08  DCF10    EER     DCF08  DCF10
RawNet2              2.57%   0.14   0.52     4.89%   0.24   0.64     24.27%  0.78   0.97
RawNet2*             2.43%   0.13   0.50     4.60%   0.23   0.64     24.23%  0.78   0.96
RawNet2 w/out SA*    2.44%   0.14   0.48     4.69%   0.23   0.64     23.55%  0.77   0.94
RawNet-MHSA          2.75%   0.15   0.53     4.91%   0.24   0.65     22.16%  0.75   0.93
RawNet-all-SA        3.69%   0.20   0.61     6.61%   0.32   0.73     22.51%  0.77   0.96
RawNet-origin-SA*    2.37%   0.13   0.50     4.54%   0.22   0.63     23.49%  0.78   0.94
RawNet-SA            2.54%   0.14   0.47     4.52%   0.22   0.65     22.24%  0.76   0.94
“*” denotes that the network is initialized with the trained RawNet2 parameters
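For readers unfamiliar with the metrics above: EER (equal error rate) is the operating point at which the false-acceptance and false-rejection rates coincide, and DCF08/DCF10 are minimum detection cost functions in the style of the NIST SRE 2008 and 2010 evaluations, which weight misses and false alarms by assumed costs and a target-speaker prior. The sketch below is an illustration of how these quantities are typically computed from trial scores, not the paper's evaluation code; the function names, the score arrays, and the cost parameters shown are assumptions.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """EER: sweep thresholds and find where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # false accepts among impostor trials
        frr = np.mean(genuine_scores < t)    # false rejects among genuine trials
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def compute_min_dcf(genuine_scores, impostor_scores,
                    p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Minimum detection cost over all thresholds.

    The defaults here follow the commonly cited SRE08-style parameters;
    reported DCF values are often additionally normalized by the cost of
    a trivial system, which this sketch omits.
    """
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    costs = []
    for t in thresholds:
        p_miss = np.mean(genuine_scores < t)
        p_fa = np.mean(impostor_scores >= t)
        costs.append(c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target))
    return min(costs)
```

With perfectly separated scores both metrics go to zero; as the genuine and impostor score distributions overlap (as on the noisier CN-Celeb condition in the table), EER and minDCF rise together.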