Table 3 Results of comparison with state-of-the-art methods on the VoxCeleb-E and VoxCeleb-H evaluation protocols

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

| Method | Input | Backbone | Loss | CN-Celeb | VoxCeleb-E | VoxCeleb-H |
|---|---|---|---|---|---|---|
| Chung et al. [35] | S | ResNet50 | TAP | / | 4.42% | 7.33% |
| Thin ResNet34 [38] | S | Thin ResNet34 | GhostVLAD | 20.04% | 3.13% | 5.06% |
| Nagrani et al. [39] | S | Thin ResNet34 | GhostVLAD | / | 2.95% | 4.93% |
| SpeakerNet [40] | S | SpeakerNet-M | SP | 19.33% | 2.69% | 4.80% |
| DANet [42] | S | DANet | Double SA | 24.11% | 3.18% | 4.61% |
| RawNet2 | Raw | RawNet2 | GRU | 24.27% | 2.57% | 4.89% |
| RawNet-origin-SA* | Raw | RawNet-origin-SA | GRU | 23.49% | 2.37% | 4.54% |
| RawNet-SA | Raw | RawNet-SA | GRU | 22.24% | 2.54% | 4.52% |
  1. “*” denotes that the network is initialized with the trained RawNet2 parameters. “origin-SA” denotes the self-attention layer in Fig. 6A.