Table 3 Results of comparison to state-of-the-art methods on VoxCeleb-E and VoxCeleb-H evaluation protocols

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

| Method | Input | Backbone | Loss | CN-Celeb | VoxCeleb-E | VoxCeleb-H |
| --- | --- | --- | --- | --- | --- | --- |
| Chung et al. [35] | S | ResNet50 | TAP | / | 4.42% | 7.33% |
| Thin ResNet34 [38] | S | Thin ResNet34 | GhostVLAD | 20.04% | 3.13% | 5.06% |
| Nagrani et al. [39] | S | Thin ResNet34 | GhostVLAD | / | 2.95% | 4.93% |
| SpeakerNet [40] | S | SpeakerNet-M | SP | 19.33% | 2.69% | 4.80% |
| DANet [42] | S | DANet | Double SA | 24.11% | 3.18% | 4.61% |
| RawNet2 | Raw | RawNet2 | GRU | 24.27% | 2.57% | 4.89% |
| RawNet-origin-SA* | Raw | RawNet-origin-SA | GRU | 23.49% | 2.37% | 4.54% |
| RawNet-SA | Raw | RawNet-SA | GRU | 22.24% | 2.54% | 4.52% |

  1. “*” denotes that the network is initialized with the trained RawNet2 parameters. Original-SA denotes the self-attention layer in Fig. 6A.