Table 9 The influence of different similarity measurements on the recognition performance

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

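| Models | Similarity | VoxCeleb-E EER | VoxCeleb-E DCF08 | VoxCeleb-E DCF10 | VoxCeleb-H EER | VoxCeleb-H DCF08 | VoxCeleb-H DCF10 | CN-Celeb EER | CN-Celeb DCF08 | CN-Celeb DCF10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RawNet2 | Cosine | 2.57% | 0.14 | 0.52 | 4.89% | 0.24 | 0.64 | 24.27% | 0.78 | 0.97 |
| RawNet2 | PLDA | 3.78% | 0.19 | 0.58 | 6.43% | 0.29 | 0.70 | 27.76% | 0.82 | 1.00 |
| RawNet2 | B-vector | 3.39% | 0.19 | 0.69 | 5.99% | 0.32 | 0.87 | 26.16% | 0.82 | 1.00 |
| RawNet-origin-SA* | Cosine | 2.37% | 0.13 | 0.50 | 4.54% | 0.22 | 0.63 | 23.49% | 0.78 | 0.94 |
| RawNet-origin-SA* | PLDA | 3.51% | 0.17 | 0.59 | 6.04% | 0.28 | 0.72 | 27.46% | 0.81 | 0.97 |
| RawNet-origin-SA* | B-vector | 3.17% | 0.18 | 0.69 | 5.60% | 0.29 | 0.84 | 26.24% | 0.81 | 1.00 |
| RawNet-SA | Cosine | 2.54% | 0.14 | 0.47 | 4.52% | 0.22 | 0.65 | 22.24% | 0.76 | 0.94 |
| RawNet-SA | PLDA | 3.94% | 0.19 | 0.59 | 6.48% | 0.29 | 0.76 | 24.67% | 0.80 | 0.96 |
| RawNet-SA | B-vector | 3.54% | 0.21 | 0.72 | 6.46% | 0.37 | 0.91 | 22.84% | 0.84 | 1.00 |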

  1. “*” denotes that the network is initialized with the trained RawNet2 parameters
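
For readers unfamiliar with the metrics, the following is a minimal sketch of how a cosine-similarity score and an equal error rate (EER) are commonly computed for verification trials. It is not the authors' evaluation code: the function names `cosine_similarity` and `equal_error_rate`, the toy scores, and the simple threshold sweep are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    Returns the mean of the false acceptance and false rejection rates at the
    threshold where the two are closest. This is an illustrative
    approximation, not an interpolated EER.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        frr = sum(s < t for s in targets) / len(targets)
        far = sum(s >= t for s in nontargets) / len(nontargets)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy trial list (made-up scores, not data from the table).
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

In this sketch a lower EER means the similarity measure separates same-speaker and different-speaker trials more cleanly, which is how the EER columns in the table are read.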