From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit
Method | Input | Backbone | Loss | CN-Celeb | VoxCeleb-E | VoxCeleb-H |
---|---|---|---|---|---|---|
Chung et al. [35] | S | ResNet50 | TAP | / | 4.42% | 7.33% |
Thin ResNet34 [38] | S | Thin ResNet34 | GhostVLAD | 20.04% | 3.13% | 5.06% |
Nagrani et al. [39] | S | Thin ResNet34 | GhostVLAD | / | 2.95% | 4.93% |
SpeakerNet [40] | S | SpeakerNet-M | SP | 19.33% | 2.69% | 4.80% |
DANet [42] | S | DANet | Double SA | 24.11% | 3.18% | 4.61% |
RawNet2 | Raw | RawNet2 | GRU | 24.27% | 2.57% | 4.89% |
RawNet-origin-SA* | Raw | RawNet-origin-SA | GRU | 23.49% | 2.37% | 4.54% |
RawNet-SA | Raw | RawNet-SA | GRU | 22.24% | 2.54% | 4.52% |