From: A large TV dataset for speech and music activity detection
Model | Arch. | Training data | PCEN | Muspeak | AVASpeech | TVSM-test
---|---|---|---|---|---|---
Third-party method (T1) | CNN | | | 0.94 | 0.79 | 0.84
Third-party method (T2) | CRNN | | | 0.97 | 0.77 | 0.81
TCN-Cue | TCN | TVSM-cuesheet | | 0.60 | 0.86 | 0.90
TCN-P-Cue | TCN | TVSM-cuesheet | ✓ | 0.61 | 0.86 | 0.89
TCN-P-Pseu | TCN | TVSM-pseudo | ✓ | 0.60 | 0.88 | 0.91
CRNN-Cue | CRNN | TVSM-cuesheet | | 0.63 | 0.86 | 0.91
CRNN-P-Cue | CRNN | TVSM-cuesheet | ✓ | 0.63 | 0.86 | 0.91
CRNN-P-Pseu | CRNN | TVSM-pseudo | ✓ | 0.67 | 0.88 | 0.91
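The table is an ablation over architecture, PCEN, and label source, so pairwise differences are what matter. As a reading aid, here is a minimal sketch that transcribes the scores of the six TVSM-trained models and computes per-dataset differences between two configurations. One comparison adds PCEN while holding the cuesheet labels fixed. The other switches from cuesheet to pseudo labels while holding PCEN fixed. The `RESULTS` layout and the `delta` helper are illustrative assumptions, not code from the paper. The third-party rows are left out.

```python
# Scores transcribed from the table (TVSM-trained models only).
# Tuple order matches DATASETS: Muspeak, AVASpeech, TVSM-test.
DATASETS = ("Muspeak", "AVASpeech", "TVSM-test")

RESULTS = {
    "TCN-Cue":     (0.60, 0.86, 0.90),
    "TCN-P-Cue":   (0.61, 0.86, 0.89),
    "TCN-P-Pseu":  (0.60, 0.88, 0.91),
    "CRNN-Cue":    (0.63, 0.86, 0.91),
    "CRNN-P-Cue":  (0.63, 0.86, 0.91),
    "CRNN-P-Pseu": (0.67, 0.88, 0.91),
}


def delta(new: str, base: str) -> dict[str, float]:
    """Per-dataset score change from model `base` to model `new`.

    Hypothetical helper; results are rounded to 2 decimals to match the
    table's precision and to hide floating-point noise.
    """
    return {
        name: round(b - a, 2)
        for name, a, b in zip(DATASETS, RESULTS[base], RESULTS[new])
    }


if __name__ == "__main__":
    for arch in ("TCN", "CRNN"):
        # Effect of PCEN with the training labels held fixed (cuesheet).
        print(arch, "PCEN:", delta(f"{arch}-P-Cue", f"{arch}-Cue"))
        # Effect of pseudo labels vs. cuesheet labels, both with PCEN.
        print(arch, "pseudo vs. cuesheet:", delta(f"{arch}-P-Pseu", f"{arch}-P-Cue"))
```

Consider the CRNN with PCEN held fixed. Switching to pseudo labels raises its Muspeak score by 0.04 and its AVASpeech score by 0.02. Its TVSM-test score does not change.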