From: A large TV dataset for speech and music activity detection
Model | Arch. | Training data | PCEN | ORF TV | Muspeak | OpenBMAT | TVSM-test |
---|---|---|---|---|---|---|---|
Third-party method (T1) | CNN | | | 0.60 | 0.93 | 0.47 | 0.48 |
Third-party method (T2) | CRNN | | | 0.85 | 0.99 | 0.85 | 0.88 |
TCN-Cue | TCN | TVSM-cuesheet | | 0.79 | 0.86 | 0.82 | 0.88 |
TCN-P-Cue | TCN | TVSM-cuesheet | ✓ | 0.86 | 0.93 | 0.84 | 0.90 |
TCN-P-Pseu | TCN | TVSM-pseudo | ✓ | 0.87 | 0.97 | 0.87 | 0.93 |
CRNN-Cue | CRNN | TVSM-cuesheet | | 0.89 | 0.93 | 0.88 | 0.93 |
CRNN-P-Cue | CRNN | TVSM-cuesheet | ✓ | 0.92 | 0.94 | 0.90 | 0.91 |
CRNN-P-Pseu | CRNN | TVSM-pseudo | ✓ | 0.92 | 0.95 | 0.91 | 0.94 |