From: A large TV dataset for speech and music activity detection
Class     Error rate   Deletion rate   Insertion rate
Music     0.70         0.68            0.02
Speech    0.33         0.18            0.15
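
In both rows the reported error rate equals the deletion rate plus the insertion rate (0.68 + 0.02 = 0.70 for music, 0.18 + 0.15 = 0.33 for speech). This is an observation from the values above, not a definition given in this excerpt. A minimal Python sketch of that consistency check follows; the function name `is_sum_consistent` and the tolerance are illustrative assumptions, and the tolerance only accounts for two-decimal rounding.

```python
import math

# Values copied from the table: (error rate, deletion rate, insertion rate).
rates = {
    "Music": (0.70, 0.68, 0.02),
    "Speech": (0.33, 0.18, 0.15),
}


def is_sum_consistent(error, deletion, insertion, tol=0.015):
    """Return True if error matches deletion + insertion within rounding tolerance.

    Assumption: each figure is rounded to two decimals, so the three values
    together can drift by up to 0.015 from an exact sum.
    """
    return math.isclose(error, deletion + insertion, abs_tol=tol)


# Hypothetical helper result: one boolean per class.
checks = {name: is_sum_consistent(*values) for name, values in rates.items()}

for name, ok in checks.items():
    print(f"{name}: error rate consistent with deletion + insertion: {ok}")
```

If the relationship holds in general, a high error rate can be read directly from its two components: for music it is almost entirely deletions, while for speech deletions and insertions contribute in similar amounts.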