From: Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset
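| Task | Network | Accuracy (%) | False positive (%) | False negative (%) |
|---|---|---|---|---|
| Speech | Single-task network | 83.99 | 10.41 | 5.60 |
| Speech | Double-task network | 83.81 | 10.67 | 5.52 |
| Music | Single-task network | 84.20 | 9.86 | 5.94 |
| Music | Double-task network | 84.16 | 8.68 | 7.16 |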
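
In every row, the three percentages add up to 100. This suggests that false positives and false negatives are reported as shares of all evaluated items, so that accuracy = 100 - (false positive + false negative). That reading is an inference from the numbers, not something the table states. The minimal Python sketch below checks the arithmetic; the dictionary layout and the `row_total` helper are illustrative names, and the network labels on the two Music rows are assumed to follow the same order as the Speech rows.

```python
# Values transcribed from the table: (accuracy, false positive, false negative), in percent.
# Assumption: the two Music rows are single-task then double-task, mirroring the Speech rows.
rows = {
    ("Speech", "Single-task network"): (83.99, 10.41, 5.60),
    ("Speech", "Double-task network"): (83.81, 10.67, 5.52),
    ("Music", "Single-task network"): (84.20, 9.86, 5.94),
    ("Music", "Double-task network"): (84.16, 8.68, 7.16),
}


def row_total(values):
    """Sum the three percentages of one row, rounded to two decimals."""
    return round(sum(values), 2)


totals = {key: row_total(values) for key, values in rows.items()}

for (task, network), total in totals.items():
    print(f"{task:6s} {network:20s} total = {total:.2f}")
```

If the totals had not matched 100, the false positive and false negative columns would more likely be per-class rates (for example, relative to the number of negative or positive items) rather than shares of the whole evaluation set.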