Table 6 Simultaneous speech-music event detection results with different network architectures

From: Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

Model    L  N    p     Train          Validation     Test
                       Cost   Acc.%   Cost   Acc.%   Cost   Acc.%
FConn    6  256  5.77  0.977  58.93   1.038  56.19   1.043  55.80
CNN3x3   6  256  6.68  0.726  71.10   0.740  70.39   0.746  70.37
C1-LSTM  4  256  6.53  0.788  67.58   0.877  64.82   0.886  64.04
C2-LSTM  6  256  6.59  0.651  74.43   0.726  71.48   0.733  70.98
  1. The Model column refers to the network architecture; L and N are the number of hidden layers and the number of nodes in each layer (the detailed function of these parameters in each structure can be found in Section 3.3). p is the base-10 logarithm of the number of parameters. The values of the cost (loss) function and the classification accuracy are reported for the training, validation, and test subsets. The best model in terms of validation cost, C2-LSTM, is highlighted in italics.
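
Because p is a base-10 logarithm, it can be inverted to get a rough parameter count for each model. The following is a minimal sketch of that arithmetic, not code from the paper: the function names `approx_param_count` and `log_param_measure` are hypothetical, and the p values are copied from the table above. Since p is reported to two decimals, the recovered counts are only approximate.

```python
import math

# p values copied from Table 6, where p = log10(number of parameters).
P_VALUES = {"FConn": 5.77, "CNN3x3": 6.68, "C1-LSTM": 6.53, "C2-LSTM": 6.59}


def approx_param_count(p: float) -> int:
    """Invert p = log10(n_params) to estimate the parameter count (hypothetical helper)."""
    return round(10 ** p)


def log_param_measure(n_params: int) -> float:
    """Compute p from a parameter count (hypothetical helper)."""
    return math.log10(n_params)


if __name__ == "__main__":
    for name, p in P_VALUES.items():
        # Counts are approximate because p is rounded to two decimals.
        print(f"{name}: p = {p} -> about {approx_param_count(p):,} parameters")
```

Read this way, FConn has on the order of several hundred thousand parameters, while the convolutional and hybrid models each have a few million.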