Table 4 Speech event detection results with different network architectures

From: Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

Model     L   N    p     Train          Validation     Test
                         Cost   Acc.%   Cost   Acc.%   Cost   Acc.%
FConn     6   512  6.23  0.489  77.03   0.510  76.45   0.518  75.58
CNN 3×3   7   128  6.04  0.322  86.86   0.383  83.65   0.387  83.72
CNN 7×7   6   64   6.17  0.362  85.02   0.380  84.07   0.390  83.21
LSTM      1   64   4.70  0.547  73.69   0.544  73.51   0.547  73.41
C1-LSTM   3   256  6.40  0.406  82.56   0.436  80.96   0.437  80.80
C2-LSTM   6   256  6.59  0.377  84.30   0.375  84.34   0.382  83.99
  1. The Model column gives the network architecture. L and N are the number of hidden layers and the number of nodes per layer, respectively (the exact role of these parameters in each architecture is described in Section 3.3). p is a base-10 logarithmic measure of the number of parameters. The value of the cost (loss) function and the classification accuracy are reported for the training, validation, and test subsets. The best model in terms of validation cost, C2-LSTM, is highlighted in italics.
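The selection rule in the note (lowest validation cost wins) and the p column can be made concrete with a short script. This is a minimal sketch under stated assumptions: the values are transcribed from Table 4, the names `Row`, `TABLE4`, `approx_param_count` and `best_by_validation_cost` are hypothetical, and p is assumed to be exactly log10 of the parameter count, which the note only describes as "a base-10 logarithmic measure".

```python
from typing import NamedTuple


class Row(NamedTuple):
    layers: int        # L: number of hidden layers
    nodes: int         # N: nodes per hidden layer
    p: float           # base-10 log measure of the parameter count
    train_cost: float
    train_acc: float   # percent
    val_cost: float
    val_acc: float     # percent
    test_cost: float
    test_acc: float    # percent


# Values transcribed from Table 4.
TABLE4 = {
    "FConn":   Row(6, 512, 6.23, 0.489, 77.03, 0.510, 76.45, 0.518, 75.58),
    "CNN 3x3": Row(7, 128, 6.04, 0.322, 86.86, 0.383, 83.65, 0.387, 83.72),
    "CNN 7x7": Row(6, 64, 6.17, 0.362, 85.02, 0.380, 84.07, 0.390, 83.21),
    "LSTM":    Row(1, 64, 4.70, 0.547, 73.69, 0.544, 73.51, 0.547, 73.41),
    "C1-LSTM": Row(3, 256, 6.40, 0.406, 82.56, 0.436, 80.96, 0.437, 80.80),
    "C2-LSTM": Row(6, 256, 6.59, 0.377, 84.30, 0.375, 84.34, 0.382, 83.99),
}


def approx_param_count(p: float) -> int:
    """Invert p, ASSUMING p = log10(number of parameters)."""
    return round(10 ** p)


def best_by_validation_cost(table: dict) -> str:
    """Selection rule from the table note: the lowest validation cost wins."""
    return min(table, key=lambda name: table[name].val_cost)


if __name__ == "__main__":
    print("best by validation cost:", best_by_validation_cost(TABLE4))
    for name, row in TABLE4.items():
        params = approx_param_count(row.p)
        print(f"{name:8s} ~{params:>9,d} params  test acc {row.test_acc:.2f}%")
```

Under this assumption, the single-layer LSTM (p = 4.70) has roughly 5 × 10^4 parameters, while C2-LSTM (p = 6.59) has roughly 4 × 10^6. Selecting on validation cost rather than test cost keeps the test subset out of model selection. In this table, the selected model also happens to have the highest test accuracy.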