Fig. 2
From: Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks
The confusion matrix (CM) of the best prediction on the test set of the DCASE 2018 Task 5 dataset.