EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Event detection performance with various components of the proposed model. The standard deviations are computed from 5 iterations of each model.

From: Multi-rate modulation encoding via unsupervised learning for audio event detection

Model	Condition	PSDS1 (avg ± std)	PSDS2 (avg ± std)
1\(\times\)CRNN	\(f_c\) = 0.8 Hz	0.361 ± 0.007	0.601 ± 0.007
	\(f_c\) = 2.4 Hz	0.371 ± 0.006	0.603 ± 0.002
	\(f_c\) = 4 Hz	0.365 ± 0.005	0.593 ± 0.008
	VAE	0.365 ± 0.003	0.605 ± 0.004
2\(\times\)CRNN	\(f_{c1}\) = 0.8 Hz, \(f_{c2}\) = 2.4 Hz	0.368 ± 0.004	0.612 ± 0.007
	\(f_{c1}\) = 0.8 Hz, \(f_{c2}\) = 4 Hz	0.365 ± 0.003	0.594 ± 0.003
	\(f_{c1}\) = 2.4 Hz, \(f_{c2}\) = 4 Hz	0.374 ± 0.009	0.611 ± 0.009
3\(\times\)CRNN	Low-pass only	0.373 ± 0.007	0.607 ± 0.005
	High-pass only	0.376 ± 0.007	0.613 ± 0.008
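
The "avg ± std" entries summarize the scores from 5 iterations of each model. As a minimal sketch of that arithmetic, the Python snippet below formats a list of per-run scores the same way. The function name `summarize` and the score values are hypothetical placeholders, not data from the article, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def summarize(scores):
    """Format per-run scores as 'avg ± std' with three decimals."""
    avg = statistics.mean(scores)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    # Use statistics.pstdev instead for the population estimator.
    std = statistics.stdev(scores)
    return f"{avg:.3f} ± {std:.3f}"


# Hypothetical PSDS1 scores from five runs (placeholders, not the paper's data).
runs = [0.36, 0.37, 0.38, 0.37, 0.37]
print(summarize(runs))
```

With only five runs, the choice between the sample and population estimators changes the reported spread by roughly 12%, so it is worth stating which one is used when comparing against tables like this one.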
