EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Per-event, per-resolution results (frame score, %) for the DNN-only model at the best-performing spectrogram patch size (20 frames/patch)

From: Exploiting spectro-temporal locality in deep learning based acoustic event detection

AE	Frame length (resolution)
	10 ms	20 ms	30 ms	40 ms	50 ms	60 ms
ap	76.39 %	65.39 %	66.32 %	82.65 %	69.90 %	72.85 %
cl	71.84 %	84.04 %	68.17 %	62.26 %	62.40 %	70.64 %
cm	31.59 %	35.71 %	33.73 %	30.98 %	44.00 %	23.62 %
co	36.97 %	27.82 %	27.49 %	17.09 %	21.58 %	29.97 %
ds	29.70 %	16.92 %	17.76 %	11.62 %	38.74 %	21.66 %
kj	12.90 %	11.46 %	14.64 %	12.70 %	17.11 %	13.66 %
kn	49.66 %	27.08 %	37.03 %	66.89 %	44.57 %	23.55 %
kt	38.37 %	27.97 %	26.98 %	27.29 %	32.61 %	28.59 %
la	13.67 %	12.14 %	12.48 %	10.78 %	10.90 %	11.48 %
pr	53.58 %	55.98 %	51.35 %	60.25 %	55.43 %	52.82 %
pw	83.34 %	82.28 %	87.28 %	92.02 %	92.69 %	88.15 %
st	54.85 %	47.15 %	51.83 %	46.43 %	63.27 %	48.38 %
all	69.20 %	69.80 %	67.34 %	68.09 %	68.33 %	67.34 %
