An Adaptive Framework for Acoustic Monitoring of Potential Hazards

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Recognition rates achieved at each stage of the system's topology for different kinds of environments. The recognition rate obtained without the additional feature extraction stage is shown in parentheses for comparison.

Classification problem	No. of mixtures	Feature set	Recognition rate (%)
Vocalic versus non-vocalic sound events (subway environment)	64	MFCC+dMFCC	100
Vocalic versus non-vocalic sound events (urban environment)	128	MFCC+dMFCC	99.85
Vocalic versus non-vocalic sound events (military environment)	128	MFCC+dMFCC+MPEG-7 LLDs	100
Typical versus atypical non-vocalic sound events (subway environment)	128	MFCC+dMFCC+MPEG-7 LLDs	97.2 (87.6)
Typical versus atypical non-vocalic sound events (urban environment)	128	MFCC+dMFCC+MPEG-7 LLDs	92.95 (88.2)
Typical versus atypical non-vocalic sound events (military environment)	32	MFCC+dMFCC+MPEG-7 LLDs	100 (91.6)
Explosion versus gunshot sound events	512	MFCC+dMFCC+MPEG-7 LLDs	83.9 (76.4)
Normal versus screamed speech	128	MFCC+dMFCC+intonation+CB-TEO-auto-Env	100 (89.1)
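
As an illustrative aid to reading the parenthesized values, the following minimal sketch computes the absolute gain, in percentage points, that the additional feature extraction stage brings for each row of Table 2 that reports both scores. The rates are copied from the table; the shortened task labels and the names `ROWS` and `absolute_gains` are assumptions introduced here for illustration and are not part of the original system.

```python
# Rows copied from Table 2: (task, rate with extra features, rate without).
# Task labels are shortened here for readability (an assumption, not the
# paper's wording).
ROWS = [
    ("Typical vs. atypical non-vocalic (subway)", 97.2, 87.6),
    ("Typical vs. atypical non-vocalic (urban)", 92.95, 88.2),
    ("Typical vs. atypical non-vocalic (military)", 100.0, 91.6),
    ("Explosion vs. gunshot", 83.9, 76.4),
    ("Normal vs. screamed speech", 100.0, 89.1),
]


def absolute_gains(rows):
    """Return {task: gain in percentage points}, rounded to 2 decimals."""
    return {
        task: round(with_extra - without, 2)
        for task, with_extra, without in rows
    }


gains = absolute_gains(ROWS)
for task, gain in gains.items():
    print(f"{task}: +{gain} percentage points")
```

The gain is reported as a plain difference rather than a relative improvement because the baseline rates differ across tasks, so differences in percentage points are the more direct comparison.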