From: Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Speech
Frying
Dog
Blender
Cat
Running water
Alarm/bell/ringing
Vacuum cleaner
Dishes
Electric shaver/toothbrush