Fig. 1
From: Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Diagram of transformer model for sound recognition