From: Multi-encoder attention-based architectures for sound recognition with partial visual assistance