From: Automated audio captioning: an overview of recent progress and new challenges
Dataset | # of audio clips | # of captions per clip | Audio duration | Vocab size | Avg. caption length (words) |
---|---|---|---|---|---|
AudioCaps | 51308 | 1 (train), 5 (val/test) | 10 s | 5066 | 8.79 |
Clotho | 5929 | 5 | 15–30 s | 4365 | 11.33 |
MACS | 3930 | 2, 3, 4, 5 | 10 s | 2776 | 9.24 |
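The vocabulary-size and average-caption-length columns are corpus statistics computed over each dataset's captions. The following is a minimal Python sketch of how such figures can be obtained. It assumes lower-casing and a simple regex word tokenizer. The survey does not state which tokenizer it used, so numbers computed this way may not match the table exactly. The helper name `caption_stats` and the toy captions are hypothetical.

```python
import re


def caption_stats(captions):
    """Return (vocabulary size, mean caption length in words).

    Assumption: captions are lower-cased and split into word tokens with a
    simple regex. This is not necessarily the tokenizer used for the table.
    """
    if not captions:
        raise ValueError("need at least one caption")
    # Tokenize each caption into lower-case word tokens (letters and apostrophes).
    tokenized = [re.findall(r"[a-z']+", c.lower()) for c in captions]
    # Vocabulary = set of distinct tokens across all captions.
    vocab = {tok for toks in tokenized for tok in toks}
    # Mean length = total tokens / number of captions.
    mean_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    return len(vocab), mean_len


# Toy captions (hypothetical, not drawn from any of the datasets above).
captions = [
    "A dog barks while cars pass by.",
    "Birds chirp in the distance.",
]
vocab_size, avg_len = caption_stats(captions)
print(vocab_size, round(avg_len, 2))
```

On the two toy captions, this reports a vocabulary of 12 words and a mean length of 6.0 words. Real figures also depend on choices such as whether punctuation, casing, or rare words are normalized away.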