Table 2 An overview of English-annotated datasets

From: Automated audio captioning: an overview of recent progress and new challenges

| Dataset | # of audio clips | # of captions per audio clip | Audio duration | Vocab size | Avg. caption length |
|---|---|---|---|---|---|
| AudioCaps | 51308 | 1, 5 | 10 s | 5066 | 8.79 |
| Clotho | 5929 | 5 | 15–30 s | 4365 | 11.33 |
| MACS | 3930 | 2, 3, 4, 5 | 10 s | 2776 | 9.24 |