Table 1 An overview of published methods for audio captioning

From: Automated audio captioning: an overview of recent progress and new challenges

| Reference | Year | Audio encoder | Text decoder | Key aspects |
| --- | --- | --- | --- | --- |
| Drossos et al. [11] | 2017 | RNN | RNN | Attention |
| Wu et al. [29] | 2019 | RNN | RNN | N/A |
| Xu et al. [19] | 2019 | RNN | RNN | Sentence similarity loss |
| Ikawa et al. [30] | 2019 | RNN | RNN | “Specificity” term |
| Kim et al. [20] | 2019 | CNN(VGGish)+RNN | RNN | Multi-scale features, semantic attention |
| Nguyen et al. [33] | 2020 | RNN | RNN | Temporal subsampling |
| Cakir et al. [57] | 2020 | RNN | RNN | Multi-task learning (keywords) |
| Perez-Castanos et al. [76] | 2020 | CNN | RNN | Attention |
| Chen et al. [34] | 2020 | CNN | Transformer | Pre-trained encoder |
| Xu et al. [43] | 2020 | CRNN | RNN | Reinforcement learning |
| Takeuchi et al. [42] | 2020 | CNN+RNN | RNN | Keywords, sentence length estimation |
| Tran et al. [40] | 2020 | CNN | Transformer | 1-D and 2-D CNN |
| Eren et al. [39] | 2020 | CNN(PANNs)+RNN | RNN | Keywords |
| Koizumi et al. [18] | 2020 | CNN(VGGish)+Transformer | Transformer | Keywords |
| Koizumi et al. [68] | 2020 | CNN(VGGish) | GPT-2+Transformer | GPT-2, similar captions retrieval |
| Xu et al. [44] | 2021 | CNN/CRNN | RNN | Attention, transfer learning |
| Mei et al. [35] | 2021 | CNN(PANNs) | Transformer | Transfer learning, reinforcement learning |
| Mei et al. [47] | 2021 | Transformer | Transformer | Full transformer network |
| Han et al. [37] | 2021 | CNN(PANNs) | Transformer | Weakly supervised pre-training, keywords |
| Ye et al. [36] | 2021 | CNN(PANNs) | RNN | Keywords, attention |
| Gontier et al. [69] | 2021 | CNN(VGGish) | BART | YAMNet tags, BART |
| Narisetty et al. [48] | 2021 | CNN(PANNs)+Conformer | Transformer+RNN | ASR techniques |
| Liu et al. [23] | 2021 | CNN(PANNs) | Transformer | Contrastive learning |
| Won et al. [77] | 2021 | CNN(PANNs) | Transformer | Transfer learning |
| Berg et al. [22] | 2021 | CNN | Transformer | Continual learning |
| Weck et al. [56] | 2021 | CNN(VGGish, YAMNet, OpenL3, COALA) | Transformer | Transfer learning |
| Mei et al. [62] | 2021 | CNN(PANNs) | Transformer | GAN, diversity |
| Xiao et al. [59] | 2022 | CNN | Transformer | Attention-free Transformer |
| Liu et al. [70] | 2022 | CNN(PANNs) | BERT | Transfer learning, BERT |
| Chen et al. [73] | 2022 | CNN | Transformer | Transfer learning, contrastive learning |
| Koh et al. [66] | 2022 | CNN(PANNs)+Transformer | Transformer | Transfer learning, regularization |
| Narisetty et al. [75] | 2022 | Transformer | Transformer | Joint modeling of ASR and AAC |