Fig. 2
From: Automated audio captioning: an overview of recent progress and new challenges

Diagram of an RNN audio encoder for acoustic encoding. The RNN encoder aims to model temporal relationships within the input representation. The encoded audio features usually have the same number of time frames as the input representation and interact with the decoder through a pooling or attention mechanism.
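
The behaviour described in the caption can be illustrated with a minimal sketch. It is not the architecture from the figure: the caption does not say which recurrent cell is used, so a plain tanh (Elman) RNN is assumed here, with random weights and a random matrix standing in for a spectrogram-like input. All names (`rnn_encode`, `mean_pool`, `attention_pool`, `random_matrix`) and all sizes are hypothetical and chosen only for illustration. The sketch shows that the encoder returns one vector per input frame, and that the decoder side could consume those vectors either through pooling or through attention.

```python
import math
import random


def rnn_encode(frames, w_in, w_rec, bias):
    """Run a simple tanh RNN over the frames; return one hidden state per frame."""
    hidden_size = len(bias)
    h = [0.0] * hidden_size  # initial hidden state
    outputs = []
    for x in frames:
        # New state depends on the current frame and the previous state.
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
        outputs.append(h)
    return outputs


def mean_pool(encoded):
    """Pooling: average the encoded frames over time into one fixed-size vector."""
    n = len(encoded)
    return [sum(frame[j] for frame in encoded) / n for j in range(len(encoded[0]))]


def attention_pool(encoded, query):
    """Dot-product attention: softmax-weighted sum of encoded frames for one query."""
    scores = [sum(q * e for q, e in zip(query, frame)) for frame in encoded]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * frame[j] for w, frame in zip(weights, encoded))
        for j in range(len(encoded[0]))
    ]
    return context, weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights or stand-in input features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_frames, n_features, hidden = 6, 4, 3  # assumed toy sizes

frames = random_matrix(n_frames, n_features, rng)  # stand-in for the input representation
w_in = random_matrix(hidden, n_features, rng)
w_rec = random_matrix(hidden, hidden, rng)
bias = [0.0] * hidden

encoded = rnn_encode(frames, w_in, w_rec, bias)  # one vector per input frame
pooled = mean_pool(encoded)  # single summary vector for the decoder
# The last encoder state stands in for a decoder query here (an assumption).
context, weights = attention_pool(encoded, query=encoded[-1])
```

With pooling, the decoder sees the same summary vector at every step. With attention, it would recompute `context` for each decoding step using its own state as the query, so different output words can draw on different time frames.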