Fig. 5 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 5

From: Segment boundary detection directed attention for online end-to-end speech recognition

Conventional offline attention and segment boundary detection directed attention for utterance MRTK0_SX193 from the validation set. The reference transcription is used as decoder input for both models, and each phone symbol is placed at its corresponding alignment. The soft offline attention is generated by the “soft attention bigger-D” model, which contains a downsampling layer with a 1/3 downsampling rate in the encoder and has a number of parameters comparable to that of the boundary detection directed attention model (see Table 2). The dark dashed lines in the first row indicate the ground-truth phone segments at the 1/3 downsampling rate. The red dashed lines in the second row indicate the detected segment boundaries.
