Fig. 2 | EURASIP Journal on Audio, Speech, and Music Processing

From: Segment boundary detection directed attention for online end-to-end speech recognition

Learning curves of several attention models. Left: the validation PER per epoch on the TIMIT dataset. Right: the validation WER per epoch on the WSJ dataset, with dev93 as the validation set. “Soft attention,” also denoted the baseline model, refers to a model equipped with conventional soft offline attention while using the same unidirectional encoder and decoder as our proposed model. “Soft attention bigger-E” denotes the baseline model with another GRU layer stacked on the encoder, and “soft attention bigger-D” denotes the baseline model with another GRU layer stacked on the decoder. Both “soft attention bigger” models have a similar or larger number of parameters than our proposed model (see Table 2). “Boundary detection directed attention” denotes the model proposed in this work. The sudden drops in dev93 WER correspond to learning rate decays during the training procedure
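For orientation, the sketch below shows one way the “bigger-E” variant could be realized: an additional unidirectional GRU layer stacked on the encoder. The framework (PyTorch), base layer count, and dimensions are illustrative assumptions, not the authors' actual configuration.

```python
import torch.nn as nn

class UnidirectionalEncoder(nn.Module):
    """Illustrative online-friendly encoder; "bigger-E" stacks one extra GRU layer."""

    def __init__(self, input_dim=80, hidden_dim=320, extra_layer=False):
        super().__init__()
        # Assumed base depth of 3 layers; the bigger-E baseline adds one more.
        num_layers = 3 + (1 if extra_layer else 0)
        self.gru = nn.GRU(input_dim, hidden_dim,
                          num_layers=num_layers, batch_first=True)

    def forward(self, features):
        # features: (batch, time, input_dim) acoustic frames
        outputs, _ = self.gru(features)
        return outputs  # encoder states consumed by the attention decoder
```

The “bigger-D” variant would analogously add one GRU layer to the decoder rather than the encoder.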
