Fig. 4
From: Segment boundary detection directed attention for online end-to-end speech recognition

Speech spectrogram and segment boundary detection output for utterance MRTK0_SX193 from the validation set. Top: speech spectrogram with ground-truth phone segments marked by dashed lines and the reference transcription shown in each segment. Bottom: segment boundary detection output generated with the recognized hypothesis as decoder input. The blue line is the boundary probability of each input memory item, ranging from 0 to 1 (most probabilities fall between 0.2 and 0.6). The green dashed line indicates the threshold for emitting output symbols, which is set to 0.35 on TIMIT. The red bars are the boundaries detected by the threshold decision. The frame rate of the boundary sequence is 1/3 that of the original input speech because of the 1/3 downsampling rate in our model. The recognized hypothesis is also shown in each detected segment, with recognition errors marked in red.
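The threshold decision described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `detect_boundaries` and `to_input_frames`, the strict `>` comparison, the mapping of a downsampled index to the first frame of its group, and the sample probabilities are all assumptions made for illustration. Only the threshold of 0.35 and the downsampling factor of 3 come from the caption.

```python
from typing import List


def detect_boundaries(probs: List[float], threshold: float = 0.35) -> List[int]:
    """Return indices of memory items whose boundary probability exceeds the threshold.

    Assumption: a strict '>' comparison; the caption does not say whether
    a probability equal to the threshold counts as a boundary.
    """
    return [i for i, p in enumerate(probs) if p > threshold]


def to_input_frames(indices: List[int], downsampling: int = 3) -> List[int]:
    """Map downsampled boundary indices back to original input frame indices.

    Assumption: index i of the boundary sequence corresponds to input
    frame i * downsampling (the first frame of each group of 3).
    """
    return [i * downsampling for i in indices]


# Hypothetical boundary probabilities, one per input memory item.
probs = [0.20, 0.40, 0.25, 0.60, 0.30]
boundaries = detect_boundaries(probs)   # indices in the downsampled sequence
frames = to_input_frames(boundaries)    # approximate positions in the input speech
print(boundaries, frames)
```

Under these assumptions, each index in `boundaries` corresponds to one red bar in the bottom panel, and `frames` gives its approximate position on the time axis of the spectrogram in the top panel.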