Fig. 4 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 4

From: Segment boundary detection directed attention for online end-to-end speech recognition

Speech spectrogram and segment boundary detection output for utterance MRTK0_SX193 from the validation set. Top: speech spectrogram with ground-truth phone segments marked by dashed lines and the reference transcription shown in each segment. Bottom: segment boundary detection output generated with the recognized hypothesis as decoder input. The blue line is the boundary probability of each input memory item, which ranges from 0 to 1 (most probabilities lie between about 0.2 and 0.6). The green dashed line marks the threshold for emitting output symbols, which is set to 0.35 on TIMIT. The red bars are the boundaries detected by this threshold decision. The frame rate of the boundary sequence is one third that of the original input speech, because our model downsamples its input by a factor of 3. The recognized hypothesis is also shown in each detected segment, with recognition errors marked in red.
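The threshold decision and the frame-rate relationship described in the caption can be illustrated with a small sketch. It assumes a strict `>` comparison against the 0.35 threshold, treats every above-threshold memory item as its own boundary (the caption does not say how ties or adjacent detections are handled), and assumes each downsampled index covers three consecutive input frames. The function names and probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Threshold reported in the caption for TIMIT.
THRESHOLD = 0.35
# Downsampling factor: the boundary sequence runs at 1/3 of the input frame rate.
DOWNSAMPLE = 3


def detect_boundaries(probs: List[float], threshold: float = THRESHOLD) -> List[int]:
    """Return indices of memory items whose boundary probability exceeds the threshold.

    Assumption (hypothetical): strict '>' comparison, one boundary per
    above-threshold item, no merging of adjacent detections.
    """
    return [i for i, p in enumerate(probs) if p > threshold]


def to_input_frames(indices: List[int], factor: int = DOWNSAMPLE) -> List[Tuple[int, int]]:
    """Map each downsampled index i to the input-frame span [i*factor, (i+1)*factor).

    Assumption (hypothetical): each downsampled item covers `factor`
    consecutive input frames with no overlap.
    """
    return [(i * factor, (i + 1) * factor) for i in indices]


if __name__ == "__main__":
    # Made-up boundary probabilities, not read from the figure.
    probs = [0.21, 0.48, 0.30, 0.25, 0.57, 0.33, 0.36]
    idx = detect_boundaries(probs)
    print(idx)                   # [1, 4, 6]
    print(to_input_frames(idx))  # [(3, 6), (12, 15), (18, 21)]
```

Because the boundary sequence is three times coarser than the input, each detected boundary localizes a segment edge only to within a three-frame window of the original speech.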
