Segment boundary detection directed attention for online end-to-end speech recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Test phoneme error rate (PER) of different models for online recognition on the TIMIT dataset

Model	#Param (M)	PER (%)
Partial condition [12]	3.1	20.8
Hard alignment with RL [14]	6.8	20.5
Gaussian prediction attention [40]	5.8	20.4
Hard monotonic attention [16]	6.4	20.4
Stacked LSTM [15]	1.0	20.0
CTC [41]	3.8	19.6
Proposed method	6.9	20.2
Soft attention*	5.9	21.0
Soft attention bigger-E*	7.5	20.8
Soft attention bigger-D*	7.0	20.6

All models use a unidirectional encoder; * indicates an offline attention model.