EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Test WER results of different models for online recognition on the WSJ dataset

From: Segment boundary detection directed attention for online end-to-end speech recognition

Model	#Param (M)	WER (%)
Hard alignment with RL [14]	2.7	27.0
CTC [52]	–	22.7
Hard monotonic attention [16]	2.9	17.4
MoChA [17]	3.0	15.0 ±0.6
Soft attention* [17]	2.9	14.6 ±0.3
Proposed method (BPE)	6.6	15.5 ±0.4
Soft attention (BPE)*	5.4	15.3 ±0.3
Soft attention bigger-E (BPE)*	6.7	15.1 ±0.3
Soft attention bigger-D (BPE)*	6.7	15.0 ±0.3
Proposed method (Char)	6.6	22.2 ±0.5
Soft attention bigger-D (Char)*	6.7	16.3 ±0.3

All models use a unidirectional encoder; * indicates an offline attention model.
