Three-stage training and orthogonality regularization for spoken language recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Parameters of the conformer ASR model

Parameter	Value
Conformer encoder
Number of blocks	12
Linear dimensionality	2048
Output size	256
Number of attention heads	4
Dropout rate	0.1
Activation function	Swish
Positional encoding type	Relative
Transformer decoder
Number of blocks	6
Linear dimensionality	2048
Number of attention heads	4
ASR training
CTC weight	0.3
Label smoothing	0.1
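Table 3 can be read as a configuration object. The sketch below stores the values in a Python dataclass and shows one common way to apply the two ASR training settings. In hybrid CTC/attention training, the CTC weight λ = 0.3 interpolates the objective as L = λ·L_CTC + (1 - λ)·L_att. Label smoothing with ε = 0.1 replaces the one-hot target with (1 - ε)·one-hot + ε·uniform. This is a sketch under stated assumptions, not the authors' implementation. All field and function names are hypothetical. Reading "Linear dimensionality" as the feed-forward hidden size and "Output size" as the attention dimension is also an assumption, based on ESPnet-style conformer configurations. The uniform label-smoothing variant is one common formulation, and the excerpt does not say which variant was used.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConformerASRConfig:
    """Values from Table 3. Field names are hypothetical."""
    # Conformer encoder
    enc_blocks: int = 12
    enc_linear_units: int = 2048   # assumed: feed-forward hidden size
    enc_output_size: int = 256     # assumed: model/attention dimension
    enc_attention_heads: int = 4
    dropout_rate: float = 0.1
    activation: str = "swish"
    pos_enc: str = "relative"
    # Transformer decoder
    dec_blocks: int = 6
    dec_linear_units: int = 2048
    dec_attention_heads: int = 4
    # ASR training
    ctc_weight: float = 0.3
    label_smoothing: float = 0.1

    @property
    def enc_head_dim(self) -> int:
        # Per-head size, assuming the output size is split evenly across heads.
        assert self.enc_output_size % self.enc_attention_heads == 0
        return self.enc_output_size // self.enc_attention_heads


def hybrid_loss(ctc_loss: float, att_loss: float, ctc_weight: float) -> float:
    """Interpolate the CTC loss and the attention-decoder loss."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def label_smoothed_ce(log_probs: Sequence[float], target: int, eps: float) -> float:
    """Cross-entropy against (1 - eps) * one-hot + eps * uniform over K classes."""
    k = len(log_probs)
    nll = -log_probs[target]              # loss on the true label
    uniform_term = -sum(log_probs) / k    # loss against the uniform distribution
    return (1.0 - eps) * nll + eps * uniform_term


def swish(x: float) -> float:
    """Swish (SiLU) activation x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)
```

A frozen dataclass keeps the hyperparameters immutable once the model is built. In practice, ASR toolkits compute these losses on batched tensors rather than Python floats. The scalar versions here only make the weighting explicit.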