Table 2 Major experimental settings

From: Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling

Model structure
  Attention-heads: 8          Decoder-blocks: 6
  Hidden-units: 512           Residual-drop: 0.3
  Encoder-blocks: 6           Attention-drop: 0.0

Training settings
  Max-length: 5000            GPUs (K40m): 4
  Tokens/batch: 10000         Warmup-steps: 12000
  Epochs: 30                  Steps: 300000
  Label-smooth: 0.1           Optimizer: Adam

Testing settings
  Averaged checkpoints: last 20   Batch-size: 100
  Length-penalty: 0.6             Beam-size: 13
  Max-length: 50                  GPUs (K40m): 4
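The Warmup-steps and Hidden-units values above are conventionally paired with the standard Transformer ("Noam") learning-rate schedule, in which the rate grows linearly for the warmup phase and then decays with the inverse square root of the step. A minimal sketch, assuming that schedule is the one used here (the table itself only names Adam as the optimizer):

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 12000) -> float:
    """Transformer learning-rate schedule:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5).
    d_model and warmup_steps default to the table's Hidden-units and
    Warmup-steps; whether the paper uses exactly this schedule is an
    assumption, not stated in the table.
    """
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

With these settings the rate peaks at step 12000 and then decays smoothly over the remaining training steps (up to the 300000-step budget in the table).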