From: Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| **Model structure** | | | |
| Attention heads | 8 | Decoder blocks | 6 |
| Hidden units | 512 | Residual dropout | 0.3 |
| Encoder blocks | 6 | Attention dropout | 0.0 |
| **Training settings** | | | |
| Max length | 5000 | GPUs (K40m) | 4 |
| Tokens/batch | 10000 | Warmup steps | 12000 |
| Epochs | 30 | Steps | 300000 |
| Label smoothing | 0.1 | Optimizer | Adam |
| **Testing settings** | | | |
| Averaged checkpoints | Last 20 | Batch size | 100 |
| Length penalty | 0.6 | Beam size | 13 |
| Max length | 50 | GPUs (K40m) | 4 |
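For readers who want to reproduce the setup, the table above can be restated as a single configuration object. The sketch below is illustrative only: the class and field names (`TransformerASRConfig`, `tokens_per_batch`, `decode_max_length`, etc.) are assumptions for a generic Transformer ASR toolkit, not the authors' actual code or flags.

```python
from dataclasses import dataclass

@dataclass
class TransformerASRConfig:
    # Model structure (Table values)
    attention_heads: int = 8
    hidden_units: int = 512
    encoder_blocks: int = 6
    decoder_blocks: int = 6
    residual_dropout: float = 0.3
    attention_dropout: float = 0.0

    # Training settings
    max_length: int = 5000            # maximum input length during training
    tokens_per_batch: int = 10000
    epochs: int = 30
    label_smoothing: float = 0.1
    warmup_steps: int = 12000
    train_steps: int = 300000
    optimizer: str = "adam"
    num_gpus: int = 4                 # NVIDIA K40m

    # Testing (decoding) settings
    average_last_checkpoints: int = 20
    decode_batch_size: int = 100
    length_penalty: float = 0.6
    beam_size: int = 13
    decode_max_length: int = 50       # maximum output length during decoding

# Example: instantiate the default configuration from the table
config = TransformerASRConfig()
print(config)
```

Grouping the values this way makes the distinction explicit between the two "Max length" entries: 5000 applies to training inputs, while 50 bounds the decoded output sequence at test time.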