Table 2 Major experimental settings

From: Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling

Model structure
  Attention-heads    8
  Decoder-blocks     6
  Hidden-units       512
  Residual-drop      0.3
  Encoder-blocks     6
  Attention-drop     0.0

Training settings
  Max-length         5000
  GPUs (K40m)        4
  Tokens/batch       10000
  Warmup-steps       12000
  Epochs             30
  Steps              300000
  Label-smooth       0.1
  Optimizer          Adam

Testing settings
  Avg. checkpoints   Last 20
  Batch-size         100
  Length-penalty     0.6
  Beam-size          13
  Max-length         50
  GPUs (K40m)        4
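Gathered into code, the settings above might look like the following configuration sketch. This is purely illustrative: the dictionary names and key spellings are assumptions, not the authors' actual configuration files.

```python
# Hypothetical configuration mirroring Table 2.
# Names (MODEL, TRAINING, TESTING and all keys) are illustrative,
# not taken from the paper's code.

MODEL = {
    "attention_heads": 8,
    "decoder_blocks": 6,
    "hidden_units": 512,
    "residual_dropout": 0.3,
    "encoder_blocks": 6,
    "attention_dropout": 0.0,
}

TRAINING = {
    "max_length": 5000,
    "gpus": 4,                  # NVIDIA Tesla K40m
    "tokens_per_batch": 10000,
    "warmup_steps": 12000,
    "epochs": 30,
    "steps": 300000,
    "label_smoothing": 0.1,
    "optimizer": "adam",
}

TESTING = {
    "average_checkpoints": 20,  # parameters averaged over the last 20 checkpoints
    "batch_size": 100,
    "length_penalty": 0.6,
    "beam_size": 13,
    "max_length": 50,
    "gpus": 4,
}
```

Such a layout makes it easy to verify a run against the paper, e.g. `MODEL["attention_heads"] == 8` or `TESTING["beam_size"] == 13`.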