From: A new joint CTC-attention-based speech recognition model with multi-level multi-head attention
Layer | Input channels | Output channels | Kernel size | Stride |
---|---|---|---|---|
Convolutional layer 1 | 1 | 64 | (3, 3) | (1, 1) |
Convolutional layer 2 | 64 | 64 | (3, 3) | (1, 1) |
Convolutional layer 3 | 64 | 128 | (3, 3) | (1, 1) |
Convolutional layer 4 | 128 | 128 | (3, 3) | (1, 1) |
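
The layer settings in the table can be checked with a short calculation. The following is a minimal sketch, not the authors' implementation: it counts the learnable parameters of each convolution and traces how the feature-map size changes through the stack. The table does not specify padding, bias terms, or the input size, so the `padding` argument, the `bias=True` default, and the 80 x 100 input used below are assumptions made only for illustration, and the helper names are hypothetical.

```python
# Layer specification copied from the table:
# (name, input channels, output channels, kernel size, stride).
LAYERS = [
    ("conv1", 1, 64, (3, 3), (1, 1)),
    ("conv2", 64, 64, (3, 3), (1, 1)),
    ("conv3", 64, 128, (3, 3), (1, 1)),
    ("conv4", 128, 128, (3, 3), (1, 1)),
]


def conv_params(in_ch, out_ch, kernel, bias=True):
    """Number of learnable parameters in one 2-D convolution.

    Assumption: one bias per output channel (not stated in the table).
    """
    kh, kw = kernel
    return in_ch * out_ch * kh * kw + (out_ch if bias else 0)


def conv_output_size(size, kernel, stride, padding):
    """Output length along one axis, using the standard convolution formula."""
    return (size + 2 * padding - kernel) // stride + 1


def stack_output_size(height, width, padding=0):
    """Propagate a (height, width) input through all four layers.

    Assumption: the same padding is applied in every layer.
    """
    for _, _, _, (kh, kw), (sh, sw) in LAYERS:
        height = conv_output_size(height, kh, sh, padding)
        width = conv_output_size(width, kw, sw, padding)
    return height, width


if __name__ == "__main__":
    for name, in_ch, out_ch, kernel, _ in LAYERS:
        print(name, conv_params(in_ch, out_ch, kernel))
    print("total", sum(conv_params(i, o, k) for _, i, o, k, _ in LAYERS))
    # Hypothetical 80 x 100 input, without and with one pixel of padding.
    print(stack_output_size(80, 100, padding=0))
    print(stack_output_size(80, 100, padding=1))
```

Under these assumptions the four layers hold 640, 36,928, 73,856, and 147,584 parameters (259,008 in total). Because every stride is (1, 1), an unpadded 3 x 3 kernel removes two positions per axis in each layer, so the 80 x 100 example shrinks to 72 x 92, whereas a padding of 1 keeps it at 80 x 100.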