From: Accent modification for speech recognition of non-native speakers using neural style transfer
Layer | Output shape | Parameters |
---|---|---|
InputLayer | B, T, 161 | 0 |
Conv1D (F=220, K=3) | B, T, 220 | 389840 |
Conv1D (F=220, K=3) | B, T, 220 | 389840 |
Maxpool (P=2) | B, T, 220 | 880 |
Conv1D (F=150, K=3) | B, T, 150 | 265800 |
Conv1D (F=150, K=3) | B, T, 150 | 265800 |
Maxpool (P=2) | B, T, 150 | 600 |
Conv1D (F=100, K=3) | B, T, 100 | 177200 |
Conv1D (F=100, K=3) | B, T, 100 | 177200 |
Maxpool (P=2) | B, T, 100 | 400 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Maxpool (P=2) | B, T, 80 | 320 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Maxpool (P=2) | B, T, 80 | 320 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Conv1D (F=80, K=3) | B, T, 80 | 141760 |
Bidirectional (U=200) | B, T, 400 | 505200 |
BatchNormalization | B, T, 400 | 1600 |
TimeDistributed | B, T, 29 | 11629 |
Dropout | B, T, 29 | 0 |
TimeDistributed | B, T, 29 | 870 |
SoftmaxActivation | B, T, 29 | 0 |
 | Total params: | 3,038,059 |
 | Trainable params: | 3,038,059 |
 | Non-trainable params: | 0 |
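
The parameter column can be cross-checked with a little arithmetic. The sketch below is not from the source: it copies the counts from the table, sums them, and compares a few rows against the standard formulas for a dense layer (`n_in * n_out + n_out`) and for batch normalization (four values per channel). The helper names `dense_params` and `batchnorm_params` are hypothetical and exist only for this check.

```python
# Rows copied from the table: (layer, output channels, parameter count).
rows = [
    ("InputLayer", None, 0),
    ("Conv1D", 220, 389840), ("Conv1D", 220, 389840), ("Maxpool", 220, 880),
    ("Conv1D", 150, 265800), ("Conv1D", 150, 265800), ("Maxpool", 150, 600),
    ("Conv1D", 100, 177200), ("Conv1D", 100, 177200), ("Maxpool", 100, 400),
    ("Conv1D", 80, 141760), ("Conv1D", 80, 141760), ("Maxpool", 80, 320),
    ("Conv1D", 80, 141760), ("Conv1D", 80, 141760), ("Maxpool", 80, 320),
    ("Conv1D", 80, 141760), ("Conv1D", 80, 141760),
    ("Bidirectional", 400, 505200),
    ("BatchNormalization", 400, 1600),
    ("TimeDistributed", 29, 11629),
    ("Dropout", 29, 0),
    ("TimeDistributed", 29, 870),
    ("SoftmaxActivation", 29, 0),
]


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_in * n_out + n_out


def batchnorm_params(channels):
    """Scale, shift, moving mean and moving variance per channel (hypothetical helper)."""
    return 4 * channels


# Sum of the parameter column, to compare with the "Total params" row.
total = sum(params for _, _, params in rows)

# Parameters per filter for every Conv1D row (all rows divide evenly).
conv_per_filter = {params // ch for name, ch, params in rows if name == "Conv1D"}

# Parameters per channel for every Maxpool row.
pool_per_channel = {params // ch for name, ch, params in rows if name == "Maxpool"}

print("sum of parameter column:", total)
print("Conv1D parameters per filter:", conv_per_filter)
print("Maxpool parameters per channel:", pool_per_channel)
```

The column sums to the reported total of 3,038,059, and the two TimeDistributed rows and the BatchNormalization row agree with the formulas above. Two patterns in the numbers are worth noting, although the source does not explain them. Every Conv1D row comes to 1772 parameters per filter, which equals 161 × 11 + 1, whereas a kernel of width 3 over 161 input channels would give 484. The Maxpool rows each carry four parameters per channel, the same count a batch-normalization layer would have, while a pooling layer on its own normally has none.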