Deep neural networks for automatic speech processing: a survey from large corpora to limited data

EURASIP Journal on Audio, Speech, and Music Processing

Table 5 Architecture used for siamese and prototypical networks

Layer number	Layer type	Parameters
0	Input data	MFCCs computed with a 25 ms window and a 10 ms stride
1	Stacked bidirectional GRUs	5 GRUs of 256 cells each
2	Dropout	Rate of 0.2
3	Batch normalization	Applied to each direction
4	Linear layer	128 output units
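
To make the layer stack in Table 5 easier to follow, the sketch below traces the tensor shape of one utterance through each layer. It is only an illustration under stated assumptions, not the authors' implementation. The table does not give the sampling rate, the number of MFCC coefficients, the framing convention, or how the frame sequence is pooled into a single embedding, so `sample_rate = 16000`, `n_mfcc = 13`, and framing without padding are hypothetical choices, and the time axis is simply kept. The sketch also assumes that "256 cells" is the size per direction and that the two directions are concatenated. The names `EncoderConfig`, `num_frames`, and `trace_shapes` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values taken from Table 5.
    window_ms: float = 25.0
    stride_ms: float = 10.0
    n_gru_layers: int = 5
    gru_cells: int = 256       # ASSUMPTION: 256 cells per direction
    dropout: float = 0.2
    embedding_dim: int = 128
    # ASSUMPTIONS: not specified in the table.
    n_mfcc: int = 13
    sample_rate: int = 16000


def num_frames(n_samples: int, cfg: EncoderConfig) -> int:
    """Number of analysis frames, assuming no padding at the edges."""
    window = int(cfg.sample_rate * cfg.window_ms / 1000)
    stride = int(cfg.sample_rate * cfg.stride_ms / 1000)
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // stride


def trace_shapes(n_samples: int, cfg: EncoderConfig):
    """Return (layer name, (time, features)) for each row of Table 5."""
    t = num_frames(n_samples, cfg)
    # ASSUMPTION: forward and backward outputs are concatenated.
    bidirectional_dim = 2 * cfg.gru_cells
    return [
        ("0 input data (MFCC)", (t, cfg.n_mfcc)),
        ("1 stacked bidirectional GRUs", (t, bidirectional_dim)),
        ("2 dropout", (t, bidirectional_dim)),          # shape unchanged
        ("3 batch normalization", (t, bidirectional_dim)),  # shape unchanged
        ("4 linear layer", (t, cfg.embedding_dim)),
    ]


if __name__ == "__main__":
    cfg = EncoderConfig()
    # One second of audio at the assumed sampling rate.
    for name, shape in trace_shapes(cfg.sample_rate, cfg):
        print(f"{name:32s} {shape}")
```

Under these assumptions, dropout and batch normalization leave the shape unchanged, and only the GRU stack and the linear layer change the feature dimension. A different framing convention (for example, padding at the edges) would change the frame count but not the feature sizes.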