From: Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music
| Hyperparameter | Vi-T | Swin-T |
| --- | --- | --- |
| Image size | 72 × 72 | 72 × 72 |
| Patch dimension | 6 × 6 | 4 × 4 |
| Hyperparameter (C) | 64 | 96 |
| Number of heads | 8 | 8 |
| Number of windows | NA | 4 |
| Number of MLP nodes | 2048, 1048 | 256, 256 |
| Mini batch-size | 256 | 32 |
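
The hyperparameters listed above can be collected into a small configuration sketch, which also makes the patch arithmetic explicit: a 72 × 72 input cut into 6 × 6 patches gives 12 × 12 = 144 patches, and cut into 4 × 4 patches gives 18 × 18 = 324 patches. This is an illustration only, not the authors' code. The class name `TransformerConfig`, its field names, and the helper methods are hypothetical. It also assumes that the single image size and the single number of heads given above apply to both models, and it reads "NA" as "not applicable".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the hyperparameters listed above."""

    name: str
    image_size: int              # input is image_size x image_size pixels
    patch_size: int              # each patch is patch_size x patch_size pixels
    embed_dim: int               # the hyperparameter called C above
    num_heads: int               # attention heads
    num_windows: Optional[int]   # None where the source says NA
    mlp_units: Tuple[int, int]   # MLP node counts, copied as printed
    batch_size: int              # mini batch-size

    def patches_per_side(self) -> int:
        """Number of patches along one image edge."""
        if self.image_size % self.patch_size != 0:
            raise ValueError("patch size must divide the image size evenly")
        return self.image_size // self.patch_size

    def num_patches(self) -> int:
        """Total number of non-overlapping patches per image."""
        return self.patches_per_side() ** 2


# Assumption: image size (72) and number of heads (8) are shared by both models.
VIT = TransformerConfig(
    name="Vi-T", image_size=72, patch_size=6, embed_dim=64, num_heads=8,
    num_windows=None, mlp_units=(2048, 1048), batch_size=256,
)
SWIN = TransformerConfig(
    name="Swin-T", image_size=72, patch_size=4, embed_dim=96, num_heads=8,
    num_windows=4, mlp_units=(256, 256), batch_size=32,
)

if __name__ == "__main__":
    for cfg in (VIT, SWIN):
        print(cfg.name, cfg.patches_per_side(), cfg.num_patches())
```

Keeping the values in a frozen dataclass means a configuration cannot be altered by accident during an experiment, and the divisibility check catches a patch size that would not tile the image evenly.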