
Table 5 Ablation study of the Mel-spectrogram architecture showing the effect of the number of heads, patch size, projection dimension, and number of MLP nodes. Highest values are highlighted.

From: Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

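| Sl. No. | Architecture spec. | Option | Vi-T F1 Micro | Vi-T F1 Macro | Swin-T F1 Micro | Swin-T F1 Macro |
|---|---|---|---|---|---|---|
| 1 | Number of heads | 4 | 0.58 | 0.52 | 0.62 | 0.53 |
|  |  | 8 | 0.59 | 0.57 | 0.62 | 0.61 |
| 2 | Patch size | 4 × 4 | 0.55 | 0.50 | 0.62 | 0.61 |
|  |  | 6 × 6 | 0.59 | 0.57 | 0.60 | 0.53 |
| 3 | Projection dimension | 64 | 0.59 | 0.57 | 0.56 | 0.51 |
|  |  | 96 | 0.58 | 0.51 | 0.62 | 0.61 |
| 4 | Number of MLP nodes | 2048, 1024 | 0.59 | 0.57 | 0.62 | 0.55 |
|  |  | 256, 256 | 0.57 | 0.51 | 0.62 | 0.61 |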
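The table reports both micro- and macro-averaged F1. The sketch below shows how these two averages are commonly computed for multi-label instrument recognition. It assumes binary indicator arrays of shape (clips, instruments). The function name `f1_micro_macro` and the toy labels are hypothetical, and this is not the authors' evaluation code. Micro-F1 pools true positives, false positives, and false negatives over all instruments before computing a single F1. Macro-F1 computes F1 per instrument and then takes the unweighted mean, so every instrument counts equally regardless of how often it occurs.

```python
import numpy as np


def f1_micro_macro(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary multi-label indicator arrays
    of shape (n_samples, n_classes). Hypothetical helper for illustration."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-class counts, summed over samples (axis 0).
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)

    # Micro: pool counts over all classes, then compute a single F1.
    total_denom = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / total_denom if total_denom > 0 else 0.0

    # Macro: F1 per class, then the unweighted mean.
    # Assumption: a class with no TP, FP, or FN gets F1 = 0.
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(len(tp)), where=denom > 0)
    return float(micro), float(per_class.mean())


# Toy example: 3 clips, 3 hypothetical instrument classes.
y_true = [[1, 0, 1],
          [0, 1, 0],
          [1, 1, 0]]
y_pred = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 1]]

micro, macro = f1_micro_macro(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case, the first instrument is predicted perfectly (F1 = 1), the second partially (F1 = 2/3), and the third not at all (F1 = 0). Micro-F1 is therefore 6/9 ≈ 0.667 while macro-F1 is 5/9 ≈ 0.556. This illustrates how macro-F1 can fall below micro-F1 when a few instrument classes are recognized poorly.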