
Table 3 The details of the proposed transformer encoder model. TE transformer encoder, SMSA sliced multi-head self-attention, MSA multi-head self-attention, MLP multi-layer perceptron, FC fully connected, GMP global max pooling

From: Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy

| Layer | Operation | Parameters | Output |
|---|---|---|---|
| Extractor (TE×3) | SMSA: Slice | k=4 | Tdim×32 |
| | SMSA: FC×k | [1×1, 32] | Tdim×32 |
| | SMSA: MSA | h=4 | Tdim×32 |
| | MLP: Cat | k=4 | Tdim×128 |
| | MLP: FC | [1×1, 128] | Tdim×128 |
| | MLP: FC | [1×1, 512] | Tdim×512 |
| | MLP: FC | [1×1, 128] | Tdim×128 |
| | GMP | [Tdim×1, 1] | 1×128 |
| Classifier (MLP) | FC | [1×1, 64] | 1×64 |
| | FC | [1×1, Cnum] | 1×Cnum |
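
The table specifies the layer shapes but not the framework, activations, or normalization. The following is a minimal PyTorch sketch of how these rows could compose; the class names (SlicedMSA, TEBlock, AttributeNet), the ReLU activations, and the omission of residual connections and layer normalization are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SlicedMSA(nn.Module):
    """Sliced multi-head self-attention (SMSA) as laid out in Table 3:
    the Tdim x 128 input is sliced into k=4 chunks of width 32, each chunk
    passes through its own 1x1 FC and a multi-head self-attention (h=4),
    and the chunks are concatenated back to width 128."""
    def __init__(self, dim=128, k=4, heads=4):
        super().__init__()
        assert dim % k == 0
        self.k = k
        self.slice_dim = dim // k  # 32 per Table 3
        self.fcs = nn.ModuleList(
            nn.Linear(self.slice_dim, self.slice_dim) for _ in range(k))
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(self.slice_dim, heads, batch_first=True)
            for _ in range(k))

    def forward(self, x):  # x: (batch, Tdim, 128)
        outs = []
        for i, chunk in enumerate(x.chunk(self.k, dim=-1)):  # Slice, k=4
            h = self.fcs[i](chunk)       # FC [1x1, 32]
            h, _ = self.attn[i](h, h, h) # MSA, h=4
            outs.append(h)
        return torch.cat(outs, dim=-1)   # Cat, k=4 -> (batch, Tdim, 128)

class TEBlock(nn.Module):
    """One transformer-encoder (TE) block: SMSA followed by the MLP rows."""
    def __init__(self, dim=128, k=4, heads=4, hidden=512):
        super().__init__()
        self.smsa = SlicedMSA(dim, k, heads)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim),      # FC [1x1, 128]
            nn.Linear(dim, hidden),   # FC [1x1, 512]
            nn.ReLU(),                # activation assumed; the table lists layers only
            nn.Linear(hidden, dim),   # FC [1x1, 128]
        )

    def forward(self, x):
        return self.mlp(self.smsa(x))

class AttributeNet(nn.Module):
    """Extractor (TE x 3, then GMP over time) followed by the MLP classifier."""
    def __init__(self, dim=128, num_classes=4):
        super().__init__()
        self.extractor = nn.Sequential(*[TEBlock(dim) for _ in range(3)])
        self.classifier = nn.Sequential(
            nn.Linear(dim, 64),          # FC [1x1, 64]
            nn.ReLU(),                   # activation assumed
            nn.Linear(64, num_classes),  # FC [1x1, Cnum]
        )

    def forward(self, x):   # x: (batch, Tdim, 128)
        h = self.extractor(x)
        h = h.amax(dim=1)   # GMP: global max pooling over Tdim -> (batch, 128)
        return self.classifier(h)

# Shape check against the table's Output column.
model = AttributeNet(num_classes=4)
logits = model(torch.randn(2, 100, 128))  # batch=2, Tdim=100
print(logits.shape)                       # torch.Size([2, 4]), i.e. 1 x Cnum per clip
```

Per the table, the per-slice MSA operates on width-32 slices (so attention cost scales with the slice width rather than the full 128-dimensional embedding), and the final GMP collapses the variable-length time axis Tdim to a single 1×128 utterance-level vector before classification.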