
Table 3 The details of the proposed transformer encoder model. TE transformer encoder, SMSA sliced multi-head self-attention, MSA multi-head self-attention, MLP multi-layer perceptron, FC fully connected, GMP global max pooling

From: Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy

| Layer | Operation | Parameters | Output |
|---|---|---|---|
| Extractor (TE×3) | SMSA: Slice | k=4 | Tdim×32 |
| | SMSA: FC×k | [1×1, 32] | Tdim×32 |
| | SMSA: MSA | h=4 | Tdim×32 |
| | MLP: Cat | k=4 | Tdim×128 |
| | MLP: FC | [1×1, 128] | Tdim×128 |
| | MLP: FC | [1×1, 512] | Tdim×512 |
| | MLP: FC | [1×1, 128] | Tdim×128 |
| | GMP | [Tdim×1, 1] | 1×128 |
| Classifier (MLP) | FC | [1×1, 64] | 1×64 |
| | FC | [1×1, Cnum] | 1×Cnum |
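
The table specifies the layer shapes but not the framework, activations, or normalization. The following is a minimal PyTorch sketch of how these rows could compose; the class names (SlicedMSA, TEBlock, AttributeNet), the ReLU activations, and the omission of residual connections and layer normalization are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SlicedMSA(nn.Module):
    """Sliced multi-head self-attention (SMSA) as laid out in Table 3:
    the Tdim x 128 input is sliced into k=4 chunks of width 32, each chunk
    passes through its own 1x1 FC and a multi-head self-attention (h=4),
    and the chunks are concatenated back to width 128."""
    def __init__(self, dim=128, k=4, heads=4):
        super().__init__()
        assert dim % k == 0
        self.k = k
        self.slice_dim = dim // k  # 32 per Table 3
        self.fcs = nn.ModuleList(
            nn.Linear(self.slice_dim, self.slice_dim) for _ in range(k))
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(self.slice_dim, heads, batch_first=True)
            for _ in range(k))

    def forward(self, x):  # x: (batch, Tdim, 128)
        outs = []
        for i, chunk in enumerate(x.chunk(self.k, dim=-1)):  # Slice, k=4
            h = self.fcs[i](chunk)       # FC [1x1, 32]
            h, _ = self.attn[i](h, h, h) # MSA, h=4
            outs.append(h)
        return torch.cat(outs, dim=-1)   # Cat, k=4 -> (batch, Tdim, 128)

class TEBlock(nn.Module):
    """One transformer-encoder (TE) block: SMSA followed by the MLP rows."""
    def __init__(self, dim=128, k=4, heads=4, hidden=512):
        super().__init__()
        self.smsa = SlicedMSA(dim, k, heads)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim),      # FC [1x1, 128]
            nn.Linear(dim, hidden),   # FC [1x1, 512]
            nn.ReLU(),                # activation assumed; the table lists layers only
            nn.Linear(hidden, dim),   # FC [1x1, 128]
        )

    def forward(self, x):
        return self.mlp(self.smsa(x))

class AttributeNet(nn.Module):
    """Extractor (TE x 3, then GMP over time) followed by the MLP classifier."""
    def __init__(self, dim=128, num_classes=4):
        super().__init__()
        self.extractor = nn.Sequential(*[TEBlock(dim) for _ in range(3)])
        self.classifier = nn.Sequential(
            nn.Linear(dim, 64),          # FC [1x1, 64]
            nn.ReLU(),                   # activation assumed
            nn.Linear(64, num_classes),  # FC [1x1, Cnum]
        )

    def forward(self, x):   # x: (batch, Tdim, 128)
        h = self.extractor(x)
        h = h.amax(dim=1)   # GMP: global max pooling over Tdim -> (batch, 128)
        return self.classifier(h)

# Shape check against the table's Output column.
model = AttributeNet(num_classes=4)
logits = model(torch.randn(2, 100, 128))  # batch=2, Tdim=100
print(logits.shape)                       # torch.Size([2, 4]), i.e. 1 x Cnum per clip
```

Per the table, the per-slice MSA operates on width-32 slices (so attention cost scales with the slice width rather than the full 128-dimensional embedding), and the final GMP collapses the variable-length time axis Tdim to a single 1×128 utterance-level vector before classification.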