
Table 1 Definition and gradient of four pooling functions

From: Neural network-based non-intrusive speech quality assessment using attention pooling function

| Pooling function | Definition | Gradient |
| --- | --- | --- |
| Max pooling | \(y = \max_{i} y_{i}\) | \(\frac{\partial y}{\partial y_{i}} = \begin{cases} 1, & i = \arg\max_{j} y_{j} \\ 0, & \text{else} \end{cases}\) |
| Average pooling | \(y = \frac{1}{n} \sum_{i} y_{i}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{1}{n}\) |
| Linear softmax | \(y = \frac{\sum_{i} y_{i}^{2}}{\sum_{i} y_{i}}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{2 y_{i} - y}{\sum_{j} y_{j}}\) |
| Attention | \(y = \frac{\sum_{i} y_{i} w_{i}}{\sum_{i} w_{i}}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{w_{i}}{\sum_{j} w_{j}}, \quad \frac{\partial y}{\partial w_{i}} = \frac{y_{i} - y}{\sum_{j} w_{j}}\) |

  1. n is the number of frames in an utterance
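
The definitions and gradients in Table 1 can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the example values, and the finite-difference helper `numerical_grad` are all hypothetical and introduced only for illustration. It assumes the frame-level scores and the weights are given as lists of floats, that the sum of the scores and the sum of the weights are nonzero, and that the maximum is unique (otherwise the max pooling gradient is not well defined). How the attention weights are produced is outside the scope of the table and is not modeled here.

```python
# Sketch of the four pooling functions from Table 1 and their analytic
# gradients. All names below are illustrative assumptions.

def max_pool(y):
    return max(y)

def max_pool_grad(y):
    # 1 at the position of the maximum, 0 elsewhere (assumes a unique maximum).
    k = y.index(max(y))
    return [1.0 if i == k else 0.0 for i in range(len(y))]

def avg_pool(y):
    return sum(y) / len(y)

def avg_pool_grad(y):
    # Every frame receives the same gradient 1/n.
    return [1.0 / len(y)] * len(y)

def linear_softmax_pool(y):
    # Sum of squared scores divided by the sum of scores.
    return sum(v * v for v in y) / sum(y)

def linear_softmax_grad(y):
    total = sum(y)
    pooled = linear_softmax_pool(y)
    return [(2.0 * v - pooled) / total for v in y]

def attention_pool(y, w):
    # Weighted average of the scores y with weights w.
    return sum(v * a for v, a in zip(y, w)) / sum(w)

def attention_grad(y, w):
    # Returns (gradient w.r.t. y, gradient w.r.t. w).
    total = sum(w)
    pooled = attention_pool(y, w)
    grad_y = [a / total for a in w]
    grad_w = [(v - pooled) / total for v in y]
    return grad_y, grad_w

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, used only to sanity-check the formulas.
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append((f(up) - f(down)) / (2.0 * eps))
    return grads

# Example frame-level scores and attention weights (made-up values).
scores = [0.2, 0.9, 0.5]
weights = [1.0, 3.0, 2.0]

if __name__ == "__main__":
    print("max       ", max_pool(scores), max_pool_grad(scores))
    print("average   ", avg_pool(scores), avg_pool_grad(scores))
    print("lin. soft.", linear_softmax_pool(scores), linear_softmax_grad(scores))
    print("attention ", attention_pool(scores, weights), attention_grad(scores, weights))
```

Comparing each analytic gradient with `numerical_grad` makes the qualitative difference between the rows visible: max pooling passes gradient to a single frame, average pooling spreads it uniformly, linear softmax gives a positive gradient only to frames with \(y_{i} > y/2\), and attention distributes it in proportion to the weights \(w_{i}\).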