
Table 1 Definition and gradient of four pooling functions

From: Neural network-based non-intrusive speech quality assessment using attention pooling function

| Pooling function | Definition | Gradient |
| --- | --- | --- |
| Max pooling | \(y = \max_{i} y_{i}\) | \(\frac{\partial y}{\partial y_{i}} = \left\{ \begin{array}{ll} 1, & i = \arg\max_{i} y_{i} \\ 0, & \text{else} \end{array} \right.\) |
| Average pooling | \(y = \frac{1}{n} \sum_{i} y_{i}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{1}{n}\) |
| Linear softmax | \(y = \frac{\sum_{i} y_{i}^{2}}{\sum_{i} y_{i}}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{2 y_{i} - y}{\sum_{j} y_{j}}\) |
| Attention | \(y = \frac{\sum_{i} y_{i} w_{i}}{\sum_{i} w_{i}}\) | \(\frac{\partial y}{\partial y_{i}} = \frac{w_{i}}{\sum_{j} w_{j}}, \quad \frac{\partial y}{\partial w_{i}} = \frac{y_{i} - y}{\sum_{j} w_{j}}\) |
  1. n is the number of frames in an utterance.
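
The definitions and gradients in Table 1 can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper: the function names, the sample frame scores and weights, and the finite-difference check are all assumptions made for illustration. It implements the four pooling functions and the closed-form gradients for linear softmax and attention pooling, so they can be compared against central differences.

```python
# Illustrative sketch only: names and sample values are hypothetical.

def max_pool(y):
    # y = max_i y_i
    return max(y)

def avg_pool(y):
    # y = (1/n) * sum_i y_i, with n the number of frames
    return sum(y) / len(y)

def linear_softmax_pool(y):
    # y = sum_i y_i^2 / sum_i y_i
    return sum(v * v for v in y) / sum(y)

def attention_pool(y, w):
    # y = sum_i y_i * w_i / sum_i w_i
    return sum(v * a for v, a in zip(y, w)) / sum(w)

def linear_softmax_grad(y):
    # dy/dy_i = (2 * y_i - y) / sum_j y_j
    pooled = linear_softmax_pool(y)
    total = sum(y)
    return [(2 * v - pooled) / total for v in y]

def attention_grads(y, w):
    # dy/dy_i = w_i / sum_j w_j and dy/dw_i = (y_i - y) / sum_j w_j
    pooled = attention_pool(y, w)
    total = sum(w)
    grad_y = [a / total for a in w]
    grad_w = [(v - pooled) / total for v in y]
    return grad_y, grad_w

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, used only to check the closed forms.
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

# Hypothetical frame-level scores and attention weights for one utterance.
frame_scores = [0.2, 0.7, 0.4, 0.9]
frame_weights = [1.0, 2.0, 0.5, 1.5]
```

Calling `numeric_grad(linear_softmax_pool, frame_scores)` and comparing the result with `linear_softmax_grad(frame_scores)` should agree up to finite-difference error. The same holds for `attention_grads` when one argument of `attention_pool` is held fixed. The attention gradients with respect to the frame scores sum to 1, which reflects that attention pooling is a weighted average.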