Table 8 ASR performance vs. hardware requirements trade-off

From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

  1. Performance is expressed as the word error rate obtained on the LibriSpeech test-clean dataset (lower is better). Hardware requirements are expressed as memory load, in megabytes (MB), and minimum throughput, in giga-operations per second (GOPS). The memory load counts only the memory needed to load the neural model and to store all the activations of the network when processing 1 second of speech; other components, such as the language model, may need more memory. The throughput counts only the operations required to pass the speech through the network; other processes, such as language rescoring, may need more operations.
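
The two hardware figures described in the note can be approximated with simple arithmetic. The sketch below is illustrative only and is not the authors' measurement code: the function names, the 4-byte (32-bit float) value size, the convention 1 MB = 10^6 bytes, and the example counts are all assumptions made here, and real-time operation is taken to mean that 1 second of speech is processed in at most 1 second.

```python
def memory_load_mb(num_params, num_activations, bytes_per_value=4):
    """Memory for the model weights plus the activations produced while
    processing 1 second of speech, in MB (assuming 1 MB = 10**6 bytes)."""
    total_bytes = (num_params + num_activations) * bytes_per_value
    return total_bytes / 1e6


def min_throughput_gops(ops_per_second_of_speech):
    """Minimum throughput for real-time processing, in GOPS.

    If passing 1 second of speech through the network costs N operations,
    the hardware must sustain at least N operations per second."""
    return ops_per_second_of_speech / 1e9


# Hypothetical model: these counts are made up for illustration,
# they are not values from Table 8.
example_memory = memory_load_mb(num_params=10_000_000, num_activations=2_000_000)
example_throughput = min_throughput_gops(ops_per_second_of_speech=5_000_000_000)
print(example_memory, "MB")
print(example_throughput, "GOPS")
```

As in the note, this estimate leaves out the language model and any rescoring step, so it should be read as a lower bound on what a complete system requires.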