Fig. 5
From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

Facebook Wav2Letter networks: the fully convolutional architecture with the Auto Segmentation Criterion (ASG) loss function (left) and the encoder-decoder architecture with time-depth separable (TDS) blocks (right)