Fig. 3
From: Three-stage training and orthogonality regularization for spoken language recognition

Three types of features fed into LID. (a) End-to-end LID tends to learn hybrid-level knowledge, which includes noise. (b) A pretrained frozen ASR encoder can only provide phonetic features for LID. (c) Ideally, a pretrained unfrozen ASR encoder should provide noiseless hybrid-level features.