Fig. 1
From: Three-stage training and orthogonality regularization for spoken language recognition
(a) The end-to-end LID architecture. It produces utterance-level LID decisions directly from frame-level acoustic features. (b) The ASR-LID parallel-branch architecture. The shared encoder, trained with the ASR loss, produces frame-level phonetic information, which helps to improve LID performance dramatically.