Fig. 4. From: Wise teachers train better DNN acoustic models

Distribution of singular values of hidden layer weights. RIC shown for 6 × 2048 and 5 × 512 networks trained with hard and soft targets. Soft target-trained networks have a slower decay in singular values, requiring more singular values to be retained for a given value of RIC compared to hard target-trained networks.
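
The relationship the caption describes, between how fast the singular values decay and how many must be kept, can be illustrated with a small sketch. The caption does not define RIC, so the code below assumes it is the share of the total singular-value sum captured by the k largest singular values; that definition, the function names `ric_curve` and `rank_for_ric`, and the toy matrices are all assumptions for illustration, not the article's implementation or data.

```python
import numpy as np


def ric_curve(weights):
    """Cumulative share of the singular-value sum for k = 1..min(m, n).

    ASSUMPTION: RIC(k) = sum of the k largest singular values divided by
    the sum of all singular values. The caption does not state this.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(weights, compute_uv=False)
    return np.cumsum(s) / np.sum(s)


def rank_for_ric(weights, target):
    """Smallest k whose cumulative share reaches `target` (0 < target <= 1)."""
    curve = ric_curve(weights)
    # First index where curve >= target, converted to a 1-based count.
    k = int(np.searchsorted(curve, target)) + 1
    # Guard against rounding leaving the last entry just below 1.0.
    return min(k, len(curve))


# Toy matrices (hypothetical, not the article's weights):
# one with fast-decaying singular values, one with slow decay.
fast_decay = np.diag([8.0, 4.0, 2.0, 1.0])
slow_decay = np.diag([4.0, 4.0, 4.0, 3.0])

k_fast = rank_for_ric(fast_decay, 0.9)
k_slow = rank_for_ric(slow_decay, 0.9)
print(k_fast, k_slow)
```

Under this assumed definition, the slowly decaying matrix needs more singular values than the fast-decaying one to reach the same target, which is the pattern the caption reports for soft target-trained versus hard target-trained networks.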