Fig. 1

Pair of teacher and student networks. The teacher network's predictions on a large amount of unlabeled data are used as training targets for a student network that is cheaper to evaluate. The teacher and student networks may differ in input representation, size, and architecture.
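
The training setup in Fig. 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the source: the teacher is a hypothetical average of three fixed logistic models (standing in for a large network), the student is a single logistic model, and the names `teacher_predict`, `train_student`, and `mean_abs_gap` are invented. The student never sees ground-truth labels; it is fit only to the teacher's soft predictions on unlabeled inputs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical teacher: an average of three fixed logistic models,
# standing in for a large network that is costly to evaluate.
TEACHER_PARAMS = [
    ((3.0, -2.0), 0.5),
    ((2.5, -1.5), 0.0),
    ((3.5, -2.5), 1.0),
]


def teacher_predict(x):
    """Soft prediction (a probability) from the teacher ensemble."""
    probs = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in TEACHER_PARAMS]
    return sum(probs) / len(probs)


def student_predict(params, x):
    """Hypothetical student: a single logistic model, cheaper to evaluate."""
    w, b = params
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train_student(unlabeled, epochs=50, lr=0.1):
    """Fit the student to teacher predictions with plain SGD (no labels used)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x in unlabeled:
            target = teacher_predict(x)        # training target from the teacher
            pred = student_predict((w, b), x)
            err = pred - target                # cross-entropy gradient w.r.t. the student logit
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def mean_abs_gap(params, inputs):
    """Average absolute difference between student and teacher predictions."""
    gaps = [abs(student_predict(params, x) - teacher_predict(x)) for x in inputs]
    return sum(gaps) / len(gaps)


rng = random.Random(0)
unlabeled = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
held_out = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]

untrained = ([0.0, 0.0], 0.0)
student = train_student(unlabeled)
gap_before = mean_abs_gap(untrained, held_out)
gap_after = mean_abs_gap(student, held_out)
print("gap before training:", gap_before)
print("gap after training:", gap_after)
```

The sketch keeps the same input representation for both models for brevity; a student with a different input representation would only need its own feature function applied to `x` before prediction.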