Table 2 Word recognition accuracy [%] for each method on the source domain

From: Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation

Model      #Utterance/word
           250              500
Baseline   48.21            54.62
Proposed   50.06 (86.65)    55.07 (90.51)
  1. #Utterance/word indicates the number of utterances per word used to train the model
  2. The value in parentheses shows the accuracy of the audio model
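
To make the comparison in the table easier to read, the differences between rows can be computed directly. The following is a minimal sketch that only restates the numbers above. The variable names and the helper `absolute_gain` are hypothetical and are not taken from the paper.

```python
# Word recognition accuracy [%] on the source domain, copied from Table 2.
# Keys are the number of training utterances per word.
baseline = {250: 48.21, 500: 54.62}
proposed = {250: 50.06, 500: 55.07}
audio = {250: 86.65, 500: 90.51}  # audio model accuracy (values in parentheses)


def absolute_gain(new, old):
    """Difference in percentage points per condition (hypothetical helper)."""
    return {k: round(new[k] - old[k], 2) for k in old}


# Gain of the proposed lip-reading model over the baseline.
gains = absolute_gain(proposed, baseline)
# Remaining gap between the audio model and the proposed lip-reading model.
audio_gap = absolute_gain(audio, proposed)

for n in sorted(gains):
    print(f"{n} utterances/word: proposed - baseline = {gains[n]:.2f} points, "
          f"audio - proposed = {audio_gap[n]:.2f} points")
```

These are plain differences in percentage points between the table entries. They say nothing about statistical significance, which the table does not report.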