Fig. 4 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 4

From: Language agnostic missing subtitle detection

Proposed method for combined VAD and AC inference, and the algorithm used to identify missing subtitle blocks. Consider the dialogue at the start, which consists of a caption (Simon Breathes) followed by speech. This dialogue is missing from the subtitle file. To identify the true speech timings, we divide the audio into 800 ms clips (with 90% overlap) and 2 s clips (with no overlap) and pass them to the VAD and AC models, respectively. After the VAD and AC timings have been generated for the clips, we perform a logical AND between the two sets of timings to obtain the refined predicted speech blocks. The VAD can potentially identify the caption (Simon Breathes) as a speech block. The time span associated with the caption is identified by the AC model and removed from the VAD timings to produce the correct timings. We then compare the timings of the refined predicted speech blocks with the timings of the subtitle blocks and predict the missing subtitle blocks.
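
The interval logic in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`make_clips`, `merge`, `intersect`, `subtract`, `missing_subtitle_blocks`), the millisecond interval representation, and the `min_ms` threshold are all hypothetical. It also assumes that the AC model's output has already been reduced to the intervals it labels as speech, so that the logical AND drops caption events such as breathing. The models themselves are not included.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms), half-open


def make_clips(total_ms: int, clip_ms: int, hop_ms: int) -> List[Interval]:
    """Slice [0, total_ms) into fixed-length clips spaced hop_ms apart."""
    last_start = max(total_ms - clip_ms, 0)
    return [(s, s + clip_ms) for s in range(0, last_start + 1, hop_ms)]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Logical AND: time covered by both interval lists."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Time covered by a but not by b."""
    b = merge(b)
    out: List[Interval] = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue
            if b_start >= end:
                break
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            out.append((cursor, end))
    return out


def missing_subtitle_blocks(
    vad_speech: List[Interval],
    ac_speech: List[Interval],
    subtitles: List[Interval],
    min_ms: int = 500,  # hypothetical threshold, not taken from the paper
) -> List[Interval]:
    """Refine VAD speech with AC speech, then report uncovered speech."""
    refined = intersect(vad_speech, ac_speech)
    return [(s, e) for s, e in subtract(refined, subtitles) if e - s >= min_ms]


# 800 ms clips with 90% overlap correspond to an 80 ms hop;
# 2 s clips with no overlap correspond to a 2000 ms hop.
vad_clips = make_clips(total_ms=10_000, clip_ms=800, hop_ms=80)
ac_clips = make_clips(total_ms=10_000, clip_ms=2_000, hop_ms=2_000)

# Toy timings: the VAD flags the breathing caption (0-2 s) as speech,
# while the AC model does not label that span as speech.
vad_speech = [(0, 5_000), (8_000, 10_000)]
ac_speech = [(2_000, 6_000), (8_000, 10_000)]
subtitles = [(8_000, 10_000)]

missing = missing_subtitle_blocks(vad_speech, ac_speech, subtitles)
```

In this toy example the breathing span is removed by the AND step, the speech at 8 to 10 s is already covered by a subtitle block, and the remaining speech from 2 to 5 s is reported as a missing subtitle block. Using integer milliseconds avoids floating-point drift when clip boundaries are compared.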
