Figure 1 | EURASIP Journal on Audio, Speech, and Music Processing

Figure 1

From: Multi-candidate missing data imputation for robust speech recognition

Block diagram of the MC MDT system. The noisy spectrum is extracted and the missing data mask is estimated from the speech waveform. Two layers of CG, namely the CG and the CCG, are trained before decoding. For each frame, imputation is first performed on the CCGs using the mask and the noisy observation, and the likelihood of each CCG is calculated on its corresponding imputed spectrum. A shortlist of selected CGs is derived from these likelihoods. The imputed clean spectrum of each selected CG is then calculated, again based on the noisy observation and the mask. The likelihoods of the CGs are calculated on their imputed spectra and used to determine which BGs are worth evaluating. Each selected BG is evaluated on the multiple related imputed spectra proposed by the CGs, and the largest likelihood is retained as the final BG likelihood. The beam search can then proceed as in a conventional HMM system, integrating acoustic evidence, lexical information and a language model to generate the recognition result.
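The per-frame selection cascade in the caption can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the names `impute`, `log_likelihood`, `top_k` and `score_frame`, the dictionary layout with `mean`, `var` and `children`, the shortlist sizes `n_ccg` and `n_cg`, the use of diagonal covariances, and above all the imputation rule (reliable components keep the observation, unreliable ones take the Gaussian mean capped by the noisy value), which is a simple stand-in for whatever imputation the system actually uses.

```python
import math

def impute(mean, noisy, mask):
    # Assumed stand-in rule: reliable components (mask 1) keep the observation;
    # unreliable ones take the Gaussian mean, capped by the noisy value.
    return [y if m else min(mu, y) for mu, y, m in zip(mean, noisy, mask)]

def log_likelihood(x, mean, var):
    # Log-density of a diagonal-covariance Gaussian (an assumption of this sketch).
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def top_k(scores, k):
    # Keys of the k largest scores.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def score_frame(noisy, mask, ccgs, cgs, bgs, n_ccg=1, n_cg=2):
    # Layer 1: impute and score every CCG, then shortlist the CGs under the best ones.
    ccg_scores = {}
    for i, g in enumerate(ccgs):
        x = impute(g["mean"], noisy, mask)
        ccg_scores[i] = log_likelihood(x, g["mean"], g["var"])
    shortlist = {c for i in top_k(ccg_scores, n_ccg) for c in ccgs[i]["children"]}

    # Layer 2: impute and score each shortlisted CG, keeping its imputed spectrum.
    cg_imputed, cg_scores = {}, {}
    for c in shortlist:
        g = cgs[c]
        cg_imputed[c] = impute(g["mean"], noisy, mask)
        cg_scores[c] = log_likelihood(cg_imputed[c], g["mean"], g["var"])
    selected = top_k(cg_scores, n_cg)

    # Layer 3: evaluate each BG on the spectra proposed by its selected parent CGs
    # and retain the largest log-likelihood.
    bg_scores = {}
    for c in selected:
        for b in cgs[c]["children"]:
            ll = log_likelihood(cg_imputed[c], bgs[b]["mean"], bgs[b]["var"])
            bg_scores[b] = max(ll, bg_scores.get(b, -math.inf))
    return bg_scores
```

In this sketch a BG may appear under several CGs, so it can receive several candidate spectra. BGs that are not reachable from any selected CG are simply absent from the returned dictionary, and a decoder would need to assign them a floor value.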
