Fig. 1 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 1

From: Multi-rate modulation encoding via unsupervised learning for audio event detection

a Waveform and time-frequency spectrogram of an example speech signal. A cross-section at \(F=850\) Hz shows how the spectral energy near that frequency band changes over time and highlights the syllabic rate of the speech. b Spectrotemporal modulation profile of the speech utterance, showing temporal peaks near 4 Hz that are commensurate with the peaks observed in the cross-section. c Spectrotemporal modulation profiles for three different audio classes from the DESED dataset. The x-axis represents temporal modulations, which reflect how fast sound dynamics unfold over time (in Hz). The y-axis represents spectral modulations, which indicate how energy is spread across the frequency profile of the sound event (in cycles/octave). Note that the speech profile in c is an average over a large number of speech utterances, whereas the profile in b is derived from a single signal.
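The relationship between the cross-section in a and the temporal modulation peak in b can be illustrated with a minimal sketch. This is not the article's method: it assumes a plain STFT magnitude spectrogram, a synthetic tone at 850 Hz amplitude-modulated at 4 Hz in place of real speech, and a hypothetical helper name `temporal_modulation_peak`. It covers only the temporal modulation axis (Hz); the spectral modulation axis (cycles/octave) would need a log-frequency representation, which is omitted here.

```python
import numpy as np

def temporal_modulation_peak(signal, sr, band_hz, frame_len=512, hop=128):
    """Dominant temporal modulation rate (Hz) of the STFT band nearest band_hz.

    Hypothetical helper for illustration only; frame_len and hop are
    arbitrary choices, not values taken from the article.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrogram with shape (n_frames, n_bins).
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)

    # Cross-section: energy over time in the bin closest to band_hz.
    band = np.argmin(np.abs(freqs - band_hz))
    envelope = spec[:, band]
    envelope = envelope - envelope.mean()  # remove DC so it cannot win the argmax

    # Fourier transform of the cross-section gives its temporal modulations.
    mod_spec = np.abs(np.fft.rfft(envelope))
    mod_freqs = np.fft.rfftfreq(n_frames, d=hop / sr)  # frame rate is sr / hop
    return mod_freqs[np.argmax(mod_spec)]

# Synthetic stand-in for speech: an 850 Hz carrier modulated at 4 Hz.
sr = 8000
t = np.arange(0, 4.0, 1.0 / sr)
toy = (1.0 + 0.8 * np.cos(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 850.0 * t)

peak_hz = temporal_modulation_peak(toy, sr, band_hz=850.0)
print(f"dominant temporal modulation: {peak_hz:.2f} Hz")
```

With a 4-second signal the modulation-frequency resolution is roughly 0.25 Hz, so the reported peak lands on the bin closest to 4 Hz rather than exactly on it.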
