Figure 1 | EURASIP Journal on Audio, Speech, and Music Processing

From: SIFT-based local spectrogram image descriptor: a novel feature for robust music identification

Relationships between audio manipulations and the corresponding spectrogram image transformations. (a) Spectrogram image of an original 10-s music clip. From the second row onward, the leftmost column shows the spectrogram images of four excerpts distorted from the original clip: (b1) −20% time stretching, (c1) +20% time stretching, (d1) −30% pitch shifting, and (e1) +30% pitch shifting. The middle column shows the images obtained by applying image transformations directly to spectrogram image (a): (b2) shortening the time axis by 20%, (c2) lengthening the time axis by 20%, (d2) shifting down by six frequency bins, and (e2) shifting up by five frequency bins. The rightmost column (b3, c3, d3, e3) shows the differences between the corresponding sub-figures of the leftmost and middle columns; warmer colors indicate larger spectral differences and cooler colors smaller ones.
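The caption pairs each audio distortion with a simple image operation on the spectrogram: time stretching with rescaling the time axis, and pitch shifting with translating the image along the frequency axis. The following is a minimal NumPy sketch of those image operations and of a difference map like the one in the rightmost column. It is not the authors' implementation. The function names, the use of linear interpolation for rescaling, the zero-filling of vacated bins, and the layout (rows are frequency bins with row 0 the lowest frequency, columns are time frames) are all illustrative assumptions.

```python
import numpy as np


def scale_time_axis(spec, factor):
    """Hypothetical helper: resample a (freq_bins x frames) spectrogram along time.

    factor < 1 shortens the time axis (fewer frames), factor > 1 lengthens it.
    Linear interpolation per frequency row is an assumption, not the paper's method.
    """
    n_bins, n_frames = spec.shape
    new_frames = max(1, int(round(n_frames * factor)))
    old_pos = np.arange(n_frames)
    new_pos = np.linspace(0, n_frames - 1, new_frames)
    return np.stack([np.interp(new_pos, old_pos, row) for row in spec])


def shift_freq_bins(spec, k):
    """Hypothetical helper: translate the image by k frequency bins.

    Assumes row 0 is the lowest frequency, so k > 0 shifts content upward
    and k < 0 shifts it downward. Vacated bins are filled with zeros.
    """
    out = np.zeros_like(spec)
    if k > 0:
        out[k:, :] = spec[:-k, :]
    elif k < 0:
        out[:k, :] = spec[-k:, :]
    else:
        out[:] = spec
    return out


def difference_map(a, b):
    """Absolute per-pixel difference, cropped to the common shape of a and b."""
    rows = min(a.shape[0], b.shape[0])
    cols = min(a.shape[1], b.shape[1])
    return np.abs(a[:rows, :cols] - b[:rows, :cols])
```

For example, `scale_time_axis(spec, 0.8)` corresponds to the 20% time-axis shortening in (b2), and `shift_freq_bins(spec, -6)` to the six-bin downshift in (d2). Calling `difference_map` on such an image and the spectrogram of the actually distorted audio gives a map comparable in spirit to (b3) to (e3). Cropping to the common shape is one simple way to compare images whose sizes differ after rescaling.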

Back to article page