Fig. 2 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 2

From: Points2Sound: from mono to binaural audio using 3D point cloud scenes

Overview diagram of Points2Sound. The model consists of a sparse ResNet18 network for visual analysis and a Demucs network for binaural audio synthesis. The vision network extracts a visual feature \(\mathbf{h}\) from the 3D point cloud. This feature then conditions the audio network, which generates from the mono audio a binaural version consistent with the visual scene. Both networks are jointly optimized during training.
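
The data flow in the caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a pooled linear map stands in for the sparse ResNet18, a single conditioned layer stands in for Demucs, and the conditioning mechanism (a feature-wise scale and shift derived from \(\mathbf{h}\)) is an assumption, since the caption does not specify how \(\mathbf{h}\) enters the audio network. All names, shapes, and the six values per point are hypothetical, the weights are random, and only the forward pass is shown, not the joint training.

```python
import numpy as np

rng = np.random.default_rng(0)

def visual_feature(points, w_vis):
    """Toy stand-in for the vision network (NOT a sparse ResNet18):
    per-point linear map, ReLU, then a global max pool over points."""
    per_point = np.maximum(points @ w_vis, 0.0)    # (num_points, feat_dim)
    return per_point.max(axis=0)                   # h, shape (feat_dim,)

def binauralize(mono, h, params):
    """Toy stand-in for the audio network (NOT Demucs), conditioned on h.
    Assumption: h modulates the latent by a feature-wise scale and shift."""
    w_enc, w_gamma, w_beta, w_dec = params
    latent = np.outer(w_enc, mono)                 # (channels, num_samples)
    gamma = w_gamma @ h                            # per-channel scale from h
    beta = w_beta @ h                              # per-channel shift from h
    latent = gamma[:, None] * latent + beta[:, None]
    return w_dec @ np.tanh(latent)                 # (2, num_samples): left, right

# Hypothetical sizes, chosen only for illustration.
feat_dim, channels = 16, 8
points = rng.normal(size=(1024, 6))                # assumed xyz + colour per point
mono = rng.normal(size=4000)                       # mono waveform

w_vis = rng.normal(size=(6, feat_dim))
params = (
    rng.normal(size=channels),                     # encoder weights
    rng.normal(size=(channels, feat_dim)),         # scale projection
    rng.normal(size=(channels, feat_dim)),         # shift projection
    rng.normal(size=(2, channels)),                # decoder to two ears
)

h = visual_feature(points, w_vis)
binaural = binauralize(mono, h, params)
print(h.shape, binaural.shape)
```

In this sketch the two output channels depend on the point cloud only through \(\mathbf{h}\), which mirrors the role the caption assigns to the visual feature.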
