Points2Sound: from mono to binaural audio using 3D point cloud scenes

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Quantitative results of Points2Sound and Mono2Binaural. For each method, we report the performance for each number of sources (\(N = 1,2,3\)) and each type of 3D point cloud attributes (depth or rgb-depth), based on the average of the evaluation metrics; lower values are better. Values averaged over the number of sources are given by \(\overline{\mathrm {d}_\mathrm {ENV}}\) and \(\overline{\mathrm {d}_\mathrm {STFT}}\).

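Method	Visual features	\(\mathrm {d}_\mathrm {ENV} \downarrow\) (\(N=1\))	\(\mathrm {d}_\mathrm {ENV} \downarrow\) (\(N=2\))	\(\mathrm {d}_\mathrm {ENV} \downarrow\) (\(N=3\))	\(\mathrm {d}_\mathrm {STFT} \downarrow\) (\(N=1\))	\(\mathrm {d}_\mathrm {STFT} \downarrow\) (\(N=2\))	\(\mathrm {d}_\mathrm {STFT} \downarrow\) (\(N=3\))	\(\overline{\mathrm {d}_\mathrm {ENV}} \downarrow\)	\(\overline{\mathrm {d}_\mathrm {STFT}} \downarrow\)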
Mono2Binaural [7]	Depth	0.038	0.101	0.132	0.192	1.300	2.029	0.090	1.174
Mono2Binaural [7]	rgb-depth	0.036	0.094	0.126	0.213	1.142	1.858	0.085	1.071
Points2Sound	Depth	0.016	0.080	0.122	0.082	0.885	1.736	0.072	0.901
Points2Sound	rgb-depth	0.015	0.073	0.114	0.099	0.762	1.521	0.067	0.794
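
As a quick consistency check on the table above, the following Python sketch recomputes the last two columns from the per-\(N\) columns. It assumes the averages are unweighted means over \(N = 1,2,3\), which the caption does not state explicitly; the names `TABLE2` and `mean_over_sources` are illustrative and do not come from the paper.

```python
# Values transcribed from Table 2.
# Each entry: (d_ENV for N = 1, 2, 3), (d_STFT for N = 1, 2, 3).
TABLE2 = {
    ("Mono2Binaural", "depth"): ((0.038, 0.101, 0.132), (0.192, 1.300, 2.029)),
    ("Mono2Binaural", "rgb-depth"): ((0.036, 0.094, 0.126), (0.213, 1.142, 1.858)),
    ("Points2Sound", "depth"): ((0.016, 0.080, 0.122), (0.082, 0.885, 1.736)),
    ("Points2Sound", "rgb-depth"): ((0.015, 0.073, 0.114), (0.099, 0.762, 1.521)),
}


def mean_over_sources(values):
    """Unweighted mean over the N = 1, 2, 3 columns (an assumption, see lead-in)."""
    return sum(values) / len(values)


# Recompute the averaged d_ENV and d_STFT for every (method, features) row.
averages = {
    key: (mean_over_sources(env), mean_over_sources(stft))
    for key, (env, stft) in TABLE2.items()
}

for (method, features), (env_avg, stft_avg) in averages.items():
    print(f"{method:14s} {features:10s} d_ENV={env_avg:.4f} d_STFT={stft_avg:.4f}")
```

Under this assumption the recomputed means agree with the reported averages to within 0.001. The only visible difference is \(\overline{\mathrm {d}_\mathrm {ENV}}\) for Points2Sound with depth, where the rounded columns give about 0.0727 against a reported 0.072, which would be consistent with the reported averages having been computed from unrounded values.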