From: Points2Sound: from mono to binaural audio using 3D point cloud scenes
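Method | \(\mathrm{d}_{\mathrm{ENV}} \downarrow\) (1) | \(\mathrm{d}_{\mathrm{ENV}} \downarrow\) (2) | \(\mathrm{d}_{\mathrm{ENV}} \downarrow\) (3) | \(\mathrm{d}_{\mathrm{STFT}} \downarrow\) (1) | \(\mathrm{d}_{\mathrm{STFT}} \downarrow\) (2) | \(\mathrm{d}_{\mathrm{STFT}} \downarrow\) (3) | \(\overline{\mathrm{d}_{\mathrm{ENV}}} \downarrow\) | \(\overline{\mathrm{d}_{\mathrm{STFT}}} \downarrow\) |
---|---|---|---|---|---|---|---|---|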
\(s_{m}\) (true mono) | ||||||||
Mono-Mono | 0.387 | 0.403 | 0.388 | 26.719 | 26.414 | 26.747 | 0.392 | 26.626 |
Rotated-Visual | 0.232 | 0.285 | 0.305 | 9.002 | 10.588 | 12.016 | 0.274 | 10.535 |
Points2Sound (\(\mathcal {L}_{\mathrm {full}}\)) | 0.173 | 0.248 | 0.280 | 3.297 | 6.645 | 9.080 | 0.233 | 6.340 |
\(s_{m} = s_{b}^L\) | ||||||||
Mono-Mono | 0.148 | 0.155 | 0.159 | 7.472 | 6.997 | 6.951 | 0.154 | 7.140 |
Rotated-Visual | 0.165 | 0.166 | 0.165 | 7.610 | 6.808 | 6.345 | 0.165 | 6.921 |
Points2Sound (\(\mathcal {L}_{\mathrm {full}}\)) | 0.054 | 0.103 | 0.130 | 0.636 | 1.820 | 2.604 | 0.095 | 1.686 |
\(s_{m} = s_{b}^L+s_{b}^R\) | ||||||||
Mono-Mono | 0.142 | 0.166 | 0.178 | 4.046 | 4.112 | 4.058 | 0.162 | 4.072 |
Rotated-Visual | 0.166 | 0.192 | 0.209 | 5.663 | 5.918 | 6.031 | 0.189 | 5.870 |
Points2Sound (\(\mathcal {L}_{\mathrm {full}}\)) | 0.015 | 0.073 | 0.114 | 0.099 | 0.762 | 1.521 | 0.067 | 0.794 |
Points2Sound (\(\mathcal {L}_{\mathrm {full}}\)) (only-depth) | 0.016 | 0.080 | 0.122 | 0.082 | 0.885 | 1.736 | 0.072 | 0.901 |
Points2Sound (\(\mathcal {L}_{\mathrm {diff}}\)) | 0.015 | 0.090 | 0.125 | 0.153 | 1.205 | 1.832 | 0.076 | 1.063 |