Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles

Yen, Benjamin; Hioka, Yusuke

doi:10.1186/s13636-020-00181-5

EURASIP Journal on Audio, Speech, and Music Processing

Table 5 Haversine distance error performance comparison (task 3—flying UAV, speech sound source)

	Haversine distance error D (rad)							p value: paired-sample t test (ref. best case)
Method	Mean	Median	Min	Max	0.25 quartile	0.75 quartile	RMSE	
Baseline
GCC-PHAT (max)	1.143	0.9486	0.00440	3.039	0.5444	1.680	1.362	1.81 ×10⁻¹⁷
GCC-PHAT (sum)	1.375	1.2730	0.02862	3.039	0.7456	1.920	1.582	2.11 ×10⁻²⁹
GCC-NONLIN (max)	1.022	0.8480	0.00440	2.912	0.4434	1.638	1.233	2.13 ×10⁻¹¹
GCC-NONLIN (sum)	1.206	1.0604	0.02862	2.971	0.7088	1.665	1.390	8.98 ×10⁻²¹
MVDR (max)	0.913	0.8164	0.00708	2.993	0.3166	1.515	1.138	5.83 ×10⁻⁷
MVDR (sum)	1.045	0.8897	0.01019	2.501	0.5146	1.680	1.232	2.39 ×10⁻¹⁸
DS (max)	0.881	0.7378	0.00800	3.015	0.2789	1.361	1.141	1.27 ×10⁻⁴
DS (sum)	1.083	0.8598	0.01492	3.024	0.4927	1.668	1.328	2.47 ×10⁻¹⁶
DNM (max)	1.088	0.9123	0.00437	3.015	0.5015	1.674	1.300	2.31 ×10⁻¹⁶
DNM (sum)	1.160	0.9798	0.03962	3.056	0.6410	1.698	1.356	1.19 ×10⁻²⁰
w/ [28] T-F mask
GCC-PHAT (max)	1.277	1.2783	0.07181	2.776	0.6593	1.824	1.441	1.18 ×10⁻³¹
GCC-PHAT (sum)	1.247	1.2153	0.07979	2.827	0.7090	1.735	1.398	1.06 ×10⁻²⁸
w/ [30] T-F mask
DS (max)	1.052	0.9775	0.01399	2.540	0.4983	1.602	1.243	3.05 ×10⁻¹⁷
DS (sum)	1.097	1.0249	0.01038	2.540	0.5042	1.644	1.281	7.08 ×10⁻²⁰
w/ SNR response scaling
GCC-PHAT (max)	1.207	1.1717	0.07321	2.669	0.6054	1.811	1.390	5.77 ×10⁻²²
GCC-PHAT (sum)	1.325	1.2988	0.11448	2.989	0.7246	1.866	1.501	3.25 ×10⁻³⁰
GCC-NONLIN (max)	1.170	1.1192	0.07740	2.633	0.5953	1.711	1.347	2.85 ×10⁻²¹
GCC-NONLIN (sum)	1.250	1.2045	0.06879	2.956	0.7501	1.761	1.404	1.75 ×10⁻²⁹
MVDR (max)	1.045	0.9847	0.02859	2.472	0.5349	1.596	1.204	2.97 ×10⁻¹⁶
MVDR (sum)	1.103	1.0282	0.02711	2.729	0.6316	1.594	1.253	1.22 ×10⁻²²
DS (max)	0.999	0.8668	0.03225	2.583	0.4278	1.626	1.198	1.15 ×10⁻¹¹
DS (sum)	1.115	1.0375	0.01969	2.991	0.5803	1.722	1.292	4.33 ×10⁻²¹
DNM (max)	1.174	1.2215	0.02945	2.613	0.6012	1.728	1.345	2.37 ×10⁻²³
DNM (sum)	1.160	1.0620	0.04191	2.740	0.6189	1.724	1.323	5.48 ×10⁻²⁴
w/ RPSL post-processing
GCC-PHAT (max)	1.129	1.0451	0.03077	2.691	0.6323	1.547	1.282	3.22 ×10⁻²²
GCC-PHAT (sum)	1.294	1.1847	0.03292	2.877	0.7323	1.769	1.458	1.44 ×10⁻³¹
GCC-NONLIN (max)	1.083	1.0325	0.00474	2.543	0.5527	1.568	1.265	1.66 ×10⁻¹⁷
GCC-NONLIN (sum)	1.093	1.0342	0.00901	2.644	0.6126	1.505	1.251	4.22 ×10⁻²⁰
MVDR (max)	0.826	0.6226	0.02069	2.382	0.2362	1.476	1.063	3.07 ×10⁻⁵
MVDR (sum)	0.864	0.6437	0.02214	2.250	0.3550	1.583	1.078	4.67 ×10⁻⁸
DS (max)	0.706	0.4435	0.02207	2.424	0.1956	1.176	0.962	n.s.
DS (sum)	0.850	0.6395	0.02145	2.501	0.3336	1.325	1.071	1.28 ×10⁻⁶
DNM (max)	0.982	0.8109	0.01968	2.444	0.4646	1.441	1.167	1.33 ×10⁻¹⁴
DNM (sum)	0.980	0.7641	0.03962	2.551	0.4765	1.434	1.169	1.81 ×10⁻¹⁵
w/ [28] T-F mask + RPSL post-processing
GCC-PHAT (max)	1.167	1.0996	0.09304	2.617	0.6643	1.607	1.316	8.90 ×10⁻²⁶
GCC-PHAT (sum)	1.285	1.1122	0.04744	2.912	0.6868	1.751	1.473	4.50 ×10⁻³⁰
w/ [30] T-F mask + RPSL post-processing
DS (max) (best case)	0.684	0.4362	0.00038	2.593	0.1827	0.937	0.951	N/A
DS (sum)	0.786	0.5264	0.02560	2.452	0.2546	1.288	1.015	2.66 ×10⁻³
w/ SNR response scaling + RPSL post-processing
GCC-PHAT (max)	1.067	0.9980	0.07649	2.852	0.5852	1.515	1.241	5.90 ×10⁻¹⁸
GCC-PHAT (sum)	1.202	1.1322	0.06775	2.945	0.6837	1.696	1.373	4.18 ×10⁻²³
GCC-NONLIN (max)	0.893	0.6941	0.01799	2.408	0.4562	1.208	1.082	1.40 ×10⁻⁷
GCC-NONLIN (sum)	1.066	0.9414	0.07722	2.510	0.5493	1.565	1.236	2.76 ×10⁻¹⁹
MVDR (max)	0.770	0.5080	0.01199	2.530	0.2716	1.207	0.996	n.s.
MVDR (sum)	0.984	0.8222	0.03901	2.297	0.4840	1.558	1.167	2.44 ×10⁻¹⁵
DS (max)	0.759	0.5406	0.01738	2.344	0.2379	1.268	0.996	n.s.
DS (sum)	0.753	0.5300	0.02859	2.311	0.2693	1.094	0.978	n.s.
DNM (max)	0.996	0.8491	0.04045	2.451	0.4684	1.520	1.189	9.52 ×10⁻¹⁵
DNM (sum)	0.957	0.8381	0.05366	2.303	0.4342	1.371	1.142	1.19 ×10⁻¹²

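Results from the baseline methods are presented first. They are followed by results using the T-F masks from [28] and [30], results for the proposed method (SNR response scaling and RPSL post-processing), and results for their combinations. p values come from a paired-sample t test against the best case, DS (max) with the [30] T-F mask and RPSL post-processing. Best-performing values for each category are highlighted in bold. n.s., not significant; N/A, not applicable (reference case).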
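The Haversine distance error D in the table is the great-circle (angular) distance between the estimated and true source directions on the unit sphere. It is therefore bounded by π rad, and the largest errors in the table (about 3.06 rad) come close to this bound. The following is a minimal Python sketch of the formula. It assumes each direction is given as an azimuth and elevation in radians, with azimuth treated as longitude and elevation as latitude. The function name `haversine_error` and this angle convention are illustrative assumptions and are not taken from the paper.

```python
import math


def haversine_error(az_est, el_est, az_true, el_true):
    """Great-circle angle (rad) between an estimated and a true direction.

    Assumed convention: azimuth acts as longitude, elevation as latitude,
    and all angles are in radians.
    """
    d_el = el_est - el_true
    d_az = az_est - az_true
    # hav(D) = hav(d_el) + cos(el_est) * cos(el_true) * hav(d_az), with hav(x) = sin^2(x / 2)
    h = math.sin(d_el / 2) ** 2 + math.cos(el_est) * math.cos(el_true) * math.sin(d_az / 2) ** 2
    # Clamp to [0, 1] so floating-point round-off cannot push asin out of its domain
    h = min(1.0, max(0.0, h))
    return 2 * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Same elevation, azimuths 90 degrees apart: error is pi/2
    print(haversine_error(math.pi / 2, 0.0, 0.0, 0.0))
    # Opposite directions on the horizontal plane: error reaches its maximum, pi
    print(haversine_error(math.pi, 0.0, 0.0, 0.0))
```

The haversine form is used instead of the plain spherical law of cosines because it remains numerically well conditioned for the very small errors seen in the Min column.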
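Each row summarises one method's per-trial errors D using the statistics in the column headers. The last column compares those errors with the errors of the best case (DS (max) with the [30] T-F mask and RPSL post-processing) using a paired-sample t test over the same trials. The sketch below shows one way to compute these quantities with the Python standard library, under several assumptions:

- The function names are hypothetical.
- RMSE is taken as the root mean square of D, because the ideal error is zero.
- The quartile convention (`statistics.quantiles` with its default `'exclusive'` method) may differ from the one used in the paper.
- Only the t statistic is returned. The two-sided p value would normally come from the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def summarise_errors(errors):
    """Summary statistics of per-trial errors D (rad), mirroring the table columns."""
    # Quartiles via the default 'exclusive' method (assumed; the paper's convention is not stated)
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "min": min(errors),
        "max": max(errors),
        "q25": q1,
        "q75": q3,
        # RMSE as the root mean square of D, since the target error is zero (assumption)
        "rmse": math.sqrt(statistics.mean([e * e for e in errors])),
    }


def paired_t_statistic(errors_a, errors_b):
    """Paired-sample t statistic for two methods evaluated on the same trials."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least two trials")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # t = mean(d) / (s_d / sqrt(n)), compared against a t distribution with n - 1 dof
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


if __name__ == "__main__":
    # Illustrative per-trial errors only; these are NOT data from the paper
    method = [0.9, 1.2, 0.4, 1.6, 0.7]
    best = [0.5, 0.8, 0.3, 1.1, 0.6]
    print(summarise_errors(method))
    print(paired_t_statistic(method, best))
```

A positive t statistic means the method's errors are, on average, larger than those of the reference case on the same trials. Pairing the trials removes trial-to-trial variation (for example, differing source positions) from the comparison.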