Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles

Yen, Benjamin; Hioka, Yusuke

doi:10.1186/s13636-020-00181-5

EURASIP Journal on Audio, Speech, and Music Processing

Table 4 Haversine distance error performance comparison (task 2—flying UAV, broadband sound source)


	Haversine distance error D (rad)							p value: paired-sample t test (ref. best case)
	Mean	Median	Min	Max	0.25 quartile	0.75 quartile	RMSE
Baseline
GCC-PHAT (max)	0.0868	0.07004	0.002181	2.350	0.04567	0.1035	0.1662	n.s.
GCC-PHAT (sum)	0.0810	0.06459	0.004032	2.457	0.04299	0.0918	0.1690	n.s.
GCC-NONLIN (max)	0.0905	0.06951	0.002702	2.350	0.04345	0.1008	0.1746	n.s.
GCC-NONLIN (sum)	0.0811	0.06633	0.004032	0.686	0.04325	0.0936	0.1155	n.s.
MVDR (max)	0.1690	0.08072	0.004032	2.490	0.04379	0.1199	0.3743	7.88 ×10⁻⁶
MVDR (sum)	0.1590	0.07550	0.003670	1.823	0.04377	0.1148	0.3633	5.15 ×10⁻⁵
DS (max)	0.2293	0.08965	0.004032	2.436	0.05588	0.1365	0.4655	1.10 ×10⁻⁹
DS (sum)	0.1648	0.07734	0.002618	2.465	0.04694	0.1169	0.3625	1.21 ×10⁻⁵
DNM (max)	0.2619	0.09346	0.004032	2.436	0.05745	0.1381	0.5411	2.02 ×10⁻¹⁰
DNM (sum)	0.1563	0.07733	0.003670	2.807	0.04490	0.1130	0.3910	2.62 ×10⁻⁴
w/ [28] T-F mask
GCC-PHAT (max)	0.1816	0.07557	0.002919	2.029	0.04480	0.1733	0.3429	1.78 ×10⁻¹¹
GCC-PHAT (sum)	0.1078	0.06546	0.005047	2.490	0.04372	0.1000	0.2203	5.32 ×10⁻⁶
w/ [30] T-F mask
GCC-PHAT (max)	0.2472	0.09462	0.002449	2.558	0.05509	0.2263	0.4670	2.35 ×10⁻¹⁴
GCC-PHAT (sum)	0.1285	0.07141	0.004966	2.492	0.04797	0.1124	0.2464	4.34 ×10⁻⁹
w/ SNR response scaling
GCC-PHAT (max)	0.1335	0.07025	0.005124	2.259	0.03684	0.1176	0.2813	6.94 ×10⁻⁶
GCC-PHAT (sum)	0.1010	0.06524	0.007256	2.501	0.03983	0.1105	0.1973	1.48 ×10⁻⁶
GCC-NONLIN (max)	0.1227	0.07184	0.006535	2.254	0.04138	0.1196	0.2403	1.46 ×10⁻⁴
GCC-NONLIN (sum)	0.1101	0.06766	0.004056	2.495	0.03898	0.1116	0.2240	6.54 ×10⁻⁶
MVDR (max)	0.2669	0.11614	0.002071	2.498	0.05972	0.3117	0.4617	7.10 ×10⁻¹⁹
MVDR (sum)	0.2468	0.11260	0.002633	2.475	0.05254	0.1940	0.4403	9.62 ×10⁻¹⁷
DS (max)	0.2333	0.10711	0.004002	2.501	0.05783	0.1973	0.4350	1.37 ×10⁻¹⁴
DS (sum)	0.1957	0.09952	0.007290	2.501	0.04939	0.1626	0.3760	2.86 ×10⁻¹²
DNM (max)	0.2448	0.10371	0.000610	2.219	0.05417	0.1925	0.4523	1.69 ×10⁻¹⁴
DNM (sum)	0.1967	0.09452	0.003334	2.478	0.05081	0.1501	0.3783	2.13 ×10⁻¹²
w/ RPSL post-processing
GCC-PHAT (max)	0.0922	0.06516	0.003577	1.936	0.04173	0.0930	0.1844	n.s.
GCC-PHAT (sum) (best case)	0.0746	0.05987	0.004076	2.490	0.04177	0.0852	0.1622	N/A
GCC-NONLIN (max)	0.0965	0.06428	0.003356	1.937	0.03727	0.0962	0.1927	3.34 ×10⁻³
GCC-NONLIN (sum)	0.0805	0.06190	0.002988	2.484	0.03913	0.0900	0.1706	n.s.
MVDR (max)	0.1613	0.07477	0.001200	2.466	0.04130	0.1186	0.3334	7.74 ×10⁻⁹
MVDR (sum)	0.1244	0.07330	0.003783	2.478	0.04414	0.1089	0.2646	3.56 ×10⁻⁶
DS (max)	0.1619	0.07810	0.005783	2.481	0.04922	0.1179	0.3559	2.41 ×10⁻⁷
DS (sum)	0.1689	0.07352	0.002433	2.484	0.04421	0.1159	0.3812	3.06 ×10⁻⁷
DNM (max)	0.1810	0.07726	0.004760	2.661	0.04732	0.1266	0.4117	2.86 ×10⁻⁷
DNM (sum)	0.1777	0.07683	0.004642	2.475	0.04434	0.1205	0.4284	1.12 ×10⁻⁶
w/ [28] T-F mask + RPSL post-processing
GCC-PHAT (max)	0.0845	0.06388	0.002919	2.490	0.04131	0.0944	0.1750	3.10 ×10⁻³
GCC-PHAT (sum)	0.0743	0.06234	0.005047	2.490	0.04126	0.0851	0.1621	n.s.
w/ [30] T-F mask + RPSL post-processing
GCC-PHAT (max)	0.1125	0.06586	0.002449	1.905	0.04632	0.1076	0.2055	4.39 ×10⁻⁶
GCC-PHAT (sum)	0.1038	0.06491	0.004966	2.492	0.04532	0.0944	0.2311	1.03 ×10⁻³
w/ SNR response scaling + RPSL post-processing
GCC-PHAT (max)	0.0979	0.06657	0.005124	2.259	0.03562	0.1070	0.2145	n.s.
GCC-PHAT (sum)	0.0827	0.05976	0.007256	2.501	0.03822	0.1043	0.1704	2.03 ×10⁻³
GCC-NONLIN (max)	0.0954	0.06629	0.006180	2.254	0.03947	0.1096	0.1854	n.s.
GCC-NONLIN (sum)	0.0971	0.06379	0.004056	2.495	0.03823	0.1076	0.2084	1.20 ×10⁻³
MVDR (max)	0.1476	0.09110	0.002071	1.935	0.05083	0.1392	0.2560	2.79 ×10⁻¹¹
MVDR (sum)	0.1590	0.09449	0.002633	2.475	0.04783	0.1377	0.3062	2.02 ×10⁻¹⁰
DS (max)	0.1321	0.08498	0.004002	2.501	0.04730	0.1347	0.2426	2.21 ×10⁻¹⁰
DS (sum)	0.1208	0.08031	0.007290	2.501	0.04661	0.1321	0.2312	2.51 ×10⁻⁸
DNM (max)	0.1693	0.09204	0.000610	1.948	0.05196	0.1463	0.3022	5.16 ×10⁻¹²
DNM (sum)	0.1429	0.08071	0.003334	2.478	0.04672	0.1352	0.2823	4.35 ×10⁻⁹

Results from the baseline methods are presented first, followed by results using the T-F masks from [28] and [30] and the proposed method (SNR response scaling and RPSL). The best-performing values in each category are highlighted in bold.
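For readers who want to reproduce the kind of summary statistics reported in the table, the following is a minimal sketch of how a Haversine distance error in radians could be computed between an estimated and a true source direction on the unit sphere, and then summarised per method. It is not the authors' implementation. The function names (`haversine_error`, `summarise`), the azimuth/elevation angle convention, and the quartile interpolation method are assumptions made for illustration.

```python
import math
import statistics


def haversine_error(az_est, el_est, az_true, el_true):
    """Central angle (rad) between two directions on the unit sphere.

    Assumed convention: azimuth plays the role of longitude and
    elevation the role of latitude, both in radians.
    """
    d_az = az_est - az_true
    d_el = el_est - el_true
    a = (math.sin(d_el / 2) ** 2
         + math.cos(el_est) * math.cos(el_true) * math.sin(d_az / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above 1.
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def summarise(errors):
    """Summary statistics matching the table's column headings."""
    errors = list(errors)
    # Quartiles use the statistics module's default ("exclusive") method;
    # the method used for the published table is not stated here.
    q25, _, q75 = statistics.quantiles(errors, n=4)
    return {
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "min": min(errors),
        "max": max(errors),
        "q25": q25,
        "q75": q75,
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
    }


if __name__ == "__main__":
    # Hypothetical per-frame direction estimates (azimuth, elevation) in rad.
    truth = (0.50, -0.30)
    estimates = [(0.52, -0.28), (0.47, -0.33), (0.60, -0.25), (0.49, -0.31)]
    errs = [haversine_error(az, el, *truth) for az, el in estimates]
    print(summarise(errs))
```

Because the error is a central angle, it is bounded by π rad, which is consistent with the maxima in the table staying below roughly 2.9 rad. The mean and RMSE are sensitive to a few such large outliers, while the median and quartiles are not, which is why the two families of columns can rank methods differently.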
