Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles

Yen, Benjamin; Hioka, Yusuke

doi:10.1186/s13636-020-00181-5

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Haversine distance error performance comparison (task 1—hovering UAV)

	Haversine distance error D (rad)							p value: paired-sample t test (ref. best case)
	Mean	Median	Min	Max	0.25 quartile	0.75 quartile	RMSE
Baseline
GCC-PHAT (max)	0.1541	0.04908	0	2.674	0.01745	0.0849	0.4384	6.86 ×10⁻⁴
GCC-PHAT (sum)	0.4637	0.07804	0	2.347	0.03491	0.7808	0.8073	3.42 ×10⁻²¹
GCC-NONLIN (max)	0.1305	0.03903	0	2.736	0.01745	0.0752	0.3887	n.s.
GCC-NONLIN (sum)	0.4528	0.07376	0	2.347	0.02468	0.7160	0.8010	2.08 ×10⁻²⁰
MVDR (max)	0.1774	0.03903	0	2.947	0.02369	0.1047	0.4471	1.18 ×10⁻⁵
MVDR (sum)	0.3738	0.05794	0	2.222	0.02314	0.2934	0.7081	3.91 ×10⁻¹⁷
DS (max)	0.1976	0.03801	0	2.793	0.01745	0.1196	0.5143	4.81 ×10⁻⁶
DS (sum)	0.2935	0.04637	0	2.722	0.01745	0.1795	0.6218	4.71 ×10⁻¹²
DNM (max)	0.2535	0.03491	0	3.018	0.01745	0.0837	0.6435	3.14 ×10⁻⁷
DNM (sum)	0.3811	0.05456	0	2.806	0.01745	0.1988	0.8074	1.87 ×10⁻¹³
w/ [28] T-F mask
GCC-PHAT (max)	0.1356	0.03903	0	2.818	0.01745	0.0837	0.4064	n.s.
GCC-PHAT (sum)	0.5164	0.08382	0	2.742	0.03654	1.0378	0.8776	1.48 ×10⁻²³
w/ [30] T-F mask
GCC-PHAT (max)	0.2155	0.05236	0	2.605	0.01745	0.1018	0.4933	1.10 ×10⁻⁷
GCC-PHAT (sum)	0.4162	0.08372	0	2.347	0.02468	0.7080	0.7424	1.16 ×10⁻¹⁹
GCC-NONLIN (max)	0.2411	0.05236	0	2.605	0.01745	0.1162	0.5308	3.13 ×10⁻⁹
GCC-NONLIN (sum)	0.3837	0.06981	0	2.347	0.02429	0.5923	0.7012	3.90 ×10⁻¹⁸
MVDR (max)	0.1806	0.05504	0	1.941	0.03491	0.1101	0.4154	1.39 ×10⁻⁷
MVDR (sum)	0.2702	0.05058	0	2.065	0.02427	0.1711	0.5633	4.44 ×10⁻¹²
DS (max)	0.1897	0.06292	0	2.403	0.02030	0.1589	0.4344	2.11 ×10⁻⁷
DS (sum)	0.1792	0.03796	0	2.110	0.01745	0.1180	0.4095	1.64 ×10⁻⁷
DNM (max)	0.2586	0.03504	0	2.986	0.01745	0.1064	0.6273	2.35 ×10⁻⁸
DNM (sum)	0.2113	0.03810	0	2.657	0.01745	0.1144	0.4941	4.24 ×10⁻⁹
w/ SNR response scaling
GCC-PHAT (max)	0.1278	0.03903	0	2.488	0.01745	0.0780	0.3732	n.s.
GCC-PHAT (sum)	0.4997	0.09259	0	3.031	0.03903	0.9861	0.8501	7.02 ×10⁻²⁴
GCC-NONLIN (max)	0.0961	0.03880	0	1.897	0.01745	0.0719	0.2596	n.s.
GCC-NONLIN (sum)	0.4575	0.08899	0	2.756	0.03654	0.6880	0.8133	1.16 ×10⁻²⁰
MVDR (max) (best case)	0.0833	0.03903	0	1.886	0.02424	0.0715	0.2269	N/A
MVDR (sum)	0.3263	0.05236	0	2.418	0.02468	0.2067	0.6420	1.10 ×10⁻¹⁴
DS (max)	0.0975	0.03803	0	1.876	0.01745	0.0719	0.2790	n.s.
DS (sum)	0.3275	0.05504	0	3.031	0.03491	0.2080	0.7017	2.99 ×10⁻¹²
DNM (max)	0.1323	0.03491	0	2.409	0.01745	0.0698	0.3893	n.s.
DNM (sum)	0.3515	0.05504	0	3.031	0.02468	0.1896	0.7498	2.16 ×10⁻¹²

Results from the baseline methods are presented first, followed by results using the T-F masks from [28] and [30], and then the proposed method (SNR response scaling). Best-performing values for each category are highlighted in bold.
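
As a rough illustration of how the summary columns above could be produced, the sketch below computes a haversine angular error between an estimated and a true source direction and then summarises a list of such errors. It assumes that D is the great-circle angle on the unit sphere, with azimuth treated as longitude and elevation as latitude (all in radians); the table itself does not state this definition. The function names `haversine_error` and `summarise` are hypothetical and are not taken from the article, and the quartile convention used here may differ from the one behind the table.

```python
import math
import statistics


def haversine_error(az_est, el_est, az_true, el_true):
    """Great-circle angle (rad) between two directions on the unit sphere.

    Assumption: azimuth plays the role of longitude and elevation the
    role of latitude in the standard haversine formula.
    """
    d_az = az_est - az_true
    d_el = el_est - el_true
    h = (math.sin(d_el / 2) ** 2
         + math.cos(el_est) * math.cos(el_true) * math.sin(d_az / 2) ** 2)
    # Clamp against floating-point drift before taking the square root.
    h = min(1.0, max(0.0, h))
    return 2 * math.asin(math.sqrt(h))


def summarise(errors):
    """Return the statistics reported per row (the p value is not included)."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(errors),
        "median": median,
        "min": min(errors),
        "max": max(errors),
        "q25": q1,
        "q75": q3,
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
    }


if __name__ == "__main__":
    # Made-up directions for illustration only, not data from the article.
    estimates = [(0.10, 0.00), (0.52, 0.05), (1.00, -0.10), (2.00, 0.20)]
    truths = [(0.10, 0.00), (0.50, 0.00), (1.05, -0.10), (1.00, 0.00)]
    errors = [haversine_error(ae, ee, at, et)
              for (ae, ee), (at, et) in zip(estimates, truths)]
    print(summarise(errors))
```

Because the error is an angle on the sphere, it is bounded by π rad, which is consistent with the largest Max entries in the table (about 3.03 rad).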
