Table 1 Hyperparameters

From: Dual input neural networks for positional sound source localization

| Parameter | Value |
| --- | --- |
| Num. parameters (DI-NN) | 3.5M |
| Num. conv. kernels | 64, 128, 256, 512 |
| Conv. kernel size | 2×2 |
| Conv. layer pooling size | 2×2 |
| GRU output size | 256 |
| Metadata fusion net. layer out. sizes | \(512 + N_{\phi}\), 2 |
| Metadata embedding layer out. sizes | \(2 N_{\phi}\), \(N_{\phi}\) |
| Activation func. last layer | None |
| Activation func. other layers | Rectified Linear Unit (ReLU) |
| Num. Discrete Fourier Transform (DFT) bins (for STFT) | 1024 |
| DFT hop length (for STFT) | 512 |
| Input duration | 0.5 s |
| Sampling rate | 16 kHz |
| Grid resolution of LS method | 2 cm |
| Learning rate | 0.0005 |
| Batch size | 32 |
| Num. epochs | 40 |
| Batch normalization [44] | Only after conv. layers |
| Optimizer | Adam [45] |
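
The STFT-related rows of Table 1 determine the size of the time-frequency input. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes that "Num. DFT bins" is the FFT length (so a real signal yields a one-sided spectrum), that the hop length is given in samples, and that the frame count follows the usual framing convention with or without centered padding. The class name `StftConfig` and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StftConfig:
    """Hypothetical container for the STFT rows of Table 1."""

    sampling_rate_hz: int = 16_000  # Table 1: sampling rate
    input_duration_s: float = 0.5   # Table 1: input duration
    n_dft: int = 1024               # Table 1: num. DFT bins (assumed FFT length)
    hop_length: int = 512           # Table 1: DFT hop length (assumed in samples)

    @property
    def n_samples(self) -> int:
        # Samples per input segment: rate times duration.
        return round(self.sampling_rate_hz * self.input_duration_s)

    @property
    def n_freq_bins(self) -> int:
        # One-sided spectrum of a real-valued signal.
        return self.n_dft // 2 + 1

    def n_frames(self, center: bool) -> int:
        # center=True assumes n_dft // 2 samples of padding on each side,
        # so every hop position inside the signal starts a frame.
        if center:
            return 1 + self.n_samples // self.hop_length
        # Without padding, only frames that fit entirely are counted.
        return 1 + (self.n_samples - self.n_dft) // self.hop_length


cfg = StftConfig()
print(cfg.n_samples, cfg.n_freq_bins, cfg.n_frames(center=False), cfg.n_frames(center=True))
```

Under these assumptions each 0.5 s segment holds 8000 samples per channel and gives 513 frequency bins, with 14 frames without padding or 16 frames with centered padding. Which framing convention the paper uses is not stated in the table.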