MYRiAD: a multi-array room acoustic database

In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, several high-quality acoustic databases have been developed over the years. In this paper, we present another complementary database of acoustic recordings, referred to as the Multi-arraY Room Acoustic Database (MYRiAD). The MYRiAD database is unique in its diversity of microphone configurations suiting a wide range of enhancement and reproduction applications (such as assistive hearing, teleconferencing, or sound zoning), the acoustics of the two recording spaces, and the variety of contained signals including 1214 room impulse responses (RIRs), reproduced speech, music, and stationary noise, as well as recordings of live cocktail parties held in both rooms. The microphone configurations comprise a dummy head (DH) with in-ear omnidirectional microphones, two behind-the-ear (BTE) pieces equipped with 2 omnidirectional microphones each, 5 external omnidirectional microphones (XMs), and two concentric circular microphone arrays (CMAs) consisting of 12 omnidirectional microphones in total. The two recording spaces, namely the SONORA Audio Laboratory (SAL) and the Alamire Interactive Laboratory (AIL), have reverberation times of 2.1 s and 0.5 s, respectively. Audio signals were reproduced using 10 movable loudspeakers in the SAL and a built-in array of 24 loudspeakers in the AIL. MATLAB and Python scripts are included for accessing the signals as well as microphone and loudspeaker coordinates. The database is publicly available (https://zenodo.org/record/7389996).


Introduction
Acoustic signal processing using multiple microphones has received significant attention due to its fundamental role in a number of applications such as assistive hearing with hearing aids or cochlear implants, teleconferencing, hands-free telephony, voice-controlled devices, spatial audio reproduction, and sound zoning, to name a few. Some of the specific tasks which can be accomplished with acoustic signal processing include speech enhancement and speech dereverberation [2][3][4][5][6][7][8][9], room parameter estimation [10], acoustic echo and feedback cancellation [11,12], source localisation [3,6,13], audio source separation [8,9], sound field control [14,15], and automatic speech recognition [16], all of which are pertinent to the aforementioned applications. One of the core phases in the development of acoustic signal processing algorithms is the evaluation phase, where the performance of a newly developed algorithm is compared to that of existing algorithms in various acoustic environments relevant to the application at hand. This is challenging because the laboratory conditions under which the algorithm is evaluated rarely match the real-world conditions where the algorithm must perform. Additionally, recorded audio signals with the target microphone configurations and specified acoustic scenarios may be unavailable, resulting in the use of simulated data for evaluation. Although simulated data can be useful in the evaluation of initial proof-of-concept ideas, it does not necessarily provide an accurate indication of whether the algorithm will perform well in real-world conditions. In an effort to overcome these challenges and to encourage the use of more realistic data, several high-quality acoustic databases containing room impulse responses (RIRs) [7,10,[17][18][19][20][21][22][23][24][25][26][27][28], speech [7,10,11,16,21,23,24], music [21], and babble or cocktail party noise [23,29,30] have been developed over the years, which have played an important role in building confidence in the real-world performance of various acoustic signal processing algorithms.
In this paper, we present another complementary database of acoustic recordings from multiple microphones in various acoustic scenarios, referred to as the Multi-arraY Room Acoustic Database (MYRiAD). In comparison to existing databases, the MYRiAD database is unique in the diversity of the employed microphone configurations suiting a wide range of applications, the acoustics of the recording spaces, and the variety of signals contained in the database, which includes RIRs, recordings of reproduced speech, music, and stationary noise, as well as recordings of live cocktail parties.
The database consists specifically of two different microphone configurations used across two different rooms. The first microphone configuration consists of a dummy head (DH) with in-ear omnidirectional microphones, two behind-the-ear (BTE) pieces mounted on the DH, each equipped with 2 omnidirectional microphones, [1] as well as 5 external omnidirectional microphones (XMs) located at various distances and angles from the DH. [2] This microphone configuration will be referred to as M1. The second microphone configuration consists of two concentric circular microphone arrays (CMAs) with in total 12 omnidirectional microphones, [3] and will be referred to as M2. The two rooms where audio recordings were made are: (i) the SONORA Audio Laboratory [36], located at the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Belgium, which we will refer to as the SAL, and (ii) the Alamire Interactive Laboratory [36], located at the Park Abbey in Heverlee, Belgium, referred to as the AIL. The main acoustical difference between these two rooms is that the SAL is significantly more reverberant than the AIL, with reverberation times of 2.1 s and 0.5 s, respectively. In the SAL, the microphone configuration M1 was used in one position, and in the AIL, a combination of microphone configurations M1 and M2 was used in two positions. In terms of sound generation, 10 different movable loudspeakers were used as artificial sound sources in the SAL, while the AIL has been equipped with an array of 24 loudspeakers.
[1] BTE pieces are commonly used for hearing aids or cochlear implant devices. No additional processing is done on the microphone signals before they arrive at the data acquisition system.
[2] A typical use case of such XMs is to provide additional information to improve the enhancement of BTE signals [31][32][33].
[3] Among others, use cases of CMAs include signal enhancement [34] and localisation [35], for instance in smart speakers, as well as sound zoning [14,15].
The following audio signals were played back through the speakers and recorded by the microphones: exponential sine sweeps used to compute RIRs [37] between source and microphone positions, resulting in 110 RIRs for the SAL and 1104 RIRs for the AIL, as well as three male speeches [38], three female speeches [38], a drum beat [39], a piano piece [40], and speech-shaped stationary noise. Additionally, in both rooms, several participants were invited to re-create a live cocktail party scenario. The resulting noise from the different cocktail parties held at each of the spaces was recorded for both microphone configurations.
In total, the MYRiAD database contains 76 hours of audio data sampled at 44.1 kHz with 24 bit resolution, amounting to 36.2 GB. All computed RIRs and recorded signals are available in the database and can be downloaded [1]. MATLAB and Python scripts are included in the database for accessing the signals and corresponding microphone and loudspeaker coordinates.
The remaining sections of this paper provide a detailed overview of the database and are organised as follows. In Sec. 2, an overview of the two rooms, the SAL and the AIL, is presented. In Sec. 3, a detailed description is given of the equipment used. In Sec. 4, the microphone and loudspeaker configurations within the two rooms are discussed. In Sec. 5, an overview is given of the recorded signals, details of the cocktail party, and the computed RIRs. In Sec. 6, practical instructions for using the database are provided, along with a description of relevant MATLAB and Python scripts, and some examples from the database are illustrated. In Sec. 7, the database is briefly summarised.

Room description
In this section, we provide a brief overview of the characteristics of the two recording rooms. The SAL is described in Sec. 2.1 and the AIL is described in Sec. 2.2.

SONORA Audio Laboratory (SAL)
The SAL [36] is located at the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Heverlee, Belgium. Fig. 1 shows a fisheye view and Fig. 6 shows a floor plan of the L-shaped SAL with approximate dimensions. The height of the room is 3.75 m, yielding a volume of approximately 102 m³. The walls and ceiling are made of plasterboard covering mineral wool, while the floor is made of concrete covered with vinyl. Two windows, each of 4 m², are located on one side of the room. Adjacent to the recording room, separated by glass of area 6.5 m², is the control room, where all the acquisition equipment and a computer are located.
From the RIRs measured in the SAL, we estimated the reverberation time T20 to be 2.1 s, as described in Sec. 6.4. Details on the audio hardware used in the SAL are given in Sec. 3, while the microphone and loudspeaker configuration and placement are described in Sec. 4.1.1, Sec. 4.2.1, and Sec. 4.3.

Alamire Interactive Laboratory (AIL)
The AIL [36] is located in a historic gate building, the Saint Norbert's gate of the Park Abbey in Heverlee, Belgium. Fig. 1 shows a fisheye view and Fig. 6 shows a floor plan of the room. Apart from a staircase leading to a floor above, the room is approximately shoebox shaped with 6.4 m width, 6.9 m depth, and 4.7 m height, yielding a volume of approximately 208 m³. The floor and ceiling are made of wood. The room is enclosed by thin plastered brick walls, with two windows each to the front and the back, each of about 3.3 m², and wide passages to adjacent rooms, one of them closed by a glass door. These passages were closed off with curtains during recording, except for a part of the cocktail party noise, cf. Sec. 5.3. The housing of the staircase is plastered, the stairs are wooden, and the railing is made of glass. From the RIRs measured in the AIL, the reverberation time T20 is estimated to be 0.5 s, cf. Sec. 6.4. The AIL is equipped with a permanent, fixed array of 24 loudspeakers for spatial audio reproduction, as shown in Fig. 1. Further details on the audio hardware used in the AIL are given in Sec. 3, while the microphone and loudspeaker configuration and placement are described in Sec. 4.1, Sec. 4.2.2, and Sec. 4.3.

Recording equipment
A list of the recording and processing equipment used to create the database is shown in Table 1. Regarding the microphones, the DH contains 2 in-ear omnidirectional microphones (one for each ear), and the two BTE pieces (one for each ear) are each equipped with 2 omnidirectional microphones. The BTE pieces and their proprietary pre-amplifier were provided by Cochlear Ltd. and are shown in Fig. 2. The specific loudspeaker and microphone configurations used for the various recordings in the database will be outlined in Sec. 4, and the naming conventions of the files will be defined in Sec. 6. The recording chains were built as follows. As the digital audio workstations for sending and acquiring the signals, Logic Pro X and Adobe Audition on an iMac were used in the SAL and the AIL, respectively. In the SAL, the signals were sent from Logic Pro X via USB to the RME Digiface, then to the RME M-32 DA using the ADAT protocol, and finally to the respective Genelec 8030 CP loudspeakers. In the AIL, the signals were sent from Adobe Audition via the DANTE protocol to the Powersoft OTTOCANALI 4K4 DSP+D, and finally to the Martin Audio CDD6 loudspeakers. In both rooms, all microphone signals were sent to an RME Micstasy (except for the BTE microphone signals, which were first routed to the proprietary pre-amplifier) and converted to ADAT. In the SAL, the ADAT signals were sent to the RME Digiface and finally recorded in Logic Pro X, whereas in the AIL, the ADAT signals were sent to the Ferrofish Verto 64 and via DANTE to Adobe Audition. The various types of recorded signals are outlined in Sec. 5. For post-processing (such as RIR computation, cf. Sec. 5), MATLAB and Python were used.

Microphone and loudspeaker configurations
This section describes the microphone configurations in Sec. 4.1, the loudspeaker configurations in Sec. 4.2, and the placement of these configurations within the SAL and AIL in Sec. 4.3. The exact coordinates of the loudspeaker and microphone positions within the SAL and AIL for the various configurations can be loaded from the database; the details of this procedure are elaborated upon in Sec. 6.

Microphone configurations

M1
The first microphone configuration, M1, consists of the in-ear microphones of the DH, the microphones of the BTE pieces, three AKG CK97-O microphones, and two AKG CK32 microphones. As the AKG CK97-O and AKG CK32 microphones are not mounted on the DH, they are considered 'external' in relation to the DH, and hence will be referred to as external microphones (XMs). This M1 configuration was used in both the SAL and the AIL, cf. Sec. 4.3. Fig. 3 depicts the plan view of the measurement configuration of the loudspeakers and microphones used for the audio recordings made in the SAL. For now, however, we will focus only on the trapezoidal shape enclosing the microphones, which is a depiction of the M1 configuration. A description of the corresponding microphone labels is given in Table 2.
For this M1 configuration, the DH is placed at a height of approximately 1.3 m ear level from the floor. Each of the BTE pieces is mounted on the DH as shown in Fig. 2. The XMs are placed [4] within a radius of 1 m from the DH as shown in Fig. 3. XM1, XM2, and XM3 are AKG CK97-O microphones, while XM4 and XM5 are AKG CK32 microphones. The XMs are all positioned at 1 m above the floor.

M2
The second microphone configuration, M2, consists of two concentric circular microphone arrays (CMAs) composed of 4 DPA 4060 and 8 AKG CK 32 microphones. Fig. 4 shows a plan view of the M2 configuration, and a description of the microphone labels is given in Table 2. The inner circular microphone array has a radius of 10 cm and consists of 4 equidistantly placed DPA 4060 microphones. The outer circular microphone array has a radius of 20 cm and consists of 8 equidistantly placed AKG CK 32 microphones. The microphones are all placed at a height of 1 m above the floor using a holder made of laser-cut acrylic glass, centred around the stand of the DH of the M1 configuration. This M2 configuration was used at two different positions within the AIL, always in combination with M1, as depicted in Fig. 6. It should be noted that since M2 was used in combination with M1, it is also possible to define arrays that contain microphones of both configurations, such as a linear array composed of CMA20 180, CMA10 180, XM1, CMA10 0, CMA20 0, XM2, and XM3.
[4] Note that XM1 is taped on a stand of 18 mm diameter (holding the DH), which may impact the effective directivity pattern of the microphone at high frequencies.

Loudspeaker configurations

LS-SAL
The loudspeaker configuration LS-SAL, as the name suggests, is used in the SAL only. It is defined relative to the M1 microphone configuration and consists of 10 loudspeakers. The loudspeakers are positioned at angles [a] ∈ {-90, -60, -45, -30, 0, 30, 45, 60, 90} degrees relative to the DH, as depicted in Fig. 3, at various spatial locations at a height such that the centre of each of the woofers is approximately 1.3 m above the floor. Fig. 3 is a plan view of this LS-SAL loudspeaker configuration along with the M1 microphone configuration. A description of the loudspeaker labels is also provided in Table 2. During recordings, the loudspeaker S0 1 was removed before recording the signals for the loudspeaker S0 2, so that there was a direct line of sight from the latter to the DH.

LS-AIL
The loudspeaker configuration LS-AIL is a 24-loudspeaker array permanently installed in the AIL, cf. Fig. 1, which is typically used for spatial sound reproduction. Fig. 5 shows the geometry of the loudspeaker array. The loudspeakers are labelled as described in Fig. 5 and Table 2. The width and depth of the array are approximately 5.6 m and 4.85 m, and the loudspeakers are arranged in three groups at different height levels, referred to as the lower, upper, and top level. The lower level consists of 8 speakers located around the room along the walls at about 1.5 m height, the upper level containing 12 speakers is located above at about 3.3 m height, and the top level containing 4 speakers is located more centrally at about 4.1 m height. Note that for the sake of simplicity, the presented locations are only approximate. Using measurements of the distances between the speakers and a set of four reference points on the floor with known coordinates, the exact coordinates of the loudspeakers have been estimated based on the theory of Euclidean distance matrices [41]. All microphone and loudspeaker coordinates can be loaded from the database as discussed in Sec. 6.2.
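The idea of recovering coordinates from distance measurements can be illustrated with a small sketch. Note that this is a simplified least-squares multilateration, not necessarily the exact EDM-based procedure of [41]; the reference-point and speaker coordinates below are hypothetical.

```python
import numpy as np

def locate_from_floor_refs(refs, dists):
    """Estimate a 3-D position from distances to reference points on the floor.

    Subtracting the squared-distance equation of the first reference point
    from the others yields a linear system in (x, y); the height z then
    follows from the remaining squared distance (positive root, since the
    speakers are above the floor).
    """
    refs = np.asarray(refs, dtype=float)   # shape (N, 2): floor points, z = 0
    d2 = np.asarray(dists, dtype=float) ** 2
    A = 2.0 * (refs[1:] - refs[0])
    b = d2[0] - d2[1:] + np.sum(refs[1:] ** 2, axis=1) - np.sum(refs[0] ** 2)
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    z = np.sqrt(max(d2[0] - np.sum((xy - refs[0]) ** 2), 0.0))
    return np.array([xy[0], xy[1], z])

# Hypothetical reference points on the floor and a true speaker position.
refs = np.array([[0.0, 0.0], [5.6, 0.0], [0.0, 4.85], [5.6, 4.85]])
speaker = np.array([2.8, 2.4, 1.5])
dists = np.linalg.norm(np.column_stack([refs, np.zeros(4)]) - speaker, axis=1)
print(locate_from_floor_refs(refs, dists))  # ≈ [2.8, 2.4, 1.5]
```

With noise-free distances the position is recovered exactly; with measured distances, the least-squares fit averages out the measurement errors.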

Microphone and loudspeaker configuration placement

Fig. 6 illustrates the placement of the M1 microphone configuration as well as the LS-SAL loudspeaker configuration within the SAL, at a recording position near the corner of the L-shaped room.
Fig. 6 also shows a floor plan of the setups M1 and M2 within the AIL, together with the lower speakers of the LS-AIL loudspeaker array. As can be seen, there are two recording positions in the AIL, referred to as P1 and P2, with the DH facing the speakers SU6 and SU7, located roughly below ST2 and ST1 (not shown in the figure), respectively. In both recording positions, both microphone configurations M1 and M2 are used, with the stand of the DH of M1 being the centre of the circular microphone arrays of M2. Fig. 7 shows a combination of M1 and M2 as used in position P2.
The coordinates of all speakers and microphones in both rooms can be loaded from the database using MATLAB or Python, cf. Sec. 6.2.

Recorded signals
The MYRiAD database contains 76 hours of audio data and has a size of 36.2 GB. All microphone signals in the database are provided at a sampling frequency of 44.1 kHz with a 24 bit resolution. Their gains are set such that the recording level across the different microphone models is approximately the same around 1 kHz in diffuse noise. For the sake of consistency, recordings were made simultaneously [5] for all microphones in the SAL as well as in each of the two recording positions P1 and P2 in the AIL. A summary of the signals recorded and computed, along with the quantity of each (i.e. the number of different instances of that type of signal), their duration, their source, their acquisition method (i.e. how the signals were generated), the employed loudspeakers, and a signal label is provided in Table 3. In the remainder of this section, we discuss in more detail the RIR measurements in Sec. 5.1, the recorded speech, noise, and music signals in Sec. 5.2, and the recorded cocktail party in Sec. 5.3.
[5] This implies acoustic scattering effects that may not match the envisioned application. For instance, when simulations are performed using the CMAs, scattering from the DH may not be meaningful to the simulated scenario. Nevertheless, given that an accurate reproduction of scattering is hardly ever practical, this does not compromise the use of these signals to evaluate acoustic signal processing algorithms.

Room impulse responses
The database includes in total 110 RIRs from the SAL and 1104 RIRs from the AIL. To obtain the RIRs, two exponential sine sweep signals were played and recorded for each loudspeaker-microphone combination. In the AIL, the sides of the room were closed off with curtains during the recording. From these sine sweeps, the RIRs were computed by cross-correlation [6] according to the procedure detailed in [37]. From each pair of recorded sine sweeps, one was selected for RIR estimation by visual inspection of the spectrograms (more specifically, spectrograms containing any type of non-stationary noise were discarded). In order to obtain RIRs that are as clean as possible, some of the recorded sine sweeps were post-processed to suppress low-level (stationary) harmonic noise components produced by the recording equipment. In this post-processing procedure, frequency bins containing harmonic noise components were identified during silence by comparing their magnitude to the median magnitude of neighbouring frequency bins. If the difference was above a threshold of 4 dB, a Wiener filter [2] was applied in that frequency bin. The recorded signals were further post-processed to remove the input-output delay caused by the recording hardware.
[6] It should be noted that the estimated impulse responses also include some characteristics of the recording hardware. Consequently, these impulse responses are, in a strict sense, not the true RIRs, which represent the characteristics of the room only. Nevertheless, these impulse responses are designated as RIRs for simplicity.
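The sweep-based measurement principle of [37] can be sketched as follows: an exponential sine sweep (ESS) is deconvolved with its amplitude-compensated, time-reversed inverse filter, so that any delay or filtering between playback and recording appears as an impulse response. The sweep parameters below are illustrative only, not the settings used for the database, and the post-processing steps described above (sweep selection, harmonic noise suppression) are omitted.

```python
import numpy as np
from scipy.signal import fftconvolve

def ess(f1, f2, T, fs):
    """Exponential sine sweep from f1 to f2 Hz over T seconds."""
    t = np.arange(int(T * fs)) / fs
    R = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * T / R * (np.exp(t / T * R) - 1))

def inverse_filter(sweep, f1, f2):
    """Time-reversed sweep with a -6 dB/octave amplitude envelope."""
    n = np.arange(len(sweep))
    env = np.exp(-n / len(sweep) * np.log(f2 / f1))
    return sweep[::-1] * env

fs, f1, f2, T = 44100, 100.0, 10000.0, 1.0
sweep = ess(f1, f2, T, fs)
inv = inverse_filter(sweep, f1, f2)

# Simulate a "recording": the sweep reaching a microphone 50 samples late.
delay = 50
recording = np.concatenate([np.zeros(delay), sweep])

# Deconvolution: the impulse response peaks near len(sweep) - 1 + delay.
rir = fftconvolve(recording, inv)
print(np.argmax(np.abs(rir)) - (len(sweep) - 1))  # ≈ 50
```

In a real measurement, the pulse is replaced by the full reverberant impulse response, and the harmonic distortion components of the loudspeaker conveniently appear before the main peak, where they can be windowed out.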

Speech, noise, music
Speech, stationary noise, and music signals were played through the loudspeakers indicated in Table 3 and recorded by all microphones. Three male and three female speech segments were chosen randomly from the Centre for Speech Technology Research (CSTR) Voice Cloning Toolkit (VCTK) corpus [38]. The stationary noise source signal has a speech-shaped spectrum and was generated in MATLAB based on speech spectra from the VCTK corpus. The drum piece was taken from the studio recording sessions in [39]. The piano piece is track 60 (Schubert) from the European Broadcast Union Sound Quality Assessment Material Recordings for Subjective Tests (EBU SQAM) [40]. In the AIL, the sides of the room were closed off with curtains during recording. These signals were acquired for all loudspeakers in the SAL, but only for the lower loudspeaker level in the AIL, that is, SL1 to SL8 (in contrast to the RIRs, which were computed for all possible loudspeaker-microphone combinations, cf. Sec. 5.1). The recorded signals were post-processed to remove the input-output delay caused by the recording hardware. For the signals recorded in the SAL, a slow phase drift was observed between the recorded data and simulated data obtained by convolving the estimated RIR with the source signal, cf. Sec. 6.3. This phase drift can be attributed to hardware limitations in the recording setup and has been compensated for by time-shifting some of the recorded signals [7] so as to minimise the error between the recorded and the convolved data. For the signals recorded in the AIL, no phase drift was observed. Both the source signals and the recorded signals are included in the database.
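Spectrally shaped stationary noise of this kind can be generated by imposing a target magnitude spectrum on white noise in the frequency domain. The sketch below uses a hypothetical speech-like roll-off rather than the actual VCTK-derived spectrum used for the database.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 44100, 8 * 44100  # 8 s of noise at 44.1 kHz

# Hypothetical speech-like magnitude target: flat up to about 500 Hz,
# then rolling off at roughly 6 dB/octave towards high frequencies.
f = np.fft.rfftfreq(n, 1.0 / fs)
target_mag = 1.0 / np.sqrt(1.0 + (f / 500.0) ** 2)

# Impose the target magnitude envelope on white Gaussian noise.
white = rng.standard_normal(n)
shaped = np.fft.irfft(np.fft.rfft(white) * target_mag, n)
shaped /= np.max(np.abs(shaped))  # normalise peak level

# The shaped noise concentrates its energy at low frequencies.
spec = np.abs(np.fft.rfft(shaped)) ** 2
low, high = spec[f < 1000].sum(), spec[f > 8000].sum()
print(low > 10 * high)  # True
```

Because only the magnitude is shaped while the random phase of the white noise is kept, the result remains stationary and noise-like rather than speech-like in its temporal structure.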

Cocktail party
In addition to the aforementioned signals, a cocktail party scenario was re-created and recorded in both the SAL and the AIL. All participants gave informed consent. They were instructed to stay outside of a 1 m circumference around the DH in both rooms and to periodically move around in a random manner while engaging in conversation. Snacks and beverages in glasses were also served to the participants during the recordings.
[7] Only a minority of the recorded signals required a shift of at most 2 samples.
(Notes to Table 3: the subset L_sub includes all speakers in the SAL and SL1 to SL8 in the AIL, cf. Fig. 5 and Table 2; the raw sine sweeps are not included in the database and hence do not have a label.)

For the SAL cocktail party, at any given time, there were at least 15 people present in the room, whereas for the AIL cocktail party, there were at least 10 and at most 14 people present. In the SAL, the microphone configuration M1, located as shown in Fig. 6, was used (the loudspeakers were removed from the room). In the AIL, the microphone configurations M1 and M2, located in position P2 as shown in Fig. 6, were used. The curtains on the sides of the room in the AIL were closed during the recordings of CP1, CP2, and CP3, and open during CP4, CP5, and CP6. Photos from the cocktail parties in the SAL and AIL are shown in Fig. 8.

Using the database
In this section, we elaborate on the file path structure of the database in Sec. 6.1 as well as the code for creating microphone signals and retrieving coordinates in Sec. 6.2.
(Notes to Table 4: the signal label [s] takes the forms defined in Table 3; the speaker labels S[a] [d] and S[l][i] and the microphone label [m] take the forms defined in Table 2; P1 and P2 refer to the microphone configuration placements in the AIL as shown in Fig. 6; the script or function names [f] take the forms defined in Table 5.)
The folder /coord/ contains files with the coordinates of all speakers and microphones in both the SAL and the AIL, and the folder /tools/ contains MATLAB and Python scripts for accessing audio data and coordinates, cf. Sec. 6.2.

Creating Microphone Signals and Retrieving Coordinates

The database comes with MATLAB and Python scripts intended to facilitate retrieving loudspeaker and microphone coordinates and generating signals, as listed in Table 5.
The script load_audio_data is an example script demonstrating how a .wav file can be loaded given a list of loudspeaker, microphone, and signal labels provided by the user. This script also calls the function load_coordinates(), which reads the corresponding coordinates from SAL.csv or AIL.csv (cf. Table 4) and optionally visualises them.
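For users who prefer not to rely on the bundled helpers, the coordinate files can also be read directly. The column layout below (`label,x,y,z`) and the coordinate values are assumptions for illustration; the actual SAL.csv and AIL.csv layout is documented in Table 4.

```python
import csv
import io

# Hypothetical excerpt in a simple "label,x,y,z" layout; the real
# SAL.csv and AIL.csv files may use a different column structure.
csv_text = """label,x,y,z
DHL,1.30,2.10,1.30
DHR,1.45,2.10,1.30
XM1,1.40,2.50,1.00
"""

# Map each label to an (x, y, z) tuple in metres.
coords = {}
with io.StringIO(csv_text) as fh:
    for row in csv.DictReader(fh):
        coords[row["label"]] = tuple(float(row[k]) for k in ("x", "y", "z"))

print(coords["XM1"])  # (1.4, 2.5, 1.0)
```

Replacing the `io.StringIO` wrapper with `open("SAL.csv")` would read the shipped file, provided the column names match.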

Examples of the audio signals
In this section, we take a glimpse into the database by observing some of the signals in both the SAL and the AIL, which will also make evident the different acoustics of the spaces. The colourmap in the spectrograms corresponds to the squared magnitude of the short-time Fourier transform coefficients and is plotted in dB. Fig. 9 (a) shows the first 10 seconds of the source signal corresponding to a female speaker, F1 (cf. Table 3). Fig. 9 (b) shows a computed RIR in the SAL from the loudspeaker S0 1 to the microphone BTELF (cf. Fig. 3), where the reverberation time is seen to be quite long and highly frequency-dependent. Fig. 9 (c) shows the recording of the source signal F1 (from Fig. 9 (a)) at the microphone BTELF after being played through the loudspeaker S0 1. The effect of the reverberation is evident, as the spectrogram shows how the source signal has been distorted in both time and frequency. Fig. 9 (d) is the result of a convolution between the RIR from loudspeaker S0 1 to microphone BTELF (Fig. 9 (b)) and the F1 source signal (Fig. 9 (a)). This signal is representative of how the recorded signal from Fig. 9 (c) would typically be simulated. As should be expected, Fig. 9 (c) and Fig. 9 (d) appear quite similar. However, Fig. 9 (e) illustrates the difference (error) between the waveform plots in Fig. 9 (c) and Fig. 9 (d), with the corresponding spectrogram of this error, demonstrating that the simulated and recorded signals are not identical. The error may be due to a variety of reasons, such as acoustic noise, loudspeaker non-linearities, recording hardware limitations including slow phase drifts, cf. Sec. 5.2, and slowly time-variant as well as not perfectly linear sound propagation. Fig. 10 displays signals from the AIL in a similar manner to Fig. 9. The first 10 seconds of the same source signal, F1 (cf. Table 3), are shown in Fig. 10 (a). Fig. 10 (b) shows a computed RIR in the AIL from the loudspeaker SL5 1 to the microphone BTELF (cf. Fig. 3), where it can be observed that the reverberation time is significantly shorter compared to the SAL and more uniform across frequency. Fig. 10 (c) shows the recording of the source signal F1 (from Fig. 10 (a)) at the microphone BTELF after being played through the loudspeaker SL5 1. Fig. 10 (d) is the result of a convolution between the RIR from loudspeaker SL5 1 to microphone BTELF (Fig. 10 (b)) and the F1 source signal (Fig. 10 (a)). Fig. 10 (e) is the difference (error) between the waveform plots in Fig. 10 (c) and Fig. 10 (d). It can once again be observed that although the simulated and recorded signals are quite similar, they are not identical.
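The simulation step behind Fig. 9 (d) and Fig. 10 (d) is the standard recipe of convolving the dry source signal with the measured RIR. The sketch below uses a synthetic exponentially decaying noise sequence as a stand-in RIR, since the database RIRs themselves are loaded from the shipped .wav files.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
fs = 44100

# Synthetic stand-ins: 1 s "source" signal and an exponentially decaying
# noise "RIR" (a crude model of a 0.5 s reverberant tail).
source = rng.standard_normal(fs)
t = np.arange(fs // 2) / fs
rir = rng.standard_normal(fs // 2) * np.exp(-6.908 * t / 0.5)

# Simulated microphone signal, analogous to Fig. 9 (d) / Fig. 10 (d).
mic = fftconvolve(source, rir)
print(mic.shape)  # (66149,) = len(source) + len(rir) - 1
```

Comparing such a simulated signal against the corresponding recorded signal from the database yields exactly the kind of residual error shown in Fig. 9 (e) and Fig. 10 (e).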
Figure 11 depicts the waveform and corresponding spectrogram of a 15 s sample of the cocktail party noise. The left of Fig. 11 shows the signal CP2 (cf. Table 3) for microphone XM2 in the SAL, and the right of Fig. 11 shows the signal CP5 from XM2 in the AIL. The non-stationary behaviour of this type of noise over time and frequency is quite evident.

Reverberation times
The reverberation time T20 for the two rooms, the SAL and the AIL, is estimated at full bandwidth as well as in different octave bands. The estimate is obtained from the slope of a line fitted to the decay curves of the RIRs according to the ISO standard [42] and using the code in [43]. Here, the line was fitted in the dynamic range between -5 dB and -25 dB of the decay curve. A plot of the estimated reverberation times is shown in Fig. 12. As can be seen, the full-band reverberation time is significantly higher in the SAL, with 2.1 s, as compared to the AIL, with 0.5 s. We further note that T20 in the SAL is largest between 1 and 2 kHz and quickly decreases above this range, while it is less dependent on frequency in the AIL. While in the AIL the variance of the T20 estimates continuously decreases with frequency, we observe that it increases again above 2 kHz in the SAL. This may be due to an observed magnitude decay of the SAL RIRs above 2 kHz, resulting in less accurate line fitting. In addition, the increased directivity of the loudspeakers at higher frequencies may result in stronger variations of the generated sound field with respect to the loudspeaker placement.
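The T20 procedure (backward-integrated decay curve, line fit between -5 dB and -25 dB, extrapolation to a 60 dB decay) can be sketched as follows; [42,43] specify further details, such as octave-band filtering, which are omitted here. The synthetic RIR is a pure exponential with a known reverberation time of 0.5 s.

```python
import numpy as np

def t20(rir, fs):
    """Estimate reverberation time from the -5 to -25 dB span of the EDC."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]   # Schroeder backward integration
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate in dB/s
    return -60.0 / slope

fs, rt_true = 8000, 0.5
t = np.arange(int(2 * rt_true * fs)) / fs
rir = np.exp(-6.908 * t / rt_true)  # pure exponential: -60 dB at t = rt_true
print(round(t20(rir, fs), 2))  # 0.5
```

For measured RIRs, restricting the fit to the -5 to -25 dB range avoids both the direct sound at the start of the decay curve and the noise floor at its end.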

Conclusion
In this paper, a database of acoustic recordings, referred to as the Multi-arraY Room Acoustic Database (MYRiAD), has been presented, which facilitates the recreation of noisy and reverberant microphone signals for the purpose of evaluating audio signal processing algorithms. Recordings were made in two different rooms, the SONORA Audio Laboratory (SAL) and the Alamire Interactive Laboratory (AIL), with significantly different reverberation times of 2.1 s and 0.5 s, respectively. In the SAL, a microphone configuration, M1, was used, which consists of in-ear dummy head microphones, microphones on behind-the-ear pieces placed on the dummy head, and external microphones (i.e. other microphones in the room). In the AIL, recordings were made in two different positions within the room using the microphone configuration M1 along with a second microphone configuration, M2, which consists of two concentric circular microphone arrays. In the SAL, 10 movable loudspeakers were used for sound generation, while in the AIL, a built-in array of 24 loudspeakers was used. The database contains room impulse responses, speech, music, and stationary noise signals, as well as recordings of a live cocktail party held in each room. MATLAB and Python scripts are included for accessing audio data and coordinates. The database is publicly available at [1].

Figure 1 :
Figure 1: Fisheye view of the SAL and the AIL.

Figure 2 :
Figure 2: Dummy BTE pieces used for creating the database. Each BTE piece consists of two omnidirectional microphones, as indicated by the circles.


Figure 3 :
Figure 3: Plan view of the M1 microphone configuration and the LS-SAL loudspeaker configuration. A description of the microphone and loudspeaker labels is given in Table 2. The radial grid spacing of the polar plot is 0.25 m. The DH is placed at a height of approximately 1.3 m ear level from the floor, and all XMs are placed at a height of approximately 1 m from the floor. The trapezoidal shape is used to represent the M1 microphone configuration in the floor plans of Fig. 6. For extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Sec. 6.2 should be used.

Figure 4: Plan view of the M2 microphone configuration. A description of the microphone labels is given in Table 2. The radial grid spacing of the polar plot is 0.1 m. DPA 4060 microphones are used for the inner circular microphone array and AKG CK 32 microphones are used for the outer circular microphone array. The circle drawn around the microphones represents the M2 microphone configuration in the floor plans of Fig. 6. For extracting more precise coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Sec. 6.2 should be used.

Figure 5: View of the LS-AIL loudspeaker array in the AIL. A description of the loudspeaker labels is given in Table 2. The speakers are organized in three different height levels of about 1.5 m (lower level), 3.3 m (upper level), and 4.1 m (top level) above the floor. The axes limits coincide with the boundaries of the approximately shoebox-shaped room, cf. Sec. 2.2. On the horizontal axes, the approximate distance between neighbouring speakers is indicated. The given dimensions are indicative and not exact; for extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Sec. 6.2 should be used.

Figure 6: Microphone and loudspeaker configuration placement. (Left) Placement of the M1 microphone configuration and the LS-SAL loudspeaker configuration within the SAL. (Right) Placement of the M1 and M2 microphone configurations at P1 and P2, as well as the lower level of the LS-AIL loudspeaker configuration within the AIL. Details of the M1 and M2 microphone configurations and the LS-SAL and LS-AIL loudspeaker configurations can be seen in Fig. 3, Fig. 4, and Fig. 5. For extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Sec. 6.2 should be used.

Figure 7: A combination of the microphone configurations M1 and M2 as used at the AIL.

Figure 8: Cocktail party recordings at the SAL and the AIL.

Figure 9: Waveform and corresponding spectrogram of signals related to the SAL recordings. (a) First 10 seconds of the source signal corresponding to a female speaker, F1 (cf. Table 3), (b) computed RIR from the loudspeaker S0 1 to microphone BTELF (cf. Fig. 3), (c) recorded microphone BTELF signal after the signal from (a) was played through the loudspeaker S0 1, (d) simulated signal from the convolution of (a) and (b), (e) error between signals (c) and (d).

Fig. 9 displays the waveform (top of each sub-figure) and corresponding spectrogram (bottom of each sub-figure) for a number of signals related to the SAL.

Figure 10: Waveform and corresponding spectrogram of signals related to the AIL recordings. (a) First 10 seconds of the source signal corresponding to a female speaker, F1 (cf. Table 3), (b) computed RIR from the loudspeaker SL5 1 to microphone BTELF (cf. Fig. 3), (c) recorded microphone BTELF signal after the signal from (a) was played through the loudspeaker SL5 1, (d) simulated signal from the convolution of (a) and (b), (e) error between signals (c) and (d).
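The error panels compare a recorded microphone signal with its simulation obtained by convolving the source signal with the computed RIR. One simple way to quantify such a mismatch, shown here as an illustrative sketch rather than the metric used for the figures, is the residual energy relative to the recorded signal:

```python
import numpy as np

def residual_energy_db(recorded, simulated):
    """Energy of the error between a recorded microphone signal and its
    RIR-based simulation, relative to the recorded signal, in dB."""
    n = min(len(recorded), len(simulated))
    err = recorded[:n] - simulated[:n]
    return 10 * np.log10(np.sum(err ** 2) / np.sum(recorded[:n] ** 2))

# Synthetic check: a simulation differing from the recording only by
# weak additive noise yields a strongly negative residual level.
rng = np.random.default_rng(1)
recorded = rng.standard_normal(48000)
simulated = recorded + 1e-3 * rng.standard_normal(48000)
level = residual_energy_db(recorded, simulated)  # around -60 dB
```

Truncating both signals to their common length avoids an artificial error contribution from the longer convolution tail of the simulated signal.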

Figure 11: Waveform and corresponding spectrogram for a 15 s sample of the cocktail party noise. (Left) Signal CP2 for XM2 in the SAL. (Right) Signal CP5 for XM2 in the AIL.

Figure 12: Reverberation time T20 for the two rooms SAL and AIL at full bandwidth and in different octave bands. The error bars indicate the standard deviation of the estimate across all possible loudspeaker-microphone combinations.
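T20 values such as those in Fig. 12 can be estimated from measured RIRs. A common approach, assumed here and not necessarily the exact procedure used for the figure, is Schroeder backward integration with a line fit on the −5 dB to −25 dB range of the energy decay curve, extrapolated to a 60 dB decay:

```python
import numpy as np

def t20_from_rir(rir, fs):
    """Estimate the reverberation time T20 from a single RIR using
    Schroeder backward integration: fit the energy decay curve (EDC)
    between -5 dB and -25 dB, then extrapolate the slope to -60 dB."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]        # backward-integrated energy
    edc_db = 10 * np.log10(edc / edc[0])         # EDC normalised to 0 dB
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)  # T20 evaluation range
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

# Synthetic RIR with a ~0.5 s reverberation time (exponentially
# decaying noise), standing in for a measured MYRiAD RIR.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
rir = np.exp(-3.0 * np.log(10) * t / 0.5) * rng.standard_normal(fs)
t20 = t20_from_rir(rir, fs)  # close to 0.5 s
```

Octave-band values as in Fig. 12 would additionally require band-pass filtering the RIR before applying the same estimator.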

Table 1: Equipment used for creating the database.


Table 2: Microphone and loudspeaker labels.


In the remainder of this section, we discuss in more detail the recorded speech, noise, and music signals in Sec. 5.1, the recorded cocktail party in Sec. 5.2, and the RIR measurements in Sec. 5.3.

Table 3: Signals recorded and computed in the database.

Table 4: File path structure of the database.

Table 5: Scripts facilitating the use of the database. (*) The folders in /audio/AIL/SU * / and /audio/AIL/ST * / only contain files of signal type RIR, cf. Sec. 5.2.