Online distributed waveform-synchronization for acoustic sensor networks with dynamic topology

Acoustic sensing by multiple devices connected in a wireless acoustic sensor network (WASN) creates new opportunities for multichannel signal processing. However, the autonomy of agents in such a network still necessitates the alignment of sensor signals to a common sampling rate. It has been demonstrated that the sampling rate offset (SRO) between any node pair can be estimated from the asynchronous waveforms already exchanged in the network, but connected online operation for network-wide distributed sampling-time synchronization still presents an open research task. This is especially true if the WASN experiences topology changes due to failure or appearance of nodes or connections. In this work, we rely on an online waveform-based closed-loop SRO estimation and compensation unit for node pairs. For WASNs hierarchically organized as a directed minimum spanning tree (MST), it is then shown how local synchronization propagates network-wide from the root node to the leaves. Moreover, we propose a network protocol for sustaining an existing network-wide synchronization in case of local topology changes. In doing so, the dynamic WASN maintains the MST topology after reorganization to support continued operation with minimum node distances. Experimental evaluation in a simulated apartment with several rooms demonstrates the ability of our methods to reach and sustain accurate SRO estimation and compensation in dynamic WASNs.


Introduction
The availability of smart devices equipped with diverse sensors has stimulated ample research in wireless sensor networks (WSNs) [1][2][3][4][5][6]. Meanwhile, wireless acoustic sensor networks (WASNs) have emerged as a research area of their own [7][8][9]. Due to the autonomy of agents, methods for sampling-time synchronization are a crucial piece of network infrastructure to discipline all WASN nodes to a consistent sampling rate [10]. However, considerable attention is still required for smooth and efficient network-wide treatment.
The importance of time synchronization for signal processing in WASNs is evident from the fact that asynchronous signals, even with sampling rate offset (SRO) values in the sub-hertz range, cause a significant decrease of overall network performance. For example, in acoustic source separation operating at a sampling rate of 16 kHz, an SRO of only 1 Hz leads to a drop of the signal-to-interference-ratio gain from 9 to 10 dB down to only 3 to 4 dB [11,12]. For similar SRO values, the intelligibility achieved by distributed beamforming-based noise reduction is reduced from 0.8 to 0.5 in terms of extended short-term objective intelligibility values, if sensor nodes are equipped with one or two microphones [13]. The SRO is often normalized to the sampling rate of a reference node and measured in parts per million (ppm)¹, since in real-world WASN applications it is usually a rather small value within the range of ±100 ppm [14,15].
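To make the ppm normalization concrete, the conversion can be sketched in a few lines of Python. The function names are illustrative only and are not taken from the cited works; the example values are the 1 Hz offset at 16 kHz and the ±100 ppm range quoted above.

```python
def sro_hz_to_ppm(sro_hz: float, fs_ref_hz: float) -> float:
    """Normalize an absolute SRO (in Hz) to the reference sampling rate, in ppm."""
    return 1e6 * sro_hz / fs_ref_hz


def sro_ppm_to_hz(sro_ppm: float, fs_ref_hz: float) -> float:
    """Convert a normalized SRO (in ppm) back to an absolute offset in Hz."""
    return sro_ppm * 1e-6 * fs_ref_hz


# An SRO of 1 Hz at a reference sampling rate of 16 kHz
print(sro_hz_to_ppm(1.0, 16_000.0))    # 62.5
# The upper end of the typical range, 100 ppm, expressed in Hz at 16 kHz
print(sro_ppm_to_hz(100.0, 16_000.0))
```

The SRO of 1 Hz mentioned above thus corresponds to 62.5 ppm, which lies inside the typical ±100 ppm range.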
Two core tasks of time synchronization are estimation and compensation of all SRO values [16,17]. SRO compensation can be implemented either in hardware by changing the oscillator frequency (requiring direct access to the respective circuitry of analog-to-digital converters) or in software by digital-to-digital conversion (i.e., resampling) of microphone signals [18]. In the scope of this publication, we rely on comprehensive options for software-based online-capable SRO compensation [19][20][21][22][23][24]. Methods for SRO estimation are generally based either on time stamp exchange between network agents or on the acoustic waveforms already shared for joint signal processing [25].
Time stamp-based SRO estimation has traditionally received more attention, especially for network-wide distributed synchronization of WSNs [26][27][28][29][30], which aims at shared responsibilities across the network and at scalability in terms of communication bandwidth and computational load, in contrast to centralized network operation [10]. In such scenarios, the time stamps are exclusively exchanged in a one-way or two-way communication procedure between neighboring nodes, which is referred to as a gossiping approach [31]. In the seminal work [32], a widespread timing-sync protocol for sensor networks (TPSN) has been proposed, where network-wide clock synchronization is provided by two consecutive steps: organization of the network in a hierarchical topology and pair-wise synchronization of network agents along the topology edges. Furthermore, a reference node is set, to whose timing all other nodes are to be aligned. Further control must be applied with the TPSN scheme to accommodate dynamic WSNs [33], meaning networks that may change their structure during operation as a reaction to failure or appearance of nodes or communication links. Similar techniques are hardly available for waveform-based network-wide synchronization of WASNs, and a major goal of this paper is to fill this gap.
Waveform-based SRO estimation solely uses asynchronous acoustic signals without any time stamp information or protocol [34][35][36][37][38][39][40][41][42][43][44][45], which is particularly reasonable when the network already exchanges acoustic waveforms for joint acoustic signal processing over the network. Typical acoustic excitation here is a directional or diffuse sound field from single or multiple acoustic sources like speech, music, or even spatially correlated noise in non-reverberant and reverberant settings². With the exception of [45], waveform-based methods typically operate on pairs of sensor signals, i.e., one reference signal with nominal sampling and one non-reference signal with SRO. Apart from [34], the methods for pairwise waveform-based SRO estimation can be categorized into three groups. The first group makes explicit use of the complex-valued spectral coherence function, whose phase drift is directly connected to the underlying SRO [35,39,40,44]. Methods of the second group rest upon statistical modeling of short-time Fourier transform (STFT) coefficients [36,41]. Here, the desired SRO value is estimated via maximization of the likelihood function defined on STFT coefficients of asynchronous and pre-synchronized sensor signals. In the third group, different techniques for correlation or coherence processing are deployed either in the time domain or in the STFT domain [37,38,42,43]. Note that, with the exception of [38,44], the majority of the waveform-based methods are designed for offline SRO estimation.
Considering network-wide waveform-based synchronization, small WASNs comprising more than two sensor nodes have been investigated in [38,42,44,45] with no particular considerations regarding the network topology (it appears centralized). In [38,42,44], every sensor node is directly connected to the central reference node via a single-hop link. In larger networks, however, the centralized topology leads to a computational overload of the central node and to an inefficient use of communication bandwidth, or can even be completely infeasible [28]. In [45], all sensor nodes were linked with each other in a so-called fully connected topology, which is even more demanding than a centralized topology. To avoid the drawbacks of the centralized method, a distributed SRO estimation for WASNs with arbitrary topology has been proposed very recently [47], however, only for offline signal processing based on a specific calibration signal and implemented only on fully connected or almost fully connected topologies. From time stamp-based WSN synchronization [32], we know that networks can be more efficiently organized in hierarchical tree topologies and synchronized by distributed procedures, where every node aligns its own signal to the sampling rate of the reference node. On the way to a distributed online-capable waveform-based synchronization, we have contributed a number of our own developments, which are briefly described next.

Relation to own works
Before the synchronization of acoustic sensor networks received greater attention, a precursor of waveform-based SRO estimation and compensation was described in the context of acoustic echo cancellation [48], where SRO was tracked by means of an LMS-type adaptive filter operating on two slightly asynchronous input signals. A related tracking theory for adaptive filters with asynchronous input and output signals was later reported in [49].
In the context of WASNs, as in Fig. 1, a double-cross-correlation processor (DXCP) in the time domain with remarkable robustness to acoustic reverberation and noise has been proposed in [50] and restated as an FFT-based implementation with phase transform (PhaT) for online SRO estimation with outstanding accuracy [51]³. DXCP essentially refers to the concept of a secondary cross-correlation computed over a moving primary cross-correlation on signals with SRO. The secondary correlation then allows unbiased extraction of the underlying SRO. The DXCP-PhaT version has further evolved with a demonstration of robustness to packet loss in WASNs [52], with a closed-loop implementation to integrate sampling rate compensation [53], with extensions for tree-based distributed network-wide time synchronization [54], and very recently with robustness for long-term operation under non-persistent acoustic activity [55].
The real-world utility of DXCP-based SRO estimation has been assessed with open-source developments of demonstrators in a larger research unit on acoustic sensor networks: (1) a first demo at WASPAA-2021 uses the MARVELO software on Raspberry Pi computers [56,57] as a framework for our online SRO estimation between two sensor nodes; (2) a second demo at IWAENC-2022 uses Python notebooks to present the network-wide closed-loop WASN synchronization on various topologies and geometries created by means of the PaderWASN toolbox [44] applied to the Sound Interface to the Swarm (SINS) apartment [58] simulated as shown by [59] and depicted in Fig. 1⁴.

Proposals of this contribution
Based on our previous developments, a distributed online-capable network-wide waveform synchronization is proposed in this paper. Additionally, it is extended for use in dynamic WASNs. The specific novelty of our contribution is threefold.

1) All propagation of state and information in a network based on distributed local operation takes its time and effort [54]. To support the information flow for network consensus, we propose:
• A buffer-based closed-loop (online) SRO estimation and compensation taking place from the outset and round-robin on all nodes of the network in Section 3.2,
• A network topology according to a minimum spanning tree (MST) for better local connectivity in the network in Section 3.3.
2) Real-world networks with continued operation will sooner or later experience radical modifications, such as the appearance of new nodes or failure of nodes and communication links between them. Section 4 therefore introduces a somewhat generic network protocol to handle these modifications with sustained synchronization of already synchronous network parts but with new MST configuration for continued operation.
3) An acoustic shoe-box room simulation might be an oversimplified enclosure regarding acoustic connectivity of the available network nodes. Thus, we simulate a sophisticated SINS apartment with several connected rooms in Sections 5.1 and 6.1 in order to meaningfully assess DXCP-based network-wide synchronization under the aforementioned organizational constraints.

³ Similar to SRO estimators based on time stamp exchange, e.g., from [31], the waveform-based DXCP-PhaT achieves a root-mean-square error (RMSE) of around 0.03 ppm without the need for an additional communication link.
The paper is otherwise organized as shown by Fig. 2, where sections with the specific novelty are marked by superscript asterisks. Methods for pairwise waveform-based synchronization are revisited in Section 2 to support our distributed network synchronization in Section 3 and our proposed synchronization protocol for dynamic WASNs in Section 4. Experiments including a proof of concept and a large-scale quantitative assessment followed by some ablation studies are reported in Sections 5, 6, and 7.

Sampling rate offset and pairwise waveform-based signal synchronization
After the introduction of SRO, its impact on the acoustic sensor signals in the time and frequency domain is discussed. Furthermore, components of a waveform-based synchronization are considered, including SRO compensation that consists of an integer-valued time shift of the asynchronous signal followed by signal resampling. Finally, a closed-loop architecture for pairwise signal synchronization [53] is explained in more detail.
In frame-based signal processing, an averaged SRO-induced ATD is thus observed as given by (3), where $\ell \geq 1$ is the frame index and $n_{\mathrm{mid}}[\ell] = (N-1)/2 + N_s \cdot \ell$ are the time points on the dimensionless axis $t/T_r$ corresponding to the midpoint of the $\ell$-th data frame with frame size $N$ and frame shift $N_s$.
A linear phase-drift (LPD) model [36,37] in the STFT domain is then expressed by (4), where $Y[k,\ell]$ and $Z[k,\ell]$ are the STFT coefficients of $y[n]$ and $z[n]$, respectively, $j$ is the imaginary unit, and $k \in \{0, \ldots, N-1\}$ denotes a discrete frequency index. According to Eqs. (2), (3), and (4), $z[n]$ is a time-scaled waveform of $y[n]$, corresponding to a time shift between $y[n]$ and $z[n]$ that grows linearly with time for a fixed SRO $\varepsilon \neq 0$. Note that a fixed SRO constitutes a common assumption, as in reality the SRO varies only very little over time⁶.
Fig. 2 Workflow of the paper
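The linear growth of the time shift over the frame index can be illustrated with a minimal sketch. The frame midpoints follow the definition of $n_{\mathrm{mid}}[\ell]$ given in the text; the drift model itself (ATD in samples equal to the SRO, taken as a dimensionless ratio, times the time index) is our assumption for illustration, including its sign convention, and the function names are hypothetical.

```python
import numpy as np


def frame_midpoints(num_frames: int, frame_size: int, frame_shift: int) -> np.ndarray:
    """n_mid[l] = (N - 1)/2 + N_s * l for l = 1, ..., num_frames (as in the text)."""
    frame_idx = np.arange(1, num_frames + 1)
    return (frame_size - 1) / 2 + frame_shift * frame_idx


def average_atd(sro_ppm: float, n_mid: np.ndarray) -> np.ndarray:
    """Assumed linear drift model: ATD in samples = SRO (as a ratio) * time index."""
    return sro_ppm * 1e-6 * n_mid


# Hypothetical framing parameters, chosen only for illustration
n_mid = frame_midpoints(num_frames=5, frame_size=4096, frame_shift=2048)
print(average_atd(sro_ppm=100.0, n_mid=n_mid))
```

Under this assumption, every additional frame adds the same increment of SRO times $N_s$ samples to the drift, which is what makes a recursive frame-wise update possible.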

Waveform-based SRO estimation and compensation
Considering any two acoustic nodes indexed by $r$ and $i$, node $r$ is assumed to be the reference node with a perfect ADC ($\varepsilon = 0$). In contrast, node $i$ uses an imperfect ADC characterized by the SRO parameter $\varepsilon_{ri} \neq 0$. Waveform-based synchronization (WS) of $z_r[n]$ and $z_i[n]$ consists of SRO estimation and compensation. Using one of the methods for SRO estimation designed for frame-based processing [35-37, 39-42, 44, 45, 51], SRO estimates $\hat{\varepsilon}_{ri}[\ell]$ can be obtained from the observed asynchronous signals $z_r[n]$ and $z_i[n]$. Next, $\hat{\varepsilon}_{ri}[\ell]$ should be appropriately removed from the asynchronous signal $z_i[n]$, leading to an SRO-compensated, synchronized signal $z_{i,S}[n]$ that is aligned to the reference signal $z_r[n]$ in terms of sampling rate. For this, the real-valued time-variant ATD from (3) can be recursively estimated in every $\ell$-th data frame according to (5). Note that Eq. (5) implies that both SRO estimation and compensation are executed at the same frame rate $f_{\mathrm{WS}} = f_r / N_s$. Then, $\tau_{ri}[\ell]$ can be compensated in every signal frame by two processing steps: (a) correction of the integer-valued ATD, which can be removed from $z_i[n]$ by a sample-wise shift of the $i$-th sensor signal, leading to a roughly synchronized signal $z_i[n - \tau^{\mathrm{int}}_{ri}[\ell]]$, and (b) compensation of the fractional ATD via resampling of the roughly synchronized signal; see Fig. 3. Various resampling methods can be applied for compensation of the fractional ATD [19][20][21][22][23][36]. Since the STFT resampling method from [36] proved to be computationally very efficient and sufficiently accurate⁷, it seems to be an appropriate choice for frame-wise compensation of the fractional ATD according to (8), which makes use of (4). Further, it should be mentioned that the FFT window size can be different for SRO estimation and compensation.
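The two-step compensation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [36]: the recursion (ATD grows by the estimated SRO times $N_s$ per frame), the rounding rule for the integer part, and the circular DFT-domain phase shift used for the fractional part are our simplifications, and all function names are hypothetical.

```python
import numpy as np


def update_atd(tau_prev: float, sro_est_ppm: float, frame_shift: int) -> float:
    """Assumed recursion: the ATD grows by SRO * N_s samples per frame."""
    return tau_prev + sro_est_ppm * 1e-6 * frame_shift


def split_atd(tau: float) -> tuple[int, float]:
    """Split the ATD into an integer part (sample shift) and a fractional rest."""
    tau_int = int(np.round(tau))
    return tau_int, tau - tau_int


def fractional_delay(frame: np.ndarray, tau_frc: float) -> np.ndarray:
    """Delay a frame by tau_frc samples via a linear phase term (circular shift)."""
    n = len(frame)
    freqs = np.fft.rfftfreq(n)  # normalized frequency in cycles per sample
    spectrum = np.fft.rfft(frame) * np.exp(-2j * np.pi * freqs * tau_frc)
    return np.fft.irfft(spectrum, n=n)


tau = 0.0
for _ in range(10):  # ten frames at a hypothetical 80 ppm and N_s = 2048
    tau = update_atd(tau, sro_est_ppm=80.0, frame_shift=2048)
print(split_atd(tau))
```

A practical frame-wise resampler additionally has to handle frame overlap and the circular wrap-around of the phase shift, which this sketch deliberately ignores.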

Closed-loop synchronization of sensor node pairs using internal model control
In order to accomplish a robust waveform-based time synchronization of large acoustic networks by using the subsystems for SRO estimation and compensation described in the previous section, a structural combination of both subsystems to obtain a feasible synchronization unit has to be discussed.

Open-loop synchronization
Retrieval of SRO from asynchronous signals $z_r[n]$ and $z_i[n]$ can lead to estimation with significant bias and uncertainty, where a subsequent SRO compensation can leave an unacceptable synchronization error [40]. In terms of control theory, such a consecutive implementation of the subsystems can be referred to as an open-loop control system, depicted in Fig. 4a. A significant disadvantage of such an architecture applied to online signal processing is that the SRO estimation is executed on the asynchronous signals with growing ATD between them. Consequently, the requirement of similar frame contents necessary for the LPD model (4) is only fulfilled if the condition $|\tau_{ri}[\ell]| \ll N$ is valid, i.e., as long as the average ATD between $z_r[n]$ and $z_i[n]$ is well within the frame size $N$ [37]. Otherwise, SRO estimation (and also compensation) will collapse with time, making such an architecture suitable only for short signal segments or small SROs [36].
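How quickly the open-loop condition is exhausted can be estimated with a short calculation. The sketch below assumes a constant, uncompensated SRO and interprets "well within the frame size" as a user-chosen fraction of $N$; both the threshold and the frame size in the example are hypothetical values, not parameters from this paper.

```python
def seconds_until_atd_reaches(fraction_of_frame: float, frame_size: int,
                              sro_ppm: float, fs_hz: float) -> float:
    """Time after which |ATD| equals fraction * N for a constant SRO."""
    drift_samples_per_second = abs(sro_ppm) * 1e-6 * fs_hz
    return fraction_of_frame * frame_size / drift_samples_per_second


# Hypothetical setting: N = 4096, 100 ppm at 16 kHz, threshold at 10 % of N
print(seconds_until_atd_reaches(0.1, 4096, 100.0, 16_000.0))
```

With these example numbers, the drift of 1.6 samples per second reaches a tenth of the frame after roughly four minutes, which illustrates why open-loop operation is limited to short segments or small SROs.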

Closed-loop synchronization
In offline signal processing, synchronization can be improved by applying the so-called multi-stage procedure with multiple closed-loop iterations of SRO estimation and compensation over the entire signal [40]. This mechanism can be converted into a continuous feedback-control loop comprising a controlled subsystem for SRO compensation followed by an online implementation of SRO estimation, as shown in Fig. 4b. Since the subsystem for SRO estimation operates on the synchronized signals, it estimates the current residual SRO between $z_r[n]$ and $z_{i,S}[n]$ after SRO compensation. Thus, the requirement of similar frame content is always fulfilled here.
Compared to the open-loop structure, however, such
In the steady state, the system is meant to drive the residual SRO to zero, so that $\hat{\varepsilon}_{ri}[\ell] \to \varepsilon_{ri}$. Therefore, since SRO estimation is more precise for smaller SRO values as shown in [50], the closed-loop structure naturally ensures operation of the SRO estimation at its optimal working point. In contrast to multi-stage processing, the resulting control architecture applies only a single treatment to each signal frame, while efficiently diminishing SRO bias and uncertainty with time.
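The steady-state behavior can be reproduced with a toy simulation on the level of SRO values. The sketch below is not the IMC design discussed in this paper: as stand-ins, the residual SRO estimator is modeled by a first-order recursive average with a hypothetical smoothing constant `alpha`, and the controller is a plain integrator with a hypothetical gain.

```python
def simulate_closed_loop(true_sro_ppm: float, num_frames: int = 1000,
                         alpha: float = 0.9, gain: float = 0.05):
    """Toy loop: compensate, estimate the smoothed residual, integrate it."""
    sro_est = 0.0             # SRO value currently used for compensation
    smoothed_residual = 0.0   # output of the (stand-in) residual estimator
    history = []
    for _ in range(num_frames):
        residual = true_sro_ppm - sro_est  # SRO left after compensation
        smoothed_residual = alpha * smoothed_residual + (1 - alpha) * residual
        sro_est += gain * smoothed_residual  # integrating controller
        history.append((sro_est, residual))
    return history


final_estimate, final_residual = simulate_closed_loop(50.0)[-1]
print(final_estimate, final_residual)
```

For the chosen constants the loop is stable, so the residual decays toward zero while the accumulated estimate approaches the true SRO, mirroring the steady state described above.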

Design of controller based on internal model control (IMC) theory
The controller has to be developed for the frame-based rate $f_{\mathrm{WS}}$ of the waveform synchronization. As a discrete-time system, it is designed in the domain of the bilateral z-transform, where the impulse response of the controller corresponds to a system function $G_C(z)$. Among the various types of control strategies, we suggest using a controller based on IMC theory [60,61], although other designs are possible too. For this purpose, an explicit model of the controlled system (plant) is required, which consists of SRO compensation and estimation. Abstracting the underlying SRO from the audio signals, we can create a block diagram of the control loop as depicted in Fig. 5a. Here, the function of SRO compensation is described as a subtraction of the estimated SRO $\hat{\varepsilon}_{ri}[\ell]$ from the actual SRO $\varepsilon_{ri}[\ell]$. Furthermore, we suggest using the DXCP-PhaT method [51] for residual SRO estimation, the dynamical behavior of which is characterized here by $G_{\mathrm{DXCP}}(z)$. Aiming at perfect signal synchronization, which would be observed as a residual SRO of zero, the reference control signal $w[\ell]$ is defined as zero. The IMC control circuit implies a predictive plant model placed in parallel to the actual plant, where the SRO compensation simplifies to a "−1" multiplier and an approximation $\hat{G}_{\mathrm{DXCP}}(z)$ is used instead of the actual $G_{\mathrm{DXCP}}(z)$. The mismatch between the outputs of the actual plant and of the predictive model feeds back to an IMC filter $G_{\mathrm{IMC}}(z)$. The latter is designed for quadratic minimization of the control error, i.e., the residual SRO signal [53].
To ensure feasibility of the control circuit, the optimal solution is extended by a lag element of order $n_f$ ($\mathrm{PT}_{n_f}$) [62] with the filter function $F_{\mathrm{IMC}}(z)$ of (9), where $T_{\mathrm{WS}} = 1/f_{\mathrm{WS}}$ is the time shift between STFT frames, $T_{\mathrm{IMC}}$ is a desired time constant of $F_{\mathrm{IMC}}(z)$, and $n_{\mathrm{IMC}}$ is the order of $F_{\mathrm{IMC}}(z)$. Overall, the IMC filter therefore takes the form (10). A sophisticated DXCP-PhaT model $G_{\mathrm{DXCP}}(z)$ as derived in [53] can be simplified, regarding model order and complexity of the corresponding IMC controller, to the minimum architecture (11), parameterized by the dominant smoothing constant $\alpha_2$ of the DXCP-internal recursive averaging. The latter is used in DXCP for estimation of a secondary generalized cross-spectral density [51] and is responsible for its dominant time constant $T_{\mathrm{DXCP}} = T_{\mathrm{WS}} / \ln(1/\alpha_2)$. Now, the system function of the final IMC-based controller $G_C(z)$ in Fig. 5b can be derived as (12a) and (12b), where the architecture in Fig. 5b is an equivalent reorganization of the block diagram in Fig. 5a, and the IMC filter from (10) with approximation (11) is used in (12a) to obtain (12b).
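The relation between the smoothing constant and the dominant time constant can be evaluated directly. The following sketch uses $T_{\mathrm{WS}} = N_s / f_r$, which follows from $f_{\mathrm{WS}} = f_r / N_s$; the numerical values of frame shift and smoothing constant are hypothetical and serve only as an example.

```python
import math


def dxcp_time_constant(frame_shift: int, fs_hz: float, alpha2: float) -> float:
    """T_DXCP = T_WS / ln(1 / alpha_2), with T_WS = N_s / f_r in seconds."""
    if not 0.0 < alpha2 < 1.0:
        raise ValueError("alpha2 must lie strictly between 0 and 1")
    t_ws = frame_shift / fs_hz
    return t_ws / math.log(1.0 / alpha2)


# Hypothetical example: N_s = 2048 samples at 16 kHz with alpha_2 = 0.99
print(dxcp_time_constant(2048, 16_000.0, 0.99))
```

Smoothing constants close to one therefore translate into time constants of many frame periods, which has to be kept in mind when judging how fast synchronization can settle.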
Given the closed-loop synchronization unit of Fig. 4b with an embedded DXCP-PhaT method for SRO estimation and the derived IMC-based controller, a gossiping approach for distributed network-wide synchronization is developed in the next section.

Online distributed network-wide synchronization using closed-loop unit
Based on the pairwise synchronization, our concept of a synchronization gossip from [54] is introduced first. A buffer-based implementation of the closed-loop synchronization unit is then described to prepare the appropriate flow of information in the gossip. Finally, a topological organization of the WASN by means of a minimum spanning tree is introduced to support the acoustic connectivity of the involved node pairs.

Concept of synchronization gossip
We consider a WASN with $N_{\mathrm{WASN}}$ acoustic sensor nodes labeled with index $i \in \{0, \ldots, N_{\mathrm{WASN}}-1\}$. Among these, a root node $r$ is always chosen to be the global reference node, whose sampling rate is equal to the reference sampling rate $f_r$. In this kind of WASN, at least $N_{\mathrm{WASN}}-1$ unknown SROs have to be estimated for a successful network-wide signal synchronization. From a graph-theoretical point of view [63], the topology of a WASN can be described as a directed tree denoted as $\vec{T} = (V, E)$, where the vertex set $V$ contains $N_{\mathrm{WASN}}$ nodes and the edge set $E$ consists of $N_{\mathrm{WASN}}-1$ network links [10,16,64]. On such a tree, network-wide time synchronization can be realized either in a centralized or in a distributed way.

Centralized synchronization
In contributions for waveform-based synchronization with more than two nodes, the centralized synchronization is considered implicitly [38,42,44,45].For this, all acquired signals are transmitted via a single-hop communication to the root node, where the entire synchronization takes place.The significant drawbacks here are a possible computational overload of the central node in a larger network and a simultaneous requirement of communication bandwidth [28].

Distributed synchronization
In contrast, the distributed scheme spreads the signal synchronization task over the network, so that the SRO of every non-reference node is estimated and compensated on the same node where the signal is acquired, as proposed in publications on time stamp-based synchronization [29,31]. Significant advantages of such a distributed scheme are the sharing of the computational power required for synchronization and the scalability regarding communication bandwidth [10,30].

Network topologies and their properties
Three particular types of topologies for distributed synchronization are distinguished here: a star tree, a path tree, and a rooted tree. Every topology can further be considered with two different edge directions, either as an in-tree (edges oriented toward the root) or as an out-tree (edges oriented away from the root). Examples of out-tree topologies for $N_{\mathrm{WASN}} = 5$ nodes placed in an isolated shoe-box room are depicted in Fig. 6: star-out-tree (SOT), path-out-tree (POT), and rooted-out-tree (ROT). The root node is highlighted with a bold circle. The direction of edges indicates a one-way out-flow of signals $z_i[n]$ from node $i$ along the respective wireless links⁸.
Accordingly, every WASN
Sensor nodes organized in a certain topology can be characterized by the property of depth (or level). The depth $d_i$ of a node is defined as the length of its path to the root node, which itself has zero depth ($d_r = 0$). The tree depth is given by the depth of its deepest node. In the case of SOT, this tree depth is always one. For some node locations, however, the SOT topology may weaken the acoustic connectivity to the root. In those cases, a multi-hop POT potentially mitigates this problem, but does so at the expense of maximizing the tree depth. In many situations, the multi-hop ROT constitutes a compromise between SOT and POT with good acoustic connectivity and intermediate tree depth. Still, the optimal choice of topology generally depends on the actual node locations at hand.
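The notions of node depth and tree depth can be made concrete with a short sketch for $N_{\mathrm{WASN}} = 5$. The parent maps below are hypothetical examples of the three topology types (in particular, the ROT instance is only one of many possible rooted trees), and the function names are ours.

```python
def node_depths(parent: dict) -> dict:
    """Depth of each node; `parent` maps node -> parent, the root maps to None."""
    depths = {}

    def depth(node):
        if node not in depths:
            p = parent[node]
            depths[node] = 0 if p is None else depth(p) + 1
        return depths[node]

    for node in parent:
        depth(node)
    return depths


def tree_depth(parent: dict) -> int:
    """Depth of the deepest node in the tree."""
    return max(node_depths(parent).values())


# Hypothetical out-trees with root node 0
sot = {0: None, 1: 0, 2: 0, 3: 0, 4: 0}   # star: all nodes one hop from the root
pot = {0: None, 1: 0, 2: 1, 3: 2, 4: 3}   # path: nodes chained one after another
rot = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}   # rooted tree: intermediate depth

for name, topology in (("SOT", sot), ("POT", pot), ("ROT", rot)):
    print(name, tree_depth(topology))
```

For five nodes, the star has depth one and the path has the maximum depth of four, while the example rooted tree lies in between.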

Proposed scheme for distributed synchronization
For waveform-based network-wide synchronization on all distributed topologies, we consider a server-less peer-to-peer operation on node pairs, i.e., one sending node providing the reference signal $z_j[n]$ and the receiving node owning the respective non-reference signal $z_i[n]$; see Fig. 7. Moreover, we aim at continuous processing of $z_j[n]$ and $z_i[n]$ on finite buffers, and, hence, their asynchronous generation of data needs to be continuously aligned with an asynchronous resampler in the loop. The closed-loop synchronization unit introduced in Section 2.3 can be efficiently used for such pairwise distributed synchronization. However, the synchronization unit must be configured on every non-reference node in a slightly different manner, depending on the role of the respective node. Specifically, the $i$-th non-reference node is to be configured either as a leaf node (switch position $S = 0$) or as an intermediate node (switch position $S = 1$) according to Fig. 7.
In other words, each node receives a local reference signal one-way, either directly from the reference node or from a parent node. Next, the node synchronizes its own microphone signal $z_i[n]$ and provides the synchronized signal $z_{i,S}[n]$ to its children according to the network topology. By doing so, the signal synchronization is propagated network-wide and uses computational resources of the whole WASN. Naturally, the process of network-wide synchronization will accumulate more latency in deeper networks. The overall duration for the synchronization to propagate from the root node to the deepest leaf node is roughly composed of two contributions: the initialization phase of DXCP-PhaT and its time constant $T_{\mathrm{DXCP}}$ (cf. Section 2.3) multiplied by the tree depth⁹. To accelerate network-wide synchronization, a synchronization gossip on rooted trees with moderate tree depth would thus be favorable.
The proposed network-wide distributed synchronization, however, was initially developed for use in a static WASN in [54], i.e., not considering any dynamic network changes usually occurring in real WASNs.

Buffer-based realization of closed-loop (online) synchronization unit
Our implementation of closed-loop time synchronization makes use of multiple buffers. A block diagram of the buffer-based time synchronization implemented on the $i$-th sensor node is depicted in Fig. 8a, where the node obtains the global reference signal from the root node $r$ via a single-hop link ($j = r$) and thus belongs to the first network level with node depth $d_i = 1$. From the estimated SRO values $\hat{\varepsilon}_{ji}[\ell]$ delivered by the IMC controller, a real-valued ATD estimate is obtained as in (5), which requires the same frame shift $N_s$ in both SRO estimation and compensation. However, since both subsystems work on time-domain input signals, they are allowed to use different frame sizes. The proposed buffer-based implementation of SRO compensation is designed for a frame size equal to the frame shift $N_s$. Hence, the size of the required buffers is a simple multiple of $N_s$. While the integer-valued ATD $\tau^{\mathrm{int}}_{ji}[\ell]$ from (6) is compensated using a sliding $N_s$-long window that is appropriately moved over the resampler buffer, the remaining fractional ATD $\tau^{\mathrm{frc}}_{ji}[\ell]$ from (7) is removed by applying the STFT resampling method [24,36], which is also implemented for the frame size $N_s$. To provide for causal resampling, the resampler buffer must introduce a delay of at least one frame, such that the sliding window (SW) is able to move to the right. We choose a resampler buffer length of 3 frames, where the second frame corresponds to the reference position of the SW. To compensate for the resulting delay of one frame, an equivalent delay is applied to the received signal $z_j[n]$ via the delay buffer.
For sensor nodes at a larger distance from the root node, i.e., $d_i > 1$, the individual delays of buffer-based SRO compensation in preceding levels accumulate and must be compensated using an additional microphone delay buffer (MDB) with a depth-dependent length of $L_{\mathrm{MDB}} = d_i$ frames, as shown in Fig. 8b. Analogously to the delay buffer, the MDB applies a delay of $d_i - 1$ frames to the node's own signal $z_i[n]$. In other words, the local microphone signal must be passed through the MDB for causal alignment with the delayed reference signal received along the network route.
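The depth-dependent delay of the MDB can be sketched as a simple frame-wise delay line. The class below is a hypothetical illustration rather than the implementation used in this work: it assumes zero-initialized content and holds at most $d_i$ frames at any time, which yields the delay of $d_i - 1$ frames stated above.

```python
from collections import deque

import numpy as np


class MicDelayBuffer:
    """Frame-wise delay line that delays the microphone signal by d_i - 1 frames."""

    def __init__(self, node_depth: int, frame_len: int):
        if node_depth < 1:
            raise ValueError("non-reference nodes have a depth of at least 1")
        # Pre-fill with d_i - 1 zero frames (assumed start-up content)
        self.frames = deque(np.zeros(frame_len) for _ in range(node_depth - 1))

    def push(self, frame) -> np.ndarray:
        """Store the newest frame and return the frame that is d_i - 1 frames old."""
        self.frames.append(np.asarray(frame, dtype=float))
        return self.frames.popleft()


mdb = MicDelayBuffer(node_depth=3, frame_len=4)
for k in (1, 2, 3):
    print(mdb.push(np.full(4, k)))
```

For a node at depth one, the buffer is empty at start-up and every frame passes through without additional delay, consistent with the first network level needing no MDB delay.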

Network organization using MST
For accurate waveform-based synchronization, acoustic connectivity between z i [n] and z j [n] is essential [55].
Since the connectivity is primarily governed by the distance between nodes, the network topology should generally be configured so as to keep geometric distances between nodes at a minimum.

Minimum spanning tree (MST) as topology
We therefore consider the graph-theoretical MST to maximize acoustic coupling and coherence between node pairs. The MST connects all vertices in a graph without loops and with the minimum possible total edge weight, in this context given by the distance between nodes [65]. As two prominent examples, [66,67] utilize the concept of MST for route discovery to minimize the total Euclidean distance between nodes for energy-efficient multi-hop communication and network-wide signal enhancement, respectively. In contrast to our previous work in [54], we therefore adopt the MST topology to organize our network. Because algorithms for MST rooting are based on relative sensor positions, we assume that the coordinates of all involved nodes are known up to a certain estimation error¹⁰. In a realistic scenario, such estimates could be provided by dedicated methods for network self-calibration [68][69][70][71].

¹⁰ As shown in Section 7.3, the proposed buffer-based synchronization unit can also be successfully used for distributed synchronization of dynamic WASNs organized in simpler topologies without any knowledge of node positions.

Optimal choice of the reference node
While the MST is considered as the optimal solution for connecting all nodes with regard to acoustic connectivity, a choice of the reference node r is further required to obtain an actual WASN topology.With the goal of keeping the tree depth as small as possible, we propose to assign this reference role to the node with the smallest average distance to its neighbors.Note that electing the reference node then determines the directions of all edges in the otherwise undirected MST.
An example of an MST in a network with 13 nodes is depicted in Fig. 9a. Identifying node 4 as the optimal reference node in the aforementioned sense yields the WASN topology shown in Fig. 9b, where the depth of individual nodes is color-coded by means of the corresponding inward edges.
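The election of the reference node and the resulting edge directions can be sketched as follows. Two points are assumptions of this illustration: "neighbors" is read here as all other nodes of the network (other readings, such as MST neighbors only, are conceivable), and the edges are oriented away from the root as in the out-trees of Section 3.1. Function names and the toy geometry are hypothetical.

```python
import math
from collections import deque


def choose_reference(coords: dict):
    """Node with the smallest mean Euclidean distance to all other nodes (assumed reading)."""
    def mean_distance(i):
        others = [j for j in coords if j != i]
        return sum(math.dist(coords[i], coords[j]) for j in others) / len(others)
    return min(coords, key=mean_distance)


def orient_out_tree(mst_edges, root):
    """Direct undirected MST edges away from the root; return parent and depth maps."""
    adjacency = {}
    for u, v in mst_edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:  # breadth-first traversal starting at the root
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth


# Toy example: five nodes on a line, MST given by consecutive neighbors
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 0)}
root = choose_reference(coords)
print(root, orient_out_tree([(0, 1), (1, 2), (2, 3), (3, 4)], root))
```

In the toy example, the central node is elected, which halves the tree depth compared with rooting the same MST at one of its end nodes.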

Maintaining waveform-based synchronization in dynamic WASNs
A dynamic WASN should be able to adapt to network changes, such as appearance or failure of network nodes or links, without the need to restart the synchronization procedure from scratch, thus avoiding another time-intensive convergence that can cause undesirable degradation in the performance of network-wide signal processing. This can be achieved by appropriately adapting the network topology in response to observed changes, while maintaining an already achieved waveform synchronization (MWS) state of persistent nodes that was attained right before the topology change. In this section, we show how the network-wide distributed synchronization presented in Section 3 can be maintained in a dynamic WASN encountering four fundamental types of possible network modifications. For simplicity, we restrict modifications to one node or communication link at a time and further assume that the WASN has reached a good synchronization state before any such change takes place. This assumption is deemed reasonable, as the initial convergence period takes relatively little time (as we shall see) when considering continued long-term WASN operation¹¹. Furthermore, the type and time point $T_c$ of a network change are required to be known for the respective treatment. This knowledge is assumed to be provided by an address resolution protocol (ARP) [72] or a network discovery protocol (NDP) [73], which are outside the scope of this paper.
In essence, our strategy for MWS then is to automatically generate an optimal MST network topology for any configuration of nodes and coordinates, respectively, and do so every time a network change occurs.This approach allows us to formulate a mostly universal MWS protocol for handling the various types of changes in the network, requiring only few case specific actions.A summary of the proposed algorithm for operating a dynamic WASN is provided in Algorithm 1 and described in the following line by line.

Network-wide protocol steps (lines 4-9)
To find an optimal network topology, we generate a graph representation of all nodes, where edges are weighted with the Euclidean distance between nodes (line 4). This Euclidean graph is first corrected by removing those edges that correspond to unavailable communication links between nodes. Next, we find the minimum spanning tree via repeated execution of Prim's algorithm [74], considering every node as a possible starting point (line 5), while retaining the choice of global reference if the respective node is still available. If not, a node with the smallest average distance to its neighborhood has to be discovered and appointed as a new reference node (lines 6-8). Finally, based on the reference node, the directions of all edges in the MST are determined (line 9).
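The construction of the tree from node coordinates can be sketched as follows. This is a single-run Prim implementation under stated assumptions: two-dimensional coordinates, a hypothetical `unavailable` set of link pairs, and a hypothetical `start` argument that selects the starting node. It is not meant to reproduce Algorithm 1, only the graph step described above.

```python
import heapq
import math


def prim_mst(coords, unavailable=frozenset(), start=None):
    """Minimum spanning tree via Prim's algorithm on a Euclidean graph.

    coords: dict node -> (x, y) position (assumed 2-D for illustration).
    unavailable: set of frozenset({u, v}) links that must not be used.
    Returns a list of (u, v, distance) tree edges; raises ValueError if the
    remaining graph is not connected.
    """
    nodes = sorted(coords)
    start = nodes[0] if start is None else start
    visited = {start}
    heap = []

    def push_edges(u):
        # Offer all usable links from u to nodes that are not in the tree yet.
        for v in nodes:
            if v not in visited and frozenset((u, v)) not in unavailable:
                heapq.heappush(heap, (math.dist(coords[u], coords[v]), u, v))

    push_edges(start)
    tree = []
    while heap and len(visited) < len(nodes):
        dist, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry, v was reached via a shorter link
        visited.add(v)
        tree.append((u, v, dist))
        push_edges(v)
    if len(visited) < len(nodes):
        raise ValueError("graph is disconnected after removing links")
    return tree
```

Calling `prim_mst` once per candidate starting node, as the text describes, only changes the order in which edges are discovered; the total tree weight is the same for every start.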

Node-specific protocol steps (lines 11-16)
Because the synchronized signal of each node is systematically delayed in proportion to the level at which it resides in the network, as discussed in Section 3.2, the MDB of nodes whose depth has changed needs to be resized accordingly (lines 11-12); see Fig. 8b. If a node moves closer to the reference, the MDB size is reduced by discarding the most recent frames. In contrast, the MDB of nodes that moved further away from the reference is increased in size by appending zeros on the right side. In both cases, this mechanism inevitably leads to a small time glitch in the synchronized microphone signals with respect to the local reference signal. This, however, does not negatively impact the SRO estimation process, as will be shown in Section 5.3 below (footnote 12).
In addition to the adjustment of MDB size and content, the (a) and (d) types of network change require further attention, as detailed in the following.
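The resizing rule can be sketched in a few lines of Python. The buffer layout is an assumption: frames are stored oldest first, so that the "right side" of the buffer holds the most recent frames. The function name `resize_mdb` and the target size argument are hypothetical, and the mapping from node depth to buffer size is not modeled here.

```python
def resize_mdb(frames, new_num_frames):
    """Resize a frame buffer given as a list of frames (lists of samples).

    Assumption: frames[0] is the oldest frame and frames[-1] the most recent.
    Shrinking discards the most recent frames; growing appends zero frames.
    """
    if new_num_frames < 0:
        raise ValueError("buffer size must be non-negative")
    if new_num_frames <= len(frames):
        # Node moved closer to the reference: drop the most recent frames.
        return [list(frame) for frame in frames[:new_num_frames]]
    # Node moved away from the reference: append all-zero frames on the right.
    frame_len = len(frames[0]) if frames else 0
    zeros = [[0.0] * frame_len for _ in range(new_num_frames - len(frames))]
    return [list(frame) for frame in frames] + zeros
```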
Change type (a): A node newly integrated into the WASN can usually rely on an already synchronized signal of its topological parent node as a (local) reference and hence synchronize its own microphone signal to it. Until the synchronization has converged, however, its output signal is still asynchronous and should not be utilized as a local reference by its topological children nodes. We therefore temporarily freeze the SRO estimation process of any node that directly receives its reference from a newly integrated node for a freezing time T f (lines 13-14). During T f, the children of a newly integrated node discard the reference signal provided by it and hold their previous SRO estimate (footnote 13).
Change type (d): As mentioned, failure of the reference sensor node requires appointing a new reference. Because this new reference node no longer receives a reference for itself, the previously explained method of freezing the SRO estimate is applied permanently (lines 15-16). By doing so, operation of the WASN can continue seamlessly and without the need to adjust to a significantly different reference sampling rate of the newly elected reference node.
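Both freezing rules amount to holding the previous SRO estimate, either for a limited time or indefinitely. A minimal sketch of this bookkeeping is given below; the class and method names are hypothetical, and the actual estimator and controller are not modeled.

```python
import math


class SroEstimateHold:
    """Tracks whether a node's SRO estimate is currently held (frozen)."""

    def __init__(self):
        # Time in seconds until which new estimates are ignored.
        self.frozen_until = -math.inf

    def freeze_temporarily(self, t_change, t_freeze):
        """Child of a newly integrated node: hold for t_freeze after t_change."""
        self.frozen_until = max(self.frozen_until, t_change + t_freeze)

    def freeze_permanently(self):
        """Newly elected reference node: hold the estimate indefinitely."""
        self.frozen_until = math.inf

    def update(self, t, previous_estimate, new_estimate):
        """Return the estimate to use at time t."""
        return previous_estimate if t < self.frozen_until else new_estimate
```

With a change at 200 s and a freezing time of 100 s, `update` returns the previous estimate for any time before 300 s and accepts new estimates from then on.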

Illustration of the proposed mechanism for dynamic WASN operation
In order to demonstrate the methods proposed in Sections 3 and 4 of this paper, we create a synthetic dataset to simulate a WASN with an exemplary topology, which, after initial convergence, is subjected to one network modification of each type. Before examining the resulting effects on distributed SRO estimation for the initial and dynamic WASNs, we first discuss our procedure for generating the synthetic WASN data in a SINS apartment. A large-scale evaluation of the proposed methods is conducted in Section 6.

WASN simulation in a SINS apartment
With the help of the Paderbox and PaderWASN toolboxes [44], we simulate (footnote 14) a WASN in an artificially generated SINS apartment [58]. In our setup, a total of 13 nodes, each equipped with a single microphone, are distributed in the apartment, which consists of a living room, a hall, a bedroom, a bathroom, and a toilet. Furthermore, three static acoustic sources (music H4, female speaker N6, male speaker B0) are placed in the living room, all of which are active for almost the total duration of the simulated signals of 9 min. The locations of the acoustic sources (footnote 15) and all nodes of the acoustic sensor network are depicted in Fig. 10a, where the SINS apartment from Fig. 1 is shown as shaded background. The room impulse responses between sources and nodes in this simulated environment are provided by the authors of [75] with reverberation time T 60 ≈ 700 ms. Node 9 participates in all WASN configurations to provide sufficient acoustic coupling between sensor nodes in the living room and outside (footnote 16). The idea of this small-space WASN is a moderate set of proximity nodes with reasonable acoustic coherence and manageable wireless links for sustainable synchronization. Some critical nodes may temporarily leave the network and ideally return with continued synchronization to the momentary reference node (the time for network resynchronization may otherwise be in the order of 1-2 min, as shown by Fig. 14 below), and new nodes shall integrate gracefully without disrupting the existing network.
All source signals exhibit a reference sampling rate of f r = 16 kHz. While the music source is downloaded from the Freesound datasets [76], clean speech signals are taken from the LibriSpeech corpus [77]. The resulting microphone signals are superimposed by uncorrelated computer-generated sensor noise of constant power, yielding a global signal-to-noise ratio (SNR) of around 33 dB averaged over all sensor nodes; see Fig. 10b. The SROs ε i of individual nodes are simulated by using an overlap-save method (OSM) for signal resampling [22] with FFT size N OSM = 2^13, a frame size of N OSM /2, a frame shift of N OSM /4, and a Hann analysis window. The ε i values are drawn from a uniform distribution on the interval [−100; 100] ppm except for ε 0, which is set to zero.
The buffer-based closed-loop synchronization unit from Fig. 8 is implemented as described in Section 3.2. The parameters of the DXCP-PhaT, the IMC controller, and the STFT resampler are given in Table 1.
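The random assignment of SROs can be sketched as follows. The helper names and the seed are hypothetical, and the relation between SRO and effective sampling rate, f i = (1 + ε i · 10^-6) f r, is an assumed sign convention that the text above does not state. The OSM resampling itself is not reproduced.

```python
import random


def draw_sros(num_nodes, max_ppm=100.0, seed=0):
    """Draw per-node SROs in ppm, uniform on [-max_ppm, max_ppm].

    Node 0 is assigned an SRO of exactly zero, as in the described setup.
    """
    rng = random.Random(seed)
    sros = [rng.uniform(-max_ppm, max_ppm) for _ in range(num_nodes)]
    sros[0] = 0.0
    return sros


def node_sampling_rate(f_ref_hz, sro_ppm):
    """Effective sampling rate under the assumed convention f_i = (1 + eps) f_r."""
    return f_ref_hz * (1.0 + sro_ppm * 1e-6)
```

Under this convention, an SRO of 100 ppm at f r = 16 kHz corresponds to a rate offset of 1.6 Hz.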

Synchronization in the initial WASN
From the generated WASN environment, an initial WASN with N WASN = 5 nodes is drawn consisting of the nodes {0, 1, 6, 7, 9} as depicted in Fig. 10a. This initial WASN is used to demonstrate the behavior of our MWS protocol proposed in Section 4. While the node r = 0 is chosen as the reference node, the nodes {1, 9} and {6, 7} represent the first and the second rank of network depth, respectively, with SRO

Footnote 15: Moving sources are not in the scope of the presented analysis. The typical experience of a moving source is a temporary perturbation of the waveform-based SRO estimation when a specific trajectory induces time-varying time delay (i.e., the equivalent of SRO-based time drift) at the microphones [37]. The precise analysis of the limitations is still an open research topic and the working assumption of spatially fixed acoustic sources is still very common in SRO estimation. Practically, the construction of realistic dynamic acoustic scenes for evaluation is already complicated by the computationally prohibitive simulation of time-varying room impulse responses, whereas the easier case of alternating sources does not impose a major problem [44]. In real-time systems with real signals, we have observed that the estimation will stabilize to a new steady state when the sources halt to a new position.

Footnote 16: Further connections between nodes from the living room and other rooms are avoided in MST building to respect their potential acoustic decoupling.
Figure 10c provides an overview of the acoustic activity for each of the three sources over a limited timespan of 300 s. The first source H4 is playing music in order to provide continuous acoustic excitation in the background, while the sources N6 and B0 correspond to female and male speakers, respectively, simulating a conversation in the living room.
For the initial WASN, Fig. 10d presents the convergence of SRO estimates after an initialization phase of DXCP-PhaT at the very beginning. The SRO trajectories of nodes {1, 9} with depth 1 converge rather fast to their target values, depicted by the dashed lines of the respective color. Note that the SRO estimates of nodes {6, 7} initially take off in the wrong direction, which is, however, appropriate with respect to their local parent node 9 during the transitional time period before its settling. These SRO estimates may even overshoot according to the time constants of the DXCP-PhaT measurement and the IMC controller, and are consistently pulled in the right direction toward their target values upon settling of their parent node 9. Overall, the initial WASN then achieves a good synchronization state within the first 100 s.

Dynamic WASN modifications
In order to apply a network modification of each type to the initial WASN of Fig. 10a, we choose the time point T c = 200 s after settling. Specifically, consider
(a) the appearance of a new sensor node 4,
(b) the failure of the link between nodes 6 and 9,
(c) the failure of the non-reference node 7,
(d) the failure of the reference node 0.
The modified topologies are depicted in Fig. 11 as a result of the network-wide processing steps of the proposed MWS protocol in Section 4. Taking a closer look at the modified topologies, it is plausible that all of them represent the desired MST under the given constraints. Thus, the network topology remains optimal even after the network modification.
Figure 12 shows the SRO estimation of all involved nodes for each network modification type in subfigures using a freezing time T f = 100 s. This value of T f safely upper-bounds the settling time of newly integrated nodes, as will be shown in Section 6.2. Figure 12a first demonstrates the expected convergence of the newly integrated node 4 to its true SRO with respect to the reference, while the persistent nodes are unaffected by the network modification. Figure 12b, c, and d show that all persistent nodes maintain their SRO estimation state under these network modifications, which is especially evident from Fig. 12b, where all nodes remain in the modified WASN. Naturally, in Fig. 12c and d the SRO trajectories of the discontinued nodes 7 and 0 disappear for t > T c. Most importantly, application of the proposed protocol avoids a time-consuming reconvergence in (d).

Large-scale evaluation
For large-scale quantitative assessment, we describe the rendering of a richer database of dynamic WASN conditions.Our proposals from Sections 3 and 4 for network-wide SRO estimation and compensation are then evaluated on this data in terms of estimation precision, settling time, and synchronization accuracy.

Generation of database for dynamic WASN
Using the setup from Section 5.1, we now create random network modifications based on 50 random, unique initial WASN topologies. For the latter, we sample random numbers N WASN ∈ {4, 5, 6} of nodes from the entire set of 13 possible sensor nodes of the simulated WASN environment and construct the MST-optimal topology as described in Section 3.3. To avoid any ill-conditioned links through walls, every node outside the living room connects to node 9, which is included in every topology. The time point of a network change T c is determined randomly from the interval T c ∈ [250, 290] s, such that sufficient simulation time is available for network-wide synchronization before and after the network modification. Network modifications of each type are then drawn as follows. For modification (a), the new node is sampled from the set of nodes not part of the initial WASN. For modification (b), one of the existing communication links is randomly disabled while maintaining the previously described bottleneck role of node 9. For modification (c), one non-reference node from the initial WASN is randomly selected to be removed. Finally, for modification (d), the global reference node is removed from each initial WASN.
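The sampling of an initial node set, a change time, and a joining node for modification (a) can be sketched as below. The node labels 0 to 12, the dictionary keys, and the function name are assumptions for illustration; the sketch neither enforces uniqueness of the sampled topologies nor draws modifications (b) to (d).

```python
import random

ALL_NODES = list(range(13))  # assumed labels 0..12 for the 13 simulated nodes
BRIDGE_NODE = 9              # node 9 is included in every topology


def sample_scenario(rng):
    """Sample one initial WASN plus a change time and a joining node.

    rng: a random.Random instance, so that scenarios are reproducible.
    """
    num_nodes = rng.choice([4, 5, 6])
    others = [n for n in ALL_NODES if n != BRIDGE_NODE]
    initial = sorted(rng.sample(others, num_nodes - 1) + [BRIDGE_NODE])
    t_change = rng.uniform(250.0, 290.0)  # change time T_c in seconds
    # Modification (a): the new node comes from outside the initial WASN.
    joining = rng.choice([n for n in ALL_NODES if n not in initial])
    return {"initial_nodes": initial, "t_change_s": t_change, "joining_node": joining}
```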

Network-wide SRO estimation
In order to examine the immediate effect of topology changes, including the application of Algorithm 1, on the SRO estimation error of persistent nodes, Fig. 13 specifically compares the root-mean-square error RMSE ε of SRO within the last 10 s "before" topology changes (left) with that of the first 10 s after topology changes (middle) by boxplots, where one data point corresponds to one of the initial WASN topologies. We first observe that RMSE ε before T c is very small with a median of only 0.04 ppm. This indicates that all topologies under investigation were given enough time for initial convergence. Moreover, regardless of the specific type of network modification (a)-(d) occurring at T c, there is no significant increase in the RMSE ε values observed after T c. A number of outliers can be noticed, all of which, however, rest safely below a threshold of 1 ppm. Apart from that, the average RMSE ε in (d) appears to be slightly elevated compared to that of all other cases. This is due to the small SRO estimation error of the newly appointed reference node just before t = T c, and it requires the duration of a network settling time T s after T c to propagate this slightly new reference sampling rate to all nodes. Overall, the MWS procedure in Algorithm 1 for handling the topology changes is successful in sustaining the SRO estimation accuracy of the persistent nodes. Figure 13 (right) then shows an extra boxplot of the RMSE ε of only the newly integrated "joined" nodes based on the last 10 s of the entire simulated signal. With its overall similar RMSE ε distribution as compared to the initial convergence "before" topology change, we can once more conclude the successful handling of the related network change.
Figure 14 (left) depicts the corresponding settling time T s of the SRO estimation, which is here defined as the time period from initial synchronization startup until the temporal RMSE ε (t) falls below a threshold RMSE ε (t ≥ T s ) ≤ 1 ppm. In the diagram, the settling times of all initial WASNs are split by the depth of the involved nodes, which demonstrates a staggered nature of settling according to the synchronization gossip from the root to the leaves. Nodes located closest to the root naturally settle first, as they are directly connected to the given reference, while deeper nodes still rely on the ongoing settling at intermediate node depths (as illustrated by Fig. 10d). After initial settling of the entire network, any newly "joined" node, irrespective of its corresponding node depth, exhibits a fast settling time with a median of about 50 s (right), similar to that found for initial settling at depth d i = 1 (left). Of course, the actual settling times are also governed by the actual SRO of each node, which determines the spread of the boxplots. After 100 s, almost all of the newly "joined" nodes have attained synchronization, which determines our choice of the freezing-time parameter T f in the MWS protocol of Algorithm 1.
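A windowed RMSE of this kind can be computed as in the following sketch. It covers a single node and a single time window; the function name, the half-open window, and the way errors would be pooled over nodes of a topology are assumptions not specified in the text.

```python
import math


def rmse_ppm(times, estimates, target, t_start, t_end):
    """RMSE of SRO estimates (ppm) against a target inside [t_start, t_end).

    times and estimates are equally long sequences for one node.
    """
    squared_errors = [
        (estimate - target) ** 2
        for t, estimate in zip(times, estimates)
        if t_start <= t < t_end
    ]
    if not squared_errors:
        raise ValueError("no samples inside the window")
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For a change at T c, the "before" window would be `rmse_ppm(times, estimates, target, T_c - 10, T_c)` and the "after" window `rmse_ppm(times, estimates, target, T_c, T_c + 10)`.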
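The settling-time definition, with the condition RMSE ε (t ≥ T s ) ≤ 1 ppm, can be read as the earliest time from which the error stays at or below the threshold. A minimal sketch under this reading follows; the function name and the sampled RMSE trajectory are assumptions.

```python
def settling_time(times, rmse_values, threshold_ppm=1.0):
    """Earliest time from which the RMSE stays at or below the threshold.

    Returns None if the trajectory never settles within the given samples.
    """
    settle = None
    for t, rmse in zip(times, rmse_values):
        if rmse <= threshold_ppm:
            if settle is None:
                settle = t  # candidate settling time
        else:
            settle = None   # threshold exceeded again, discard the candidate
    return settle
```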

Network-wide signal synchronization
After SRO estimation and evaluation across the network, the related time synchronization of waveforms is eventually assessed in terms of an averaged mean-squared coherence (AMSC) [78] and a signal-to-synchronization-noise ratio (SSNR) (footnote 17)

$$\mathrm{SSNR}_i = 10 \log_{10} \frac{\mathrm{Var}(z_{i,r})}{\mathrm{Var}(z_{i,\mathrm{AS}} - z_{i,r})},$$

where Var(•) is an operator for signal variance and the waveform z i,r [n] refers to a synchronous representation of the actual node signal z i [n] at the sampling rate of the respective reference node r. The signal z i,AS [n] in the i-th node is determined by the resampled signal z i,S [n] from Fig. 8, but compensated for a residual time offset that accumulates in the closed-loop synchronization unit due to transitional SRO estimation.
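A numerical evaluation of the SSNR can be sketched as follows, assuming the metric is the decibel ratio of Var(z i,r ) to Var(z i,AS − z i,r ). The function name is hypothetical, population variance is used, and the compensation of the residual time offset is assumed to have been applied to the input beforehand.

```python
import math
from statistics import pvariance


def ssnr_db(z_ref, z_sync):
    """SSNR in dB: 10*log10( Var(z_ref) / Var(z_sync - z_ref) ).

    z_ref: synchronous reference representation of the node signal.
    z_sync: resampled node signal, already compensated for time offset.
    """
    if len(z_ref) != len(z_sync):
        raise ValueError("signals must have equal length")
    error = [s - r for s, r in zip(z_sync, z_ref)]
    noise_var = pvariance(error)
    if noise_var == 0.0:
        return math.inf  # identical signals, no synchronization noise
    return 10.0 * math.log10(pvariance(z_ref) / noise_var)
```

As a sanity check, a synchronization error with one tenth of the reference amplitude yields a variance ratio of 100 and hence 20 dB.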
First analyzing the initial WASN before a topology change, the resulting AMSC and SSNR values obtained within the last 10 s before T c are presented in Fig. 15 (left). The results confirm poor signal synchronization of the raw asynchronous "async" signals, indicated by a median AMSC of only 0.15 and a median SSNR of about −3 dB. Outliers at AMSC = 1 belong to the initial WASNs with node 0 in the role of a non-reference node with ε 0 = 0 ppm, while similar outliers are not visible in the SSNR due to axis limitations. For synchronized "sync" signals, however, the AMSC values appear to be very close to the maximum possible value of 1 and the SSNR assumes a reasonable median of about 12 dB with some variance. The moderate SSNR here is explained by the well-known sensitivity of the SSNR metric with respect to remaining small SRO and timing errors of signals. In summary, these results indicate good WASN synchronization just before the time point of network change T c.
Then, with dynamic network conditions (a) to (d) according to Section 6.1 and with the application of the MWS protocol of Algorithm 1, the distribution of resulting AMSC and SSNR values obtained on the persistent nodes within the first 10 s after T c is shown in Fig. 15 (middle). As a result of our coordinated treatment of the dynamic conditions, the signal synchronization attained before topology changes is well sustained in the phase after the modification for the subset of persistent nodes, with a median of 12 to 14 dB SSNR. As shown in Fig. 15 (right), the remaining subset of newly "joined" nodes evaluated within the last 10 s of the simulated signal indicates a synchronization comparable to that of persistent nodes, i.e., with very good AMSC values and only a slight loss of SSNR, once more attributed to the sensitivity of this metric to small residual timing errors.

Ablation studies
Due to the absence of a reference approach that would operate under precisely the same dynamic network conditions as the proposed methods, this section investigates the requirement of certain processing steps and the robustness to the assumptions made. Specifically, Algorithm 1 for maintaining waveform synchronization is evaluated against several ablated versions of itself in Section 7.1. Then, the former assumption of topology changes after network convergence is abandoned in favor of early network changes in Section 7.2. Finally, a fallback network configuration to operate without knowledge of the sensor coordinates, and consequently without MST, is described in Section 7.3.

Ablation studies of the proposed method
We investigate the effects on SRO estimation when omitting parts of Algorithm 1, specifically (i) when not temporarily freezing children of newly integrated nodes (line 14), (ii) when not resizing the MDB of nodes whose depth has changed (line 12), and (iii) when not freezing the SRO estimation of newly elected reference nodes (line 16).
In doing so, we rely on the previously introduced network modifications (a) and (d) as shown by Figs. 11 and 12, for which the former simulations are repeated here with ablations but otherwise under identical conditions. Only to obtain a considerable effect of ablation (ii), we have to reduce the DXCP-PhaT frame length to N = 2^11 to effectively increase the WASN's sensitivity to MDB size mismatches under the limited WASN size and considered time span. Figure 16 depicts the resulting SRO estimation over time after network modifications at T c = 200 s, as before, where ablation (i) is applied with network modification (a), while ablations (ii) and (iii) are applied with network modification (d).
Figure 16 (i) shows, in contrast to the former Fig. 12a, that the SRO estimation of node 9, as a child of the newly integrated node 4, degrades shortly after T c and only recovers upon convergence of node 4. Of course, the grandchildren of node 4 (i.e., nodes 6 and 7) are affected, too, although with a delay according to their depth within the MST. As known from Fig. 12a, temporarily freezing direct children of newly integrated nodes would alleviate this problem.
Figure 16 (ii) refers to a dynamic modification with a new reference node 9. Since the depth relationship of 6, 7, and 9 remains unchanged in this very example, their SRO estimation is apparently stable with time.However, node 1 severely degrades at about 75 s after the change time T c due to a modified depth relationship between node 1 and node 9 (formerly relayed by node 0) and with the corresponding mismatch of MDBs not resized properly, which eventually violates the assumption of similar content of the input signals for waveform-based SRO estimation.
Figure 16 (iii) finally depicts the contrast to the former Fig. 12d when resampling of the newly elected reference node 9 is somewhat naively discontinued, which corresponds to resetting its SRO estimation to zero (instead of continued resampling with frozen SRO estimation according to Algorithm 1). Hence, all descendants of the new reference node (the entire WASN) are required to adjust to the new reference condition by reconvergence, which temporarily and unnecessarily puts the sensor network in an undetermined state.

Dynamic WASN with early network change
For clarity of the arrangement, a steady-state synchronization was assumed in Section 5.3 before any network change takes place and is coordinated by the proposed MWS protocol in Algorithm 1. The steady-state assumption there was inherently reasonable, since it is less interesting to maintain the state of a WASN if its nodes have not yet converged. However, in practice an early network change may arise before convergence, and the intention now is to show that convergence is not a strict requirement for the employment of the proposed methods. Figure 17 therefore considers network modification (a), i.e., a newly integrated node 4, before convergence of the initial network. It turns out that the early network change does not cause any permanent complications when Algorithm 1 is applied. The acoustic sensor network only needs more time to settle (here around 2.5 min after the change) compared to the idealized case from Fig. 12a (only 1 min for new settling after the change).

Distributed buffer-based synchronization without knowledge of positions of the sensor nodes
In this section, the performance of the buffer-based online realization of distributed WASN synchronization from Section 3.2 is investigated for the case that no knowledge of the node coordinates is available. In such a scenario, it is neither possible to build the geometric MST topology nor to optimally choose the reference node as described in Section 3.3. Instead, we may fall back to the centralized SOT topology mentioned in Section 3.1 with a fixed or randomly chosen reference node. For the analysis, we rely on sampled sets of nodes as described in Section 6.1 and evaluate the WASN performance attained in the steady state between 190 and 200 s (the same time span as used for the previous Figs. 13 (left) and 15 (left) with MST). Figure 18 summarizes the outcomes, where "MST" stands as an anchor for the previous results, "SOT" refers to the star-out topology with node 9 always as the reference, and "rSOT" instead uses a random reference node (newly sampled without special treatment of node 9). With the metrics at hand, we observe similar network performance for all configurations, with a possibly marginally reduced RMSE of the SRO estimation and a slightly higher synchronization SSNR for the SOT topologies. This can be attributed to the minimum network depth of the SOT and thus an earlier and slightly better network convergence in the available simulation time, while the larger geometric distances between nodes connected along topology edges do not significantly impair the acoustic coupling in our small-space SINS environment.
In light of this ablation, the MST topology indeed has not proved superior in our simulated context, which we attribute to our relatively small-scale configuration and to the simulation of low-noise microphones. Conversely, we still see the necessity of local operation organized in an MST configuration (rather than SOT) when considering larger scenarios or use cases with increased requirements such as
• a lower acoustic coherence between distant nodes,
• an increased noise floor of low-cost microphones,
• a limited wireless connectivity of distant nodes,
• a larger number of sensor nodes in the network,
• and a necessary decentralization for network robustness or distribution of computational load.
These requirements may appear in crowded indoor networks with numerous sensors or in large-scale outdoor networks, for instance, biosphere monitoring. Such an immense diversity of WASNs has not yet been represented in our analysis of small-scale configurations. Still, it was our intention to demonstrate the utility of the proposed methods, including the closed-loop synchronization unit and the dynamic MWS protocol, under several circumstances.
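The fallback topologies require no geometric information, as the following sketch shows. The function name and the return format (parent and depth dictionaries) are assumptions for illustration; passing a fixed reference corresponds to "SOT" and omitting it to "rSOT".

```python
import random


def star_out_topology(nodes, reference=None, rng=None):
    """Star-out topology: every non-reference node is a child of the reference.

    If no reference is given, one is drawn at random (rng is an optional
    random.Random instance for reproducibility).
    """
    nodes = list(nodes)
    if reference is None:
        reference = (rng or random).choice(nodes)
    if reference not in nodes:
        raise ValueError("reference must be one of the nodes")
    parent = {n: (None if n == reference else reference) for n in nodes}
    depth = {n: (0 if n == reference else 1) for n in nodes}
    return parent, depth
```

Every non-reference node has depth 1 in this configuration, which is the minimum network depth referred to above.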

Conclusions
An online distributed waveform-based sampling-time synchronization for dynamic wireless acoustic sensor networks (WASNs) has been described in this paper and applied to a simulated smart home environment for evaluation. The essential system component is a buffer-based implementation of a closed-loop synchronization unit (with resampling and sampling-rate offset estimation in a loop) for any two nodes of the network. Our specific unit makes use of a double-cross-correlation processor for waveform-based estimation of the sampling rate offset (SRO) and of a buffer-based SRO compensation by an STFT-based resampling method. This estimation and compensation in the closed-loop architecture are here coupled by an internal-model-control unit. The suggested pairwise node synchronization unit is then employed for distributed synchronization of WASNs organized in a rooted-tree topology based on a minimum spanning tree. Our paper has demonstrated how the synchronization gossip in this case propagates from the root to the leaves of the network. Finally, a protocol for maintaining waveform-based synchronization has been proposed for scenarios in which random modifications of the original WASN take place. Our experimental evaluation in the environment of a simulated apartment with several connected rooms proved the efficiency and robustness of the proposed system (for instance, against unknown sensor coordinates, early modification, and some of the ablations studied) for sustainable network-wide SRO estimation and signal synchronization in dynamic WASNs.

Fig. 4 Time-synchronization of two nodes in (a) open-loop and (b) closed-loop architectures

Fig. 5 Equivalent block diagrams of the closed-loop synchronization architecture in the domain of control signals: (a) IMC filter, (b) IMC controller

Fig. 8 Block diagrams of buffer-based synchronization unit on the i-th sensor node: (a) single-hop linked to the reference node ( j = r , d i = 1 ); (b) connected only to a local reference ( j ≠ r , d i > 1 )

Network modification types (Section 4): (a) Appearance of new nodes, (b) Failure of communication links, (c) Failure of non-reference sensor nodes, (d) Failure of the reference sensor node.

Fig. 10 (a) Setup of synthetic data generation with an initial WASN: root r = 0 and N WASN = 5; (b) individual SNRs observed on the sensor nodes; (c) activity pattern of the sources; and (d) SRO estimation in the initial static WASN

Network modifications applied at T c = 200 s (Section 5.3): (a) the appearance of a new sensor node 4, (b) the failure of the link between nodes 6 and 9, (c) the failure of the non-reference node 7, (d) the failure of the reference node 0

Fig. 11 Modified topologies obtained after executing the network-wide protocol steps of Algorithm 1 on the initial WASN from Fig. 10a: (a) appearance of a new node 4, (b) failure of communication link between node 6 and node 9, (c) failure of non-reference node 7, (d) failure of reference node 0

Fig. 12 SRO estimation of WASN undergoing modifications from Fig. 11 at T c = 200 s: (a) appearance of a new node 4, (b) failure of communication link between node 6 and 9, (c) failure of non-reference node 7, (d) failure of reference node 0

Fig. 13 RMSE ε values for persistent nodes within last 10 s before T c (left) and within first 10 s after T c (middle) and for newly joined nodes within last 10 signal seconds (right)

Fig. 17 Early network change for the network modification (a) from Fig. 11a

Fig. 18 Influence of topology type on WASN synchronization with and without knowledge of positions of the sensor nodes: (a) RMSE values of SRO estimation, (b) SSNR values of SRO compensation

Table 1 Parameters of pairwise synchronization unit