Acoustic Signal Extraction and Enhancement

This project aims to unify and extend state-of-the-art methods for source localization, signal extraction and enhancement, and metaparameter estimation by combining their respective strengths, both in theory and in practice, for the specific goals of (wireless) acoustic sensor networks (ASNs). Progress is expected from adopting a Bayesian perspective wherever it is feasible and adequate. This should lead to 'informed' signal extraction and enhancement algorithms that optimally exploit prior knowledge, e.g., on source locations and on the statistics of the available observations, while preserving the advantages of 'blind' algorithms derived from the TRINICON and BENCH frameworks. Reflecting its central role, this project also derives additional metaparameters for use on Layer 3 and is tightly linked to Layer 1 when addressing the ASN-specific tasks of sensor utility assessment and the decomposition of the developed algorithms for distributed computing.

Acoustic signal extraction and enhancement in natural, ideally unconstrained acoustic scenes forms the basis for understanding the acoustic scene, which is addressed in Layer 3. For this purpose, this project initially assumes that Layer 1 can provide a network of acoustic sensors of known topology and sufficient utility that are sufficiently well synchronized for distributed sensing and signal processing. Based on this, it tackles the following tasks:

  • (a) localization of an unknown number of point sources,
  • (b) extraction and enhancement of multiple target signals in a noisy and reverberant environment, including suppression of noise and interference and dereverberation,
  • (c) extraction of characteristic metaparameters supporting, e.g., signal classification and scene analysis in Layer 3,
  • (d) definition of criteria for assessing the utility of individual sensors for a given algorithm in a given scenario, and
  • (e) decomposition of algorithms to suit distributed processing in ASNs.

Viewing the aforementioned tasks essentially as parameter estimation problems, we aim at a unified Bayesian framework and expect to derive new and more efficient algorithms by including statistical prior knowledge and by addressing multiple tasks as a joint parameter estimation problem. Moreover, the Bayesian view will allow for a precise assessment of the assumptions underlying the problem formulations and will thereby point to current limitations and inherent potential for improvement.

For a), the localization of desired point sources, we start from given statistical prior knowledge on sensor locations. In a first step, candidate positions for targets are identified from observations of individual sensors. Assuming that a network of synchronized sensors with sufficient utility is established for each target, the actual localization algorithm will determine the target's position with the accuracy required by the signal extraction algorithms. In this project we concentrate on Direction of Arrival (DoA) estimation and Time Difference of Arrival (TDoA) estimation based on the joint statistics (e.g., cross-correlation) of synchronized sensor signal pairs, which form the basis of a broad range of source localization methods. Prior knowledge on the relevance of sources, e.g., based on previous signal classification in Project 4, will be integrated into the localization concept in order to separate the targets from irrelevant interference.
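The pairwise TDoA estimation mentioned above can be illustrated with a minimal Python sketch. It is not the project's localization algorithm: it simply picks the peak of the plain cross-correlation of two synchronized sensor signals, and the far-field conversion to a DoA assumes a two-sensor pair with known spacing and a speed of sound of 343 m/s. The function names estimate_tdoa and tdoa_to_doa and all parameters are hypothetical.

```python
import numpy as np


def estimate_tdoa(x1, x2, fs, max_delay_s=None):
    """Estimate the delay of x2 relative to x1 (in seconds).

    Uses the peak of the cross-correlation, computed via the FFT.
    A positive result means x2 lags behind x1.
    """
    n = len(x1) + len(x2) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    # Cross-correlation r[k] = sum_n x2[n + k] * x1[n]
    cc = np.fft.irfft(X2 * np.conj(X1), nfft)

    max_shift = nfft // 2
    if max_delay_s is not None:
        max_shift = min(max_shift, max(1, int(round(max_delay_s * fs))))

    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def tdoa_to_doa(tdoa, mic_distance, c=343.0):
    """Far-field DoA (radians, relative to broadside) for a two-sensor pair.

    Assumption: plane-wave propagation and a known sensor spacing in meters.
    """
    return float(np.arcsin(np.clip(c * tdoa / mic_distance, -1.0, 1.0)))
```

In reverberant rooms a weighted variant such as GCC-PHAT is often preferred over the plain cross-correlation; the sketch omits the weighting to stay close to the cross-correlation statistics named in the text.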

The actual signal extraction and enhancement task b) involves separation of multiple targets from each other and from unwanted signals or signal components, including ba) suppression of interfering sources and noise, bb) dereverberation of the target signals, and bc) suppression or cancellation of acoustic echoes for sources for which references (e.g., loudspeaker signals) are available. Ideally, the processed signal for each target would be identical to that captured directly at the target source in a quiet environment. Tasks ba) and bb) will form the core part of the project and are addressed in three work packages, while bc) is planned for the second phase of the project. On the algorithmic level, b) essentially requires the estimation of an optimum linear time-variant MIMO filtering scheme, assuming that nonlinearities in electroacoustic transducers can be disregarded. For ba) and bb), this MIMO system will exploit the source signals' diversity in space, time, and frequency for separation and suppression. Depending on the amount of prior information in the respective domains (e.g., location, activity over time, and spectral envelope of the sources, respectively), the algorithms for determining optimum filter parameters will be supervised or blind to varying degrees. Blindness can be reduced by incorporating statistical models for the source signals or the acoustic channels.
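To make the structure of the linear MIMO filtering scheme concrete, the following Python sketch applies a bank of FIR filters that maps P sensor signals to Q output signals. It only shows how such a filter is applied; how the coefficients are estimated is the actual subject of the project and is not covered here. The function name mimo_fir_filter and the array layout are assumptions for illustration, and the filters are treated as time-invariant within one signal block as a simplification of the time-variant case.

```python
import numpy as np


def mimo_fir_filter(x, w):
    """Apply a MIMO FIR filter.

    x : array of shape (P, N), one row per sensor signal
    w : array of shape (P, Q, L), w[p, q] is the FIR filter from
        sensor p to output q
    Returns y of shape (Q, N + L - 1), where
        y[q] = sum over p of (x[p] convolved with w[p, q]).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    num_sensors, num_samples = x.shape
    num_sensors_w, num_outputs, filter_len = w.shape
    if num_sensors != num_sensors_w:
        raise ValueError("x and w disagree on the number of sensors")

    y = np.zeros((num_outputs, num_samples + filter_len - 1))
    for q in range(num_outputs):
        for p in range(num_sensors):
            y[q] += np.convolve(x[p], w[p, q])
    return y
```

A time-variant scheme could be approximated by calling this function block by block with updated coefficients and overlapping the block outputs.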

This problem description suggests the use of Bayesian estimation techniques in order to incorporate as much prior knowledge as possible in a flexible manner. To this end, we will study two generic concepts which have demonstrated their potential in traditional multichannel acoustic signal processing, but not yet for ASNs. The first is the generic TRINICON ('TRIple N Independent component analysis for CONvolutive mixtures') framework for blind MIMO signal processing of multiple independent source signals. It allows the Nonstationarity, Nonwhiteness, and Nongaussianity ('TRIple N') of the source signals to be exploited for parameter estimation and as such is especially well suited for localization, separation, and dereverberation of speech and other acoustic signals. While it has already been shown that TRINICON can benefit from additional prior information, and it can safely be expected to be applicable to the given ASN scenario, the expected benefits of a Bayesian formulation of TRINICON are still unexplored. The second concept, Blind EqualizatioN and CHannel identification (BENCH), is a source and system identification method which is already known in a Bayesian formulation and for which efficient EM-type algorithms are available. In this project it should be generalized from a narrowband to a broadband signal model and extended to handle more than a single point source. Ideally, TRINICON and BENCH will be unified under the Bayesian paradigm.

The extraction of metaparameters c), such as Signal-to-Noise Ratio (SNR), Signal-to-Distortion Ratio (SDR), Coherent-to-Diffuse Ratio (CDR), or reverberation time T60, will then be based on the signal processing algorithms for a) and b).
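As a rough illustration of two of these metaparameters, the Python sketch below computes an SNR from separated signal and noise estimates and a T60 from an impulse response estimate. Both inputs are assumed to be available, e.g., as by-products of the processing for b); the project itself does not prescribe these estimators. The T60 estimate uses Schroeder backward integration with a line fit between -5 dB and -25 dB, which is a common textbook choice swapped in here for illustration. The function names snr_db and t60_from_impulse_response are hypothetical.

```python
import numpy as np


def snr_db(signal, noise):
    """SNR in dB from a target signal estimate and a noise estimate."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


def t60_from_impulse_response(h, fs, db_start=-5.0, db_end=-25.0):
    """Estimate T60 (seconds) from an impulse response h sampled at fs.

    Schroeder backward integration yields the energy decay curve (EDC);
    a straight line is fitted to the EDC between db_start and db_end and
    extrapolated to a decay of 60 dB.
    """
    h = np.asarray(h, dtype=float)
    # Energy remaining after each sample (backward cumulative sum)
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(energy / energy[0] + 1e-30)

    idx = np.where((edc_db <= db_start) & (edc_db >= db_end))[0]
    if len(idx) < 2:
        raise ValueError("impulse response too short for the chosen fit range")

    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope
```

Blind estimation of these quantities from the sensor signals alone is considerably harder than this reference-based sketch suggests.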

To efficiently implement the resulting algorithms in ASNs, the sensors supporting the processing of individual sources should be optimally chosen. While for a) to c) it was assumed that this sensor set is given, we investigate in d) criteria for assessing the usefulness ('utility') of candidate sensors for a given algorithm in a given scenario. As a precondition, these candidates must have been selected as sufficiently useful in terms of the criteria of Layer 1 (see Project 1). The desired algorithm-specific criteria must be based on sufficiently short observation intervals to allow tracking of moving sources and time-varying acoustic scenarios.