Dr.-Ing. Jörg Schmalenströer

Communications Engineering (Nachrichtentechnik, NT)

Senior Academic Councillor (Akademischer Oberrat) - (Contract) Research & Teaching

+49 5251 60-3623
+49 5251 60-3627
Pohlweg 47-49
33098 Paderborn
Honors & Distinctions
  • 2013: Research Award of Paderborn University (Forschungspreis der Universität Paderborn): Reliable Indoor Navigation (Verlässliche Navigation in Gebäuden)
  • 2017: "Top 10% Paper Award" at the IEEE Multimedia Signal Processing Workshop, Luton, U.K. Paper title: Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming, by J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, and R. Haeb-Umbach
  • Reliable Indoor Navigation (Verlässliche Navigation in Gebäuden)
    2013 - 2014
    Research Award of Paderborn University
  • Bonuspoint: Development of a positioning algorithm for the Fujitsu Forum 2014
    Oct. 2014 - Dec. 2014
    Fujitsu Technology Solutions GmbH - Product Development Group / Innovation Lab
  • Big Data Real Time Shopping Assistant: Development of a client-server-based approach to positioning using Bluetooth Low Energy
    Mar. 2015 - Dec. 2015
    Fujitsu Technology Solutions GmbH - Product Development Group / Innovation Lab
  • Development of a Bluetooth Low Energy communication platform
    Nov. 2015 - Nov. 2016
    Hörmann KG Antriebstechnik
  • Distributed acoustic signal processing over wireless sensor networks
    2016 - 2019
    German Research Foundation (Deutsche Forschungsgemeinschaft) Research Unit: DFG FOR 2457 "Acoustic Sensor Networks"
11/2013 - present

Senior Academic Councillor (Akademischer Oberrat) in the Department of Communications Engineering

04/2010 - 11/2013

Academic Councillor (Akademischer Rat) in the Department of Communications Engineering



Topic: Acoustic scene analysis for ambient communication in the networked home

05/2004 - 03/2010

Research associate in the Department of Communications Engineering

Research focus: Ambient communication in the networked home within the EU project Amigo

10/1999 - 05/2004

Studies in electrical engineering at Paderborn University


Neural Network Based Carrier Frequency Offset Estimation From Speech Transmitted Over High Frequency Channels

J. Heitkämper, J. Schmalenstroeer, R. Haeb-Umbach, in: Proceedings of the 30th European Signal Processing Conference (EUSIPCO), 2022

The intelligibility of demodulated audio signals from analog high frequency transmissions, e.g., using single-sideband (SSB) modulation, can be severely degraded by channel distortions and/or a mismatch between modulation and demodulation carrier frequency. In this work a neural network (NN)-based approach for carrier frequency offset (CFO) estimation from demodulated SSB signals is proposed, whereby a task specific architecture is presented. Additionally, a simulation framework for SSB signals is introduced and utilized for training the NNs. The CFO estimator is combined with a speech enhancement network to investigate its influence on the enhancement performance. The NN-based system is compared to a recently proposed pitch tracking based approach on publicly available data from real high frequency transmissions. Experiments show that the NN exhibits good CFO estimation properties and results in significant improvements in speech intelligibility, especially when combined with a noise reduction network.

Data-driven Time Synchronization in Wireless Multimedia Networks

H. Afifi, H. Karl, T. Gburrek, J. Schmalenstroeer, in: 2022 International Wireless Communications and Mobile Computing (IWCMC), IEEE, 2022


On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling Rate Offsets and Speaker Changes

T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2022


Informed vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting Transcription

T. Gburrek, J. Schmalenstroeer, J. Heitkaemper, R. Haeb-Umbach, in: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), IEEE, 2022


A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network

T. Gburrek, C. Boeddeker, T. von Neumann, T. Cord-Landwehr, J. Schmalenstroeer, R. Haeb-Umbach, arXiv, 2022



Iterative Geometry Calibration from Distance Estimates for Wireless Acoustic Sensor Networks

T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021


Online Estimation of Sampling Rate Offsets in Wireless Acoustic Sensor Networks with Packet Loss

A. Chinaev, G. Enzner, T. Gburrek, J. Schmalenstroeer, in: 29th European Signal Processing Conference (EUSIPCO), 2021, pp. 1-5

Open Range Pitch Tracking for Carrier Frequency Difference Estimation from HF Transmitted Speech

J. Schmalenstroeer, J. Heitkaemper, J. Ullmann, R. Haeb-Umbach, in: 29th European Signal Processing Conference (EUSIPCO), 2021, pp. 1-5

On Source-Microphone Distance Estimation Using Convolutional Recurrent Neural Networks

T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: Speech Communication; 14th ITG-Symposium, 2021, pp. 1-5

Geometry calibration in wireless acoustic sensor networks utilizing DoA and distance information

T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, EURASIP Journal on Audio, Speech, and Music Processing (2021)

Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sensor nodes from the recorded speech signals. The method assumes that a node comprises a microphone array with synchronously sampled microphones rather than a single microphone, but does not require the sampling clocks of the nodes to be synchronized. From the observed audio signals, the distances between the acoustic sources and arrays, as well as the directions of arrival, are estimated. They serve as input to a non-linear least squares problem, from which both the sensor nodes’ positions and orientations, as well as the source positions, are alternatingly estimated in an iterative process. Given one set of unknowns, i.e., either the source positions or the sensor nodes’ geometry, the other set of unknowns can be computed in closed-form. The proposed approach is computationally efficient and the first one, which employs both distance and directional information for geometry calibration in a common cost function. Since both distance and direction of arrival measurements suffer from outliers, e.g., caused by strong reflections of the sound waves on the surfaces of the room, we introduce measures to deemphasize or remove unreliable measurements. Additionally, we discuss modifications of our previously proposed deep neural network-based acoustic distance estimator, to account not only for omnidirectional sources but also for directional sources. Simulation results show good positioning accuracy and compare very favorably with alternative approaches from the literature.

A Database for Research on Detection and Enhancement of Speech Transmitted over HF links

J. Heitkaemper, J. Schmalenstroeer, V. Ion, R. Haeb-Umbach, in: Speech Communication; 14th ITG-Symposium, 2021, pp. 1-5


Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

J. Heitkaemper, J. Schmalenströer, R. Haeb-Umbach, in: INTERSPEECH 2020 Virtual Shanghai China, 2020

Speech activity detection (SAD), which often rests on the fact that the noise is "more" stationary than speech, is particularly challenging in non-stationary environments, because the time variance of the acoustic scene makes it difficult to discriminate speech from noise. We propose two approaches to SAD, where one is based on statistical signal processing, while the other utilizes neural networks. The former employs sophisticated signal processing to track the noise and speech energies and is meant to support the case for a resource efficient, unsupervised signal processing approach. The latter introduces a recurrent network layer that operates on short segments of the input speech to do temporal smoothing in the presence of non-stationary noise. The systems are tested on the Fearless Steps challenge database, which consists of the transmission data from the Apollo-11 space mission. The statistical SAD achieves comparable detection performance to earlier proposed neural network based SADs, while the neural network based approach leads to a decision cost function of 1.07% on the evaluation set of the 2020 Fearless Steps Challenge, which sets a new state of the art.

Deep Neural Network based Distance Estimation for Geometry Calibration in Acoustic Sensor Network

T. Gburrek, J. Schmalenstroeer, A. Brendel, W. Kellermann, R. Haeb-Umbach, in: European Signal Processing Conference (EUSIPCO), 2020

We present an approach to deep neural network based (DNN-based) distance estimation in reverberant rooms for supporting geometry calibration tasks in wireless acoustic sensor networks. Signal diffuseness information from acoustic signals is aggregated via the coherent-to-diffuse power ratio to obtain a distance-related feature, which is mapped to a source-to-microphone distance estimate by means of a DNN. This information is then combined with direction-of-arrival estimates from compact microphone arrays to infer the geometry of the sensor network. Unlike many other approaches to geometry calibration, the proposed scheme does only require that the sampling clocks of the sensor nodes are roughly synchronized. In simulations we show that the proposed DNN-based distance estimator generalizes to unseen acoustic environments and that precise estimates of the sensor node positions are obtained.


MARVELO - A Framework for Signal Processing in Wireless Acoustic Sensor Networks

H. Afifi, J. Schmalenstroeer, J. Ullmann, R. Haeb-Umbach, H. Karl, in: Speech Communication; 13th ITG-Symposium, 2018, pp. 1-5

Signal processing in WASNs is based on a software framework for hosting the algorithms as well as on a set of wireless connected devices representing the hardware. Each of the nodes contributes memory, processing power, communication bandwidth and some sensor information for the tasks to be solved on the network. In this paper we present our MARVELO framework for distributed signal processing. It is intended for transforming existing centralized implementations into distributed versions. To this end, the software only needs a block-oriented implementation, which MARVELO picks-up and distributes on the network. Additionally, our sensor node hardware and the audio interfaces responsible for multi-channel recordings are presented.

Benchmarking Neural Network Architectures for Acoustic Sensor Networks

J. Ebbers, J. Heitkaemper, J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany, 2018

Due to their distributed nature wireless acoustic sensor networks offer great potential for improved signal acquisition, processing and classification for applications such as monitoring and surveillance, home automation, or hands-free telecommunication. To reduce the communication demand with a central server and to raise the privacy level it is desirable to perform processing at node level. The limited processing and memory capabilities on a sensor node, however, stand in contrast to the compute and memory intensive deep learning algorithms used in modern speech and audio processing. In this work, we perform benchmarking of commonly used convolutional and recurrent neural network architectures on a Raspberry Pi based acoustic sensor node. We show that it is possible to run medium-sized neural network topologies used for speech enhancement and speech recognition in real time. For acoustic event recognition, where predictions in a lower temporal resolution are sufficient, it is even possible to run current state-of-the-art deep convolutional models with a real-time factor of 0.11.

Efficient Sampling Rate Offset Compensation - An Overlap-Save Based Approach

J. Schmalenstroeer, R. Haeb-Umbach, in: 26th European Signal Processing Conference (EUSIPCO 2018), 2018

Distributed sensor data acquisition usually encompasses data sampling by the individual devices, where each of them has its own oscillator driving the local sampling process, resulting in slightly different sampling rates at the individual sensor nodes. Nevertheless, for certain downstream signal processing tasks it is important to compensate even for small sampling rate offsets. Aligning the sampling rates of oscillators which differ only by a few parts-per-million is, however, challenging and quite different from traditional multirate signal processing tasks. In this paper we propose to transfer a precise but computationally demanding time domain approach, inspired by the Nyquist-Shannon sampling theorem, to an efficient frequency domain implementation. To this end a buffer control is employed which compensates for sampling offsets which are multiples of the sampling period, while a digital filter, realized by the well-known Overlap-Save method, handles the fractional part of the sampling phase offset. With experiments on artificially misaligned data we investigate the parametrization, the efficiency, and the induced distortions of the proposed resampling method. It is shown that a favorable compromise between residual distortion and computational complexity is achieved, compared to other sampling rate offset compensation techniques.
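
The split between buffer control and fractional filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `accumulated_delay` and `split_delay` and the example numbers are assumptions chosen for illustration, and the Overlap-Save fractional-delay filter itself is omitted.

```python
import math


def accumulated_delay(sro_ppm, n_samples):
    """Drift in samples after n_samples at a sampling rate offset given in ppm."""
    return sro_ppm * 1e-6 * n_samples


def split_delay(delay_samples):
    """Split a delay into whole samples and a fractional part in [0, 1).

    The whole-sample part would be handled by shifting the buffer,
    the fractional part by a fractional-delay filter (not shown here).
    """
    whole = math.floor(delay_samples)
    return whole, delay_samples - whole


# Hypothetical example: 50 ppm offset, one minute of audio at 16 kHz,
# plus an assumed initial phase offset of a quarter sample.
drift = accumulated_delay(50.0, 16000 * 60)
whole, frac = split_delay(drift + 0.25)
```

Using `math.floor` rather than truncation keeps the fractional part non-negative for negative delays, so a single filter design covering [0, 1) would suffice in such a scheme.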

Insights into the Interplay of Sampling Rate Offsets and MVDR Beamforming

J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany, 2018

It has been experimentally verified that sampling rate offsets (SROs) between the input channels of an acoustic beamformer have a detrimental effect on the achievable SNR gains. In this paper we derive an analytic model to study the impact of SRO on the estimation of the spatial noise covariance matrix used in MVDR beamforming. It is shown that a perfect compensation of the SRO is impossible if the noise covariance matrix is estimated by time averaging, even if the SRO is perfectly known. The SRO should therefore be compensated for prior to beamformer coefficient estimation. We present a novel scheme where SRO compensation and beamforming closely interact, saving some computational effort compared to separate SRO adjustment followed by acoustic beamforming.

The RWTH/UPB System Combination for the CHiME 2018 Workshop

M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter, H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018

This paper describes the systems for the single-array track and the multiple-array track of the 5th CHiME Challenge. The final system is a combination of multiple systems, using Confusion Network Combination (CNC). The different systems presented here are utilizing different front-ends and training sets for a Bidirectional Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced by enhancements provided by Paderborn University [1]. The back-end has been implemented using RASR [2] and RETURNN [3]. Additionally, a system combination including the hypothesis word graphs from the system of the submission [1] has been performed, which results in the final best system.

Front-End Processing for the CHiME-5 Dinner Party Scenario

C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018

This contribution presents a speech enhancement system for the CHiME-5 Dinner Party Scenario. The front-end employs multi-channel linear time-variant filtering and achieves its gains without the use of a neural network. We present an adaptation of blind source separation techniques to the CHiME-5 database which we call Guided Source Separation (GSS). Using the baseline acoustic and language model, the combination of Weighted Prediction Error based dereverberation, guided source separation, and beamforming reduces the WER by 10.54% (relative) for the single array track and by 21.12% (relative) on the multiple array track.

Fast and Accurate Audio Resampling for Acoustic Sensor Networks by Polyphase-Farrow Filters with FFT Realization

J. Schmalenstroeer, A. Chinaev, G. Enzner, in: Speech Communication; 13th ITG-Symposium, 2018, pp. 1-5

Arbitrary sampling rate conversion has already received considerable attention in the past, but still lacks an equivalent representation of the effective time-dilation process in the block frequency domain. Good sampling rate converters in the time domain have been known, for instance, in terms of time-varying 'Sinc' or fixed 'Farrow' polynomial filters. The former can deliver nearly exact conversion at high complexity, while the latter has pronounced computational efficiency with limited accuracy. Only recently, it was shown that a composite 'polyphase Farrow' form with high resampling precision can be implemented with quasi-fixed filters that operate at the input sampling rate. We therefore propose to capitalize on that fixed-filter architecture by translating the polyphase-Farrow filters into an equivalent FFT-based overlap-save form. Experimental evaluation and comparison with other state-of-the-art frequency-domain approaches then show that the proposed algorithm currently offers the best price-performance ratio. It is thus an ideal candidate for the new framework of acoustic sensor networks that critically rests upon fast and accurate alignment of autonomous sampling processes.


Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming

J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, R. Haeb-Umbach, in: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017

Multi-channel speech enhancement algorithms rely on a synchronous sampling of the microphone signals. This, however, cannot always be guaranteed, especially if the sensors are distributed in an environment. To avoid performance degradation the sampling rate offset needs to be estimated and compensated for. In this contribution we extend the recently proposed coherence drift based method in two important directions. First, the increasing phase shift in the short-time Fourier transform domain is estimated from the coherence drift in a matched-filter-like fashion, where intermediate estimates are weighted by their instantaneous SNR. Second, an observed bias is removed by iterating between offset estimation and compensation by resampling a couple of times. The effectiveness of the proposed method is demonstrated by speech recognition results on the output of a beamformer with and without sampling rate offset compensation between the input channels. We compare MVDR and maximum-SNR beamformers in reverberant environments and further show that both benefit from a novel phase normalization, which we also propose in this contribution.

Building or Enclosure Termination Closing and/or Opening Apparatus, and Method for Operating a Building or Enclosure Termination

F. Jacob, J. Schmalenstroeer. Building or Enclosure Termination Closing and/or Opening Apparatus, and Method for Operating a Building or Enclosure Termination, Patent WO2018/077610A. 2017.

The invention relates to a building or enclosure termination opening and/or closing apparatus having communication signed or encrypted by means of a key, and to a method for operating such. To allow simple, convenient and secure use by exclusively authorised users, the apparatus comprises: a first and a second user terminal, with secure forwarding of a time-limited key from the first to the second user terminal being possible. According to an alternative, individual keys are generated by a user identification and a secret device key.


Investigations into Bluetooth Low Energy Localization Precision Limits

J. Schmalenstroeer, R. Haeb-Umbach, in: 24th European Signal Processing Conference (EUSIPCO 2016), 2016

In this paper we study the influence of directional radio patterns of Bluetooth low energy (BLE) beacons on smartphone localization accuracy and beacon network planning. A two-dimensional model of the power emission characteristic is derived from measurements of the radiation pattern of BLE beacons carried out in an RF chamber. The Cramer-Rao lower bound (CRLB) for position estimation is then derived for this directional power emission model. With this lower bound on the RMS positioning error the coverage of different beacon network configurations can be evaluated. For near-optimal network planning an evolutionary optimization algorithm for finding the best beacon placement is presented.


Aligning training models with smartphone properties in WiFi fingerprinting based indoor localization

M.K. Hoang, J. Schmalenstroeer, R. Haeb-Umbach, in: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015


A Gossiping Approach to Sampling Clock Synchronization in Wireless Acoustic Sensor Networks

J. Schmalenstroeer, P. Jebramcik, R. Haeb-Umbach, in: 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014

In this paper we present an approach for synchronizing the sampling clocks of distributed microphones over a wireless network. The proposed system uses a two-stage procedure. It first employs a two-way message exchange algorithm to estimate the clock phase and frequency difference between two nodes and then uses a gossiping algorithm to estimate a virtual master clock, to which all sensor nodes synchronize. Simulation results are presented for networks of different topology and size, showing the effectiveness of our approach.
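
As a rough illustration of the first stage, the following Python sketch shows the textbook two-way timestamp exchange as used in NTP-style protocols. It is an assumption that this generic form resembles the paper's algorithm; the function name `two_way_exchange`, the timestamp names `t1` to `t4`, and the example values are hypothetical, and the estimate is only exact if the transmission delay is the same in both directions.

```python
def two_way_exchange(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from one message round trip.

    t1: request sent     (clock of node A)
    t2: request received (clock of node B)
    t3: reply sent       (clock of node B)
    t4: reply received   (clock of node A)
    Assumes symmetric transmission delays in both directions.
    """
    forward = t2 - t1    # delay + offset
    backward = t4 - t3   # delay - offset
    offset = (forward - backward) / 2.0  # clock of B minus clock of A
    delay = (forward + backward) / 2.0
    return offset, delay


# Hypothetical timestamps: B runs 2.5 time units ahead of A, one-way delay 0.5.
offset, delay = two_way_exchange(t1=100.0, t2=103.0, t3=104.0, t4=102.0)
```

A frequency difference could then be approximated from the slope of successive offset estimates over time, which is where random transmission delays make additional filtering attractive.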

A combined hardware-software approach for acoustic sensor network synchronization

J. Schmalenstroeer, P. Jebramcik, R. Haeb-Umbach, Signal Processing (2014)

In this paper we present an approach for synchronizing a wireless acoustic sensor network using a two-stage procedure. First the clock frequency and phase differences between pairs of nodes are estimated employing a two-way message exchange protocol. The estimates are further improved in a Kalman filter with a dedicated observation error model. In the second stage network-wide synchronization is achieved by means of a gossiping algorithm which estimates the average clock frequency and phase of the sensor nodes. These averages are viewed as frequency and phase of a virtual master clock, to which the clocks of the sensor nodes have to be adjusted. The amount of adjustment is computed in a specific control loop. While these steps are done in software, the actual sampling rate correction is carried out in hardware by using an adjustable frequency synthesizer. Experimental results obtained from hardware devices and software simulations of large scale networks are presented.
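
The idea of estimating a network-wide average by gossiping can be sketched with randomized pairwise averaging. This is a generic textbook variant, not necessarily the scheme used in the paper; the function name `gossip_average`, the ring topology, and the offset values are assumptions made for illustration.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip over a fixed set of links.

    In every round one link (i, j) is picked at random and both nodes
    replace their value with the mean of the pair. The sum over all
    nodes is preserved, so on a connected network every node approaches
    the global average.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        mean = 0.5 * (x[i] + x[j])
        x[i] = x[j] = mean
    return x


# Hypothetical clock frequency offsets in ppm for a four-node ring network.
offsets_ppm = [12.0, -3.0, 7.5, -8.5]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = gossip_average(offsets_ppm, ring, rounds=200)
```

Because each node only talks to a direct neighbor, no node has to act as a physical master; the common value the nodes converge to plays the role of the virtual master clock mentioned in the abstract.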

Online Observation Error Model Estimation for Acoustic Sensor Network Synchronization

J. Schmalenstroeer, W. Zhao, R. Haeb-Umbach, in: 11. ITG Fachtagung Sprachkommunikation (ITG 2014), 2014

Acoustic sensor network clock synchronization via time stamp exchange between the sensor nodes is not accurate enough for many acoustic signal processing tasks, such as speaker localization. To improve synchronization accuracy it has therefore been proposed to employ a Kalman Filter to obtain improved frequency deviation and phase offset estimates. The estimation requires a statistical model of the errors of the measurements obtained from the time stamp exchange algorithm. These errors are caused by random transmission delays and hardware effects and are thus network specific. In this contribution we develop an algorithm to estimate the parameters of the measurement error model alongside the Kalman filter based sampling clock synchronization, employing the Expectation Maximization algorithm. Simulation results demonstrate that the online estimation of the error model parameters leads only to a small degradation of the synchronization performance compared to a perfectly known observation error model.


A Hidden Markov Model for Indoor User Tracking Based on WiFi Fingerprinting and Step Detection

M.K. Hoang, J. Schmalenstroeer, C. Drueke, D.H. Tran Vu, R. Haeb-Umbach, in: 21st European Signal Processing Conference (EUSIPCO 2013), 2013

In this paper we present a modified hidden Markov model (HMM) for the fusion of received signal strength index (RSSI) information of WiFi access points and relative position information which is obtained from the inertial sensors of a smartphone for indoor positioning. Since the states of the HMM represent the potential user locations, their number determines the quantization error introduced by discretizing the allowable user positions through the use of the HMM. To reduce this quantization error we introduce "pseudo" states, whose emission probability, which models the RSSI measurements at this location, is synthesized from those of the neighboring states of which a Gaussian emission probability has been estimated during the training phase. The experimental results demonstrate the effectiveness of this approach. By introducing on average two pseudo states per original HMM state the positioning error could be significantly reduced without increasing the training effort.

Server based indoor navigation using RSSI and inertial sensor information

M.K. Hoang, S. Schmitz, C. Drueke, D.H.T. Vu, J. Schmalenstroeer, R. Haeb-Umbach, in: Positioning Navigation and Communication (WPNC), 2013 10th Workshop on, 2013, pp. 1-6

In this paper we present a system for indoor navigation based on received signal strength index information of Wireless-LAN access points and relative position estimates. The relative position information is gathered from inertial smartphone sensors using a step detection and an orientation estimate. Our map data is hosted on a server employing a map renderer and a SQL database. The database includes a complete multilevel office building, within which the user can navigate. During navigation, the client retrieves the position estimate from the server, together with the corresponding map tiles to visualize the user's position on the smartphone display.

DoA-Based Microphone Array Position Self-Calibration Using Circular Statistic

F. Jacob, J. Schmalenstroeer, R. Haeb-Umbach, in: 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), 2013, pp. 116-120

In this paper we propose an approach to retrieve the absolute geometry of an acoustic sensor network, consisting of spatially distributed microphone arrays, from reverberant speech input. The calibration relies on direction of arrival measurements of the individual arrays. The proposed calibration algorithm is derived from a maximum-likelihood approach employing circular statistics. Since a sensor node consists of a microphone array with known intra-array geometry, we are able to obtain an absolute geometry estimate, including angles and distances. Simulation results demonstrate the effectiveness of the approach.

Sampling Rate Synchronisation in Acoustic Sensor Networks with a Pre-Trained Clock Skew Error Model

J. Schmalenstroeer, R. Haeb-Umbach, in: 21st European Signal Processing Conference (EUSIPCO 2013), 2013

In this paper we present a combined hardware/software approach for synchronizing the sampling clocks of an acoustic sensor network. A first clock frequency offset estimate is obtained by a time stamp exchange protocol with a low data rate and computational requirements. The estimate is then postprocessed by a Kalman filter which exploits the specific properties of the statistics of the frequency offset estimation error. In long term experiments the deviation between the sampling oscillators of two sensor nodes never exceeded half a sample with a wired and with a wireless link between the nodes. The achieved precision enables the estimation of time difference of arrival values across different hardware devices without sharing a common sampling hardware.

A Novel Initialization Method for Unsupervised Learning of Acoustic Patterns in Speech (FGNT-2013-01)

O. Walter, J. Schmalenstroeer, R. Haeb-Umbach, 2013

In this paper we present a novel initialization method for unsupervised learning of acoustic patterns in recordings of continuous speech. The pattern discovery task is solved by dynamic time warping whose performance we improve by a smart starting point selection. This enables a more accurate discovery of patterns compared to conventional approaches. After graph-based clustering the patterns are employed for training hidden Markov models for an unsupervised speech acquisition. By iterating between model training and decoding in an EM-like framework the word accuracy is continuously improved. On the TIDIGITS corpus we achieve a word error rate of about 13 percent by the proposed unsupervised pattern discovery approach, which neither assumes knowledge of the acoustic units nor of the labels of the training data.


Microphone Array Position Self-Calibration from Reverberant Speech Input

F. Jacob, J. Schmalenstroeer, R. Haeb-Umbach, in: International Workshop on Acoustic Signal Enhancement (IWAENC 2012), 2012

In this paper we propose an approach to retrieve the geometry of an acoustic sensor network consisting of spatially distributed microphone arrays from unconstrained speech input. The calibration relies on Direction of Arrival (DoA) measurements which do not require a clock synchronization among the sensor nodes. The calibration problem is formulated as a cost function optimization task, which minimizes the squared differences between measured and predicted observations and additionally avoids the existence of minima that correspond to mirrored versions of the actual sensor orientations. Further, outlier measurements caused by reverberation are mitigated by a Random Sample Consensus (RANSAC) approach. The experimental results show a mean positioning error of at most 25 cm even in highly reverberant environments.

Smartphone-Based Sensor Fusion for Improved Vehicular Navigation

O. Walter, J. Schmalenstroeer, A. Engler, R. Haeb-Umbach, in: 9th Workshop on Positioning Navigation and Communication (WPNC 2012), 2012

In this paper we present a system for car navigation by fusing sensor data on an Android smartphone. The key idea is to use both the internal sensors of the smartphone (e.g., gyroscope) and sensor data from the car (e.g., speed information) to support navigation via GPS. To this end we employ a CAN-Bus-to-Bluetooth adapter to establish a wireless connection between the smartphone and the CAN-Bus of the car. On the smartphone a strapdown algorithm and an error-state Kalman filter are used to fuse the different sensor data streams. The experimental results show that the system is able to maintain higher positioning accuracy during GPS dropouts, thus improving the availability and reliability, compared to GPS-only solutions.


Investigations into Features for Robust Classification into Broad Acoustic Categories

J. Schmalenstroeer, M. Bartek, R. Haeb-Umbach, in: 37. Deutsche Jahrestagung fuer Akustik (DAGA 2011), 2011

In this paper we present our experimental results about classifying audio data into broad acoustic categories. The reverberated sound samples from indoor recordings are grouped into four classes, namely speech, music, acoustic events and noise. We investigated a total of 188 acoustic features and achieved for the best configuration a classification accuracy better than 98%. This was achieved by a 42-dimensional feature vector consisting of Mel-Frequency Cepstral Coefficients, an autocorrelation feature and so-called track features that measure the length of "traces" of high energy in the spectrogram. We also found a 4-feature configuration with a classification rate of about 90% allowing for broad acoustic category classification with low computational effort.

Unsupervised learning of acoustic events using dynamic time warping and hierarchical K-means++ clustering

J. Schmalenstroeer, M. Bartek, R. Haeb-Umbach, in: Interspeech 2011, 2011

In this paper we propose to jointly consider Segmental Dynamic Time Warping (SDTW) and distance clustering for the unsupervised learning of acoustic events. As a result, the computational complexity increases only linearly with the database size, compared to a quadratic increase in a sequential setup, where all pairwise SDTW distances between segments are computed prior to clustering. Further, we discuss options for seed value selection for clustering and show that drawing seeds with a probability proportional to the distance from the already drawn seeds, known as K-means++ clustering, results in a significantly higher probability of finding representatives of each of the underlying classes, compared to the commonly used draws from a uniform distribution. Experiments are performed on an acoustic event classification and an isolated digit recognition task, where on the latter the final word accuracy approaches that of supervised training.
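The seeding rule described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kmeanspp_seeds`, the generic `dist` callback (which in the paper's setting would be an SDTW distance between segments) and the `power` parameter are assumptions. With `power=1.0` the draw probability is proportional to the distance, as worded in the abstract; the commonly cited K-means++ formulation uses the squared distance (`power=2.0`).

```python
import random


def kmeanspp_seeds(items, k, dist, rng=None, power=1.0):
    """Return indices of k seeds drawn with distance-weighted probabilities."""
    if rng is None:
        rng = random.Random(0)
    # First seed: uniform draw.
    seeds = [rng.randrange(len(items))]
    # Distance of every item to its nearest already drawn seed.
    nearest = [dist(item, items[seeds[0]]) for item in items]
    while len(seeds) < k:
        weights = [d ** power for d in nearest]
        if sum(weights) == 0:
            break  # all remaining items coincide with a seed
        # Items far from all seeds are more likely to be drawn;
        # already drawn seeds have weight 0 and are never drawn again.
        idx = rng.choices(range(len(items)), weights=weights, k=1)[0]
        seeds.append(idx)
        nearest = [min(n, dist(item, items[idx])) for n, item in zip(nearest, items)]
    return seeds


# Toy data: three well separated groups on a line.
demo_items = [0.0, 0.1, 10.0, 10.1, 20.0]
demo_seeds = kmeanspp_seeds(demo_items, 3, lambda a, b: abs(a - b))
```

Only distances to the drawn seeds are evaluated, so the number of distance computations grows with the number of items times the number of seeds rather than with all item pairs.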

Unsupervised Geometry Calibration of Acoustic Sensor Networks Using Source Correspondences

J. Schmalenstroeer, F. Jacob, R. Haeb-Umbach, M. Hennecke, G.A. Fink, in: Interspeech 2011, 2011

In this paper we propose a procedure for estimating the geometric configuration of an arbitrary acoustic sensor placement. It determines the position and the orientation of microphone arrays in 2D while locating a source by direction-of-arrival (DoA) estimation. Neither artificial calibration signals nor unnatural user activity are required. The problem of scale indeterminacy inherent to DoA-only observations is solved by adding time difference of arrival (TDOA) measurements. The geometry calibration method is numerically stable and delivers precise results in moderately reverberated rooms. Simulation results are confirmed by laboratory experiments.


Online Diarization of Streaming Audio-Visual Data for Smart Environments

J. Schmalenstroeer, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal Processing (2010), 4(5), pp. 845-856

For an environment to be perceived as being smart, contextual information has to be gathered to adapt the system's behavior and its interface towards the user. Being a rich source of context information, speech can be acquired unobtrusively by microphone arrays and then processed to extract information about the user and his environment. In this paper, a system for joint temporal segmentation, speaker localization, and identification is presented, which is supported by face identification from video data obtained from a steerable camera. Special attention is paid to latency aspects and online processing capabilities, as they are important for the application under investigation, namely ambient communication. Ambient communication describes the vision of terminal-less, session-less and multi-modal telecommunication with remote partners, where the user can move freely within his home while the communication follows him. The speaker diarization serves as a context source, which has been integrated in a service-oriented middleware architecture and provided to the application to select the most appropriate I/O device and to steer the camera towards the speaker during ambient communication.


A hierarchical approach to unsupervised shape calibration of microphone array networks

M. Hennecke, T. Ploetz, G.A. Fink, J. Schmalenstroeer, R. Haeb-Umbach, in: IEEE/SP 15th Workshop on Statistical Signal Processing (SSP 2009), 2009, pp. 257-260

Microphone arrays represent the basis for many challenging acoustic sensing tasks. The accuracy of techniques like beamforming directly depends on a precise knowledge of the relative positions of the sensors used. Unfortunately, for certain use cases manually measuring the geometry of an array is not feasible due to practical constraints. In this paper we present an approach to unsupervised shape calibration of microphone array networks. We developed a hierarchical procedure that first performs local shape calibration based on coherence analysis and then employs SRP-PHAT in a network calibration method. Practical experiments demonstrate the effectiveness of our approach especially for highly reverberant acoustic environments.

Fusing Audio and Video Information for Online Speaker Diarization

J. Schmalenstroeer, M. Kelling, V. Leutnant, R. Haeb-Umbach, in: Interspeech 2009, 2009

In this paper we present a system for identifying and localizing speakers using distant microphone arrays and a steerable pan-tilt-zoom camera. Audio and video streams are processed in real-time to obtain the diarization information "who speaks when and where" with low latency to be used in advanced video conferencing systems or user-adaptive interfaces. A key feature of the proposed system is to first glean information about the speaker's location and identity from the audio and visual data streams separately and then to fuse these data in a probabilistic framework employing the Viterbi algorithm. Here, visual evidence of a person is utilized through a priori state probabilities, while location and speaker change information are employed via time-variant transition probabilities. Experiments show that video information yields a substantial improvement compared to pure audio-based diarization.
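The fusion scheme outlined in the abstract above relies on Viterbi decoding with a priori state probabilities and transition probabilities that vary over time. The following is a minimal log-domain sketch under stated assumptions: states stand for speakers, and the names `viterbi`, `log_obs`, `log_prior` and `log_trans` as well as all numbers in the toy example are hypothetical, not taken from the paper.

```python
import math


def viterbi(log_obs, log_prior, log_trans):
    """Most likely state path.

    log_obs[t][s]      log-likelihood of frame t under state s
    log_prior[s]       a priori log-probability of state s
    log_trans[t][i][j] log-probability of moving from state i at frame t-1
                       to state j at frame t (entry 0 is unused)
    """
    n_states = len(log_prior)
    score = [log_prior[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[t][i][j])
            new_score.append(score[best_i] + log_trans[t][best_i][j] + log_obs[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def _log(rows):
    return [[math.log(p) for p in row] for row in rows]


# Toy example: two speakers, four frames; frames 2 and 3 weakly favor speaker 1.
demo_obs = _log([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.4, 0.6]])
demo_prior = [math.log(0.5), math.log(0.5)]
sticky = _log([[0.99, 0.01], [0.01, 0.99]])  # speaker changes are unlikely
change = _log([[0.5, 0.5], [0.5, 0.5]])      # a change detector fired at this frame

path_sticky = viterbi(demo_obs, demo_prior, [None, sticky, sticky, sticky])
path_change = viterbi(demo_obs, demo_prior, [None, sticky, change, sticky])
```

With sticky transitions throughout, the weak evidence in the last two frames is not enough to switch speakers; relaxing the transition matrix at the frame where a change is signalled lets the decoded path follow the new speaker.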

Audio-Visual Data Processing for Ambient Communication

J. Schmalenstroeer, V. Leutnant, R. Haeb-Umbach, in: 1st International Workshop on Distributed Computing in Ambient Environments within 32nd Annual Conference on Artificial Intelligence, 2009


Amigo Context Management Service with Applications in Ambient Communication Scenarios

J. Schmalenstroeer, V. Leutnant, R. Haeb-Umbach, in: AMI-07 - European Conference on Ambient Intelligence, 2007

Projekt Amigo - Sprachsignalverarbeitung im vernetzten Haus

J. Schmalenstroeer, E. Warsitz, R. Haeb-Umbach, in: 33. Deutsche Jahrestagung fuer Akustik (DAGA 2007), 2007

Zweistufige Sprache/Pause-Detektion in stark gestoerter Umgebung

E. Warsitz, R. Haeb-Umbach, J. Schmalenstroeer, in: 33. Deutsche Jahrestagung fuer Akustik (DAGA 2007), 2007


Online Speaker Change Detection by Combining BIC with Microphone Array Beamforming

J. Schmalenstroeer, R. Haeb-Umbach, in: Interspeech 2006, 2006

In this paper we consider the problem of detecting speaker changes in audio signals recorded by distant microphones. It is shown that the possibility to exploit the spatial separation of speakers more than makes up for the degradation in detection accuracy due to the increased source-to-sensor distance compared to close-talking microphones. Speaker direction information is derived from the filter coefficients of an adaptive Filter-and-Sum Beamformer and is combined with BIC analysis. The experimental results reveal significant improvements compared to BIC-only change detection, be it with the distant or close-talking microphone.
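The BIC analysis mentioned in the abstract above compares modeling a window of features with one Gaussian against modeling it with two Gaussians split at a candidate change point. The sketch below covers only this BIC-only baseline, not the combination with beamformer-derived direction information, and it assumes a one-dimensional feature for brevity; the names `delta_bic` and `detect_change`, the penalty weight `lam` and the `margin` are illustrative assumptions.

```python
import math


def _variance(x):
    mean = sum(x) / len(x)
    # Floor the variance so that the logarithm stays finite.
    return max(sum((v - mean) ** 2 for v in x) / len(x), 1e-12)


def delta_bic(x, t, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence x at index t (positive favors a change)."""
    n = len(x)
    left, right = x[:t], x[t:]
    gain = 0.5 * (n * math.log(_variance(x))
                  - len(left) * math.log(_variance(left))
                  - len(right) * math.log(_variance(right)))
    # The split model has two extra parameters (one mean, one variance) in 1-D.
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


def detect_change(x, margin=5, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no split is supported."""
    candidates = range(margin, len(x) - margin + 1)
    best_t = max(candidates, key=lambda t: delta_bic(x, t, lam))
    score = delta_bic(x, best_t, lam)
    return (best_t, score) if score > 0 else (None, score)


# Toy signals: a deterministic ripple around 0, then around 5 (change at index 60).
ripple = [0.1 * ((i % 3) - 1) for i in range(60)]
demo_change = detect_change(ripple + [5.0 + v for v in ripple])
demo_no_change = detect_change(ripple + ripple)
```

For multi-dimensional features the variances become covariance determinants and the penalty counts d + d(d+1)/2 extra parameters.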

