Sie haben Javascript deaktiviert!
Sie haben versucht eine Funktion zu nutzen, die nur mit Javascript möglich ist. Um sämtliche Funktionalitäten unserer Internetseite zu nutzen, aktivieren Sie bitte Javascript in Ihrem Browser.

Paderborn University in spring. Show image information

Paderborn University in spring.

Photo: Paderborn University, Kamil Glabica.

Dr.-Ing. Jörg Schmalenströer

Dr.-Ing. Jörg Schmalenströer

Communications Engineering

Academic Senior Councillor - (Contract)-Research & Teaching

+49 5251 60-3623
+49 5251 60-3627
Pohlweg 47-49
33098 Paderborn
Honors & Distinctions
  • 2013: Forschungspreis der Universität Paderborn: Verlässliche Navigation in Gebäuden
  • 2017:  "Top 10% Paper Award" at IEEE Multimedia Signal Processing Workshop, Luton, U.K.. Paper title: Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming, by. J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, and R. Haeb-Umbach
  • Reliable Navigation in Buildings
    2013 - 2014
    Forschungspreis der Universität Paderborn
  • Bonuspoint: Development of a Position Estimation Algorithm for the Fujitsu Forum 2014
    Oct. 2014 - Dec. 2014
    Fujitsu Technology Solutions GmbH - Product Development Group / Innovation Lab
  • Big Data Real Time Shopping Assistant: Development of a Client-Server-based Method for Postion Estimation
    Mar. 2015 - Dec. 2015
    Fujitsu Technology Solutions GmbH - Product Development Group / Innovation Lab
  • Development of a Bluetooth Low Energy Communication Platform
    Nov. 2015 - Nov. 2016
    Hörmann KG Antriebstechnik
  • Distributed Acoustic Signal Processing over Wireless Sensor Networks
    2016 - 2019
    Deutsche Forschungsgemeinschaft Forschungsgruppe: DFG FOR 2457 "Acoustic Sensor Networks"
Dr.-Ing. Jörg Schmalenströer
11/2013 - today

Akademischer Oberrat im Fachgebiet Nachrichtentechnik

04/2010 - 11/2013

Akademischer Rat im Fachgebiet Nachrichtentechnik



Thema: Akustische Szenenanalyse für die ambiente Kommunikation im vernetzten Haus

05/2004 - 03/2010

Wissenschaftlicher Mitarbeiter am Fachgebiet Nachrichtentechnik

Forschungsschwerpunkt: Ambiente Kommunikation im vernetzten Haus im EU-Projekt Amigo

10/1999 - 05/2004

Studium der Elektrotechnik an der Universität Paderborn

Open list in Research Information System


Deep Neural Network based Distance Estimation for Geometry Calibration in Acoustic Sensor Network

T. Gburrek, J. Schmalenstroeer, A. Brendel, W. Kellermann, R. Haeb-Umbach, in: European Signal Processing Conference (EUSIPCO), 2020


Insights into the Interplay of Sampling Rate Offsets and MVDR Beamforming

J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany, 2018

It has been experimentally verified that sampling rate offsets (SROs) between the input channels of an acoustic beamformer have a detrimental effect on the achievable SNR gains. In this paper we derive an analytic model to study the impact of SRO on the estimation of the spatial noise covariance matrix used in MVDR beamforming. It is shown that a perfect compensation of the SRO is impossible if the noise covariance matrix is estimated by time averaging, even if the SRO is perfectly known. The SRO should therefore be compensated for prior to beamformer coefficient estimation. We present a novel scheme where SRO compensation and beamforming closely interact, saving some computational effort compared to separate SRO adjustment followed by acoustic beamforming.

Front-End Processing for the CHiME-5 Dinner Party Scenario

C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018

This contribution presents a speech enhancement system for the CHiME-5 Dinner Party Scenario. The front-end employs multi-channel linear time-variant filtering and achieves its gains without the use of a neural network. We present an adaptation of blind source separation techniques to the CHiME-5 database which we call Guided Source Separation (GSS). Using the baseline acoustic and language model, the combination of Weighted Prediction Error based dereverberation, guided source separation, and beamforming reduces the WER by 10:54% (relative) for the single array track and by 21:12% (relative) on the multiple array track.

Fast and Accurate Audio Resampling for Acoustic Sensor Networks by Polyphase-Farrow Filters with FFT Realization

J. Schmalenstroeer, A. Chinaev, G. Enzner, in: Speech Communication; 13th ITG-Symposium, 2018, pp. 1-5

Arbitrary sampling rate conversion has already received considerable attention in the past, but still lacks an equivalent representation of the effective time-dilation process in the block frequency domain. Good sampling rate converters in the time domain have been known, for instance, in terms of time-varying 'Sinc' or fixed 'Farrow' polynomial filters. The former can deliver nearly exact conversion at high complexity, while the latter has pronounced computational efficiency with limited accuracy. Only recently, it was shown that a composite 'polyphase Farrow' form with high resampling precision can be implemented with quasi-fixed filters that operate at the input sampling rate. We therefore propose to capitalize from that fixed-filter architecture in that we translate the polyphase-Farrow filters into an equivalent FFT-based overlap-save form. Experimental evaluation and comparison with other state-of-the art frequency-domain approaches then proves currently the best price-performance ratio of the proposed algorithm. It is thus an ideal candidate for the new framework of acoustic sensor networks that critically rests upon fast and accurate alignment of autonomous sampling processes.

Benchmarking Neural Network Architectures for Acoustic Sensor Networks

J. Ebbers, J. Heitkaemper, J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany, 2018

Due to their distributed nature wireless acoustic sensor networks offer great potential for improved signal acquisition, processing and classification for applications such as monitoring and surveillance, home automation, or hands-free telecommunication. To reduce the communication demand with a central server and to raise the privacy level it is desirable to perform processing at node level. The limited processing and memory capabilities on a sensor node, however, stand in contrast to the compute and memory intensive deep learning algorithms used in modern speech and audio processing. In this work, we perform benchmarking of commonly used convolutional and recurrent neural network architectures on a Raspberry Pi based acoustic sensor node. We show that it is possible to run medium-sized neural network topologies used for speech enhancement and speech recognition in real time. For acoustic event recognition, where predictions in a lower temporal resolution are sufficient, it is even possible to run current state-of-the-art deep convolutional models with a real-time-factor of 0:11.

MARVELO - A Framework for Signal Processing in Wireless Acoustic Sensor Networks

H. Afifi, J. Schmalenstroeer, J. Ullmann, R. Haeb-Umbach, H. Karl, in: Speech Communication; 13th ITG-Symposium, 2018, pp. 1-5

Signal processing in WASNs is based on a software framework for hosting the algorithms as well as on a set of wireless connected devices representing the hardware. Each of the nodes contributes memory, processing power, communication bandwidth and some sensor information for the tasks to be solved on the network. In this paper we present our MARVELO framework for distributed signal processing. It is intended for transforming existing centralized implementations into distributed versions. To this end, the software only needs a block-oriented implementation, which MARVELO picks-up and distributes on the network. Additionally, our sensor node hardware and the audio interfaces responsible for multi-channel recordings are presented.

Efficient Sampling Rate Offset Compensation - An Overlap-Save Based Approach

J. Schmalenstroeer, R. Haeb-Umbach, in: 26th European Signal Processing Conference (EUSIPCO 2018), 2018

Distributed sensor data acquisition usually encompasses data sampling by the individual devices, where each of them has its own oscillator driving the local sampling process, resulting in slightly different sampling rates at the individual sensor nodes. Nevertheless, for certain downstream signal processing tasks it is important to compensate even for small sampling rate offsets. Aligning the sampling rates of oscillators which differ only by a few parts-per-million, is, however, challenging and quite different from traditional multirate signal processing tasks. In this paper we propose to transfer a precise but computationally demanding time domain approach, inspired by the Nyquist-Shannon sampling theorem, to an efficient frequency domain implementation. To this end a buffer control is employed which compensates for sampling offsets which are multiples of the sampling period, while a digital filter, realized by the wellknown Overlap-Save method, handles the fractional part of the sampling phase offset. With experiments on artificially misaligned data we investigate the parametrization, the efficiency, and the induced distortions of the proposed resampling method. It is shown that a favorable compromise between residual distortion and computational complexity is achieved, compared to other sampling rate offset compensation techniques.

The RWTH/UPB System Combination for the CHiME 2018 Workshop

M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter, H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018

This paper describes the systems for the single-array track and the multiple-array track of the 5th CHiME Challenge. The final system is a combination of multiple systems, using Confusion Network Combination (CNC). The different systems presented here are utilizing different front-ends and training sets for a Bidirectional Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced by enhancements provided by Paderborn University [1]. The back-end has been implemented using RASR [2] and RETURNN [3]. Additionally, a system combination including the hypothesis word graphs from the system of the submission [1] has been performed, which results in the final best system.


Building or Enclosure Termination Closing and/or Opening Apparatus, and Method for Operating a Building or Enclosure Termination

F. Jacob, J. Schmalenstroeer. Building or Enclosure Termination Closing and/or Opening Apparatus, and Method for Operating a Building or Enclosure Termination, Patent WO2018/077610A. 2017.

The invention relates to a building or enclosure termination opening and/or closing apparatus having communication signed or encrypted by means of a key, and to a method for operating such. To allow simple, convenient and secure use by exclusively authorised users, the apparatus comprises: a first and a second user terminal, with secure forwarding of a time-limited key from the first to the second user terminal being possible. According to an alternative, individual keys are generated by a user identification and a secret device key.

Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming

J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, R. Haeb-Umbach, in: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017

Multi-channel speech enhancement algorithms rely on a synchronous sampling of the microphone signals. This, however, cannot always be guaranteed, especially if the sensors are distributed in an environment. To avoid performance degradation the sampling rate offset needs to be estimated and compensated for. In this contribution we extend the recently proposed coherence drift based method in two important directions. First, the increasing phase shift in the short-time Fourier transform domain is estimated from the coherence drift in a Matched Filterlike fashion, where intermediate estimates are weighted by their instantaneous SNR. Second, an observed bias is removed by iterating between offset estimation and compensation by resampling a couple of times. The effectiveness of the proposed method is demonstrated by speech recognition results on the output of a beamformer with and without sampling rate offset compensation between the input channels. We compare MVDR and maximum-SNR beamformers in reverberant environments and further show that both benefit from a novel phase normalization, which we also propose in this contribution.


Investigations into Bluetooth Low Energy Localization Precision Limits

J. Schmalenstroeer, R. Haeb-Umbach, in: 24th European Signal Processing Conference (EUSIPCO 2016), 2016

In this paper we study the influence of directional radio patterns of Bluetooth low energy (BLE) beacons on smartphone localization accuracy and beacon network planning. A two-dimensional model of the power emission characteristic is derived from measurements of the radiation pattern of BLE beacons carried out in an RF chamber. The Cramer-Rao lower bound (CRLB) for position estimation is then derived for this directional power emission model. With this lower bound on the RMS positioning error the coverage of different beacon network configurations can be evaluated. For near-optimal network planing an evolutionary optimization algorithm for finding the best beacon placement is presented.


Aligning training models with smartphone properties in WiFi fingerprinting based indoor localization

M.K. Hoang, J. Schmalenstroeer, R. Haeb-Umbach, in: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015


A Gossiping Approach to Sampling Clock Synchronization in Wireless Acoustic Sensor Networks

J. Schmalenstroeer, P. Jebramcik, R. Haeb-Umbach, in: 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014

"In this paper we present an approach for synchronizing the sampling clocks of distributed microphones over a wireless network. The proposed system uses a two stage procedure. It first employs a two-way message exchange algorithm to estimate the clock phase and frequency difference between two nodes and then uses a gossiping algorithmto estimate a virtual master clock, to which all sensor nodes synchronize. Simulation results are presented for networks of different topology and size, showing the effectiveness of our approach."

A combined hardware-software approach for acoustic sensor network synchronization

J. Schmalenstroeer, P. Jebramcik, R. Haeb-Umbach, Signal Processing (2014), pp. -

Abstract In this paper we present an approach for synchronizing a wireless acoustic sensor network using a two-stage procedure. First the clock frequency and phase differences between pairs of nodes are estimated employing a two-way message exchange protocol. The estimates are further improved in a Kalman filter with a dedicated observation error model. In the second stage network-wide synchronization is achieved by means of a gossiping algorithm which estimates the average clock frequency and phase of the sensor nodes. These averages are viewed as frequency and phase of a virtual master clock, to which the clocks of the sensor nodes have to be adjusted. The amount of adjustment is computed in a specific control loop. While these steps are done in software, the actual sampling rate correction is carried out in hardware by using an adjustable frequency synthesizer. Experimental results obtained from hardware devices and software simulations of large scale networks are presented.

Online Observation Error Model Estimation for Acoustic Sensor Network Synchronization

J. Schmalenstroeer, W. Zhao, R. Haeb-Umbach, in: 11. ITG Fachtagung Sprachkommunikation (ITG 2014), 2014

"Acoustic sensor network clock synchronization via time stamp exchange between the sensor nodes is not accurate enough for many acoustic signal processing tasks, such as speaker localization. To improve synchronization accuracy it has therefore been proposed to employ a Kalman Filter to obtain improved frequency deviation and phase offset estimates. The estimation requires a statistical model of the errors of the measurements obtained from the time stamp exchange algorithm. These errors are caused by random transmission delays and hardware effects and are thus network specific. In this contribution we develop an algorithm to estimate the parameters of the measurement error model alongside the Kalman filter based sampling clock synchronization, employing the Expectation Maximization algorithm. Simulation results demonstrate that the online estimation of the error model parameters leads only to a small degradation of the synchronization performance compared to a perfectly known observation error model."


Sampling Rate Synchronisation in Acoustic Sensor Networks with a Pre-Trained Clock Skew Error Model

J. Schmalenstroeer, R. Haeb-Umbach, in: 21th European Signal Processing Conference (EUSIPCO 2013), 2013

In this paper we present a combined hardware/software approach for synchronizing the sampling clocks of an acoustic sensor network. A first clock frequency offset estimate is obtained by a time stamp exchange protocol with a low data rate and computational requirements. The estimate is then postprocessed by a Kalman filter which exploits the specific properties of the statistics of the frequency offset estimation error. In long term experiments the deviation between the sampling oscillators of two sensor nodes never exceeded half a sample with a wired and with a wireless link between the nodes. The achieved precision enables the estimation of time difference of arrival values across different hardware devices without sharing a common sampling hardware.

A Hidden Markov Model for Indoor User Tracking Based on WiFi Fingerprinting and Step Detection

M.K. Hoang, J. Schmalenstroeer, C. Drueke, D.H. Tran Vu, R. Haeb-Umbach, in: 21th European Signal Processing Conference (EUSIPCO 2013), 2013

In this paper we present a modified hidden Markov model (HMM) for the fusion of received signal strength index (RSSI) information of WiFi access points and relative position information which is obtained from the inertial sensors of a smartphone for indoor positioning. Since the states of the HMM represent the potential user locations, their number determines the quantization error introduced by discretizing the allowable user positions through the use of the HMM. To reduce this quantization error we introduce â??pseudoâ?? states, whose emission probability, which models the RSSI measurements at this location, is synthesized from those of the neighboring states of which a Gaussian emission probability has been estimated during the training phase. The experimental results demonstrate the effectiveness of this approach. By introducing on average two pseudo states per original HMM state the positioning error could be significantly reduced without increasing the training effort.

Server based indoor navigation using RSSI and inertial sensor information

M.K. Hoang, S. Schmitz, C. Drueke, D.H.T. Vu, J. Schmalenstroeer, R. Haeb-Umbach, in: Positioning Navigation and Communication (WPNC), 2013 10th Workshop on, 2013, pp. 1-6

In this paper we present a system for indoor navigation based on received signal strength index information of Wireless-LAN access points and relative position estimates. The relative position information is gathered from inertial smartphone sensors using a step detection and an orientation estimate. Our map data is hosted on a server employing a map renderer and a SQL database. The database includes a complete multilevel office building, within which the user can navigate. During navigation, the client retrieves the position estimate from the server, together with the corresponding map tiles to visualize the user's position on the smartphone display.

DoA-Based Microphone Array Position Self-Calibration Using Circular Statistic

F. Jacob, J. Schmalenstroeer, R. Haeb-Umbach, in: 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), 2013, pp. 116-120

In this paper we propose an approach to retrieve the absolute geometry of an acoustic sensor network, consisting of spatially distributed microphone arrays, from reverberant speech input. The calibration relies on direction of arrival measurements of the individual arrays. The proposed calibration algorithm is derived from a maximum-likelihood approach employing circular statistics. Since a sensor node consists of a microphone array with known intra-array geometry, we are able to obtain an absolute geometry estimate, including angles and distances. Simulation results demonstrate the effectiveness of the approach.

A Novel Initialization Method for Unsupervised Learning of Acoustic Patterns in Speech (FGNT-2013-01)

O. Walter, J. Schmalenstroeer, R. Haeb-Umbach, 2013

In this paper we present a novel initialization method for unsupervised learning of acoustic patterns in recordings of continuous speech. The pattern discovery task is solved by dynamic time warping whose performance we improve by a smart starting point selection. This enables a more accurate discovery of patterns compared to conventional approaches. After graph-based clustering the patterns are employed for training hidden Markov models for an unsupervised speech acquisition. By iterating between model training and decoding in an EM-like framework the word accuracy is continuously improved. On the TIDIGITS corpus we achieve a word error rate of about 13 percent by the proposed unsupervised pattern discovery approach, which neither assumes knowledge of the acoustic units nor of the labels of the training data.


Smartphone-Based Sensor Fusion for Improved Vehicular Navigation

O. Walter, J. Schmalenstroeer, A. Engler, R. Haeb-Umbach, in: 9th Workshop on Positioning Navigation and Communication (WPNC 2012), 2012

In this paper we present a system for car navigation by fusing sensor data on an Android smartphone. The key idea is to use both the internal sensors of the smartphone (e.g., gyroscope) and sensor data from the car (e.g., speed information) to support navigation via GPS. To this end we employ a CAN-Bus-to-Bluetooth adapter to establish a wireless connection between the smartphone and the CAN-Bus of the car. On the smartphone a strapdown algorithm and an error-state Kalman filter are used to fuse the different sensor data streams. The experimental results show that the system is able to maintain higher positioning accuracy during GPS dropouts, thus improving the availability and reliability, compared to GPS-only solutions.

Microphone Array Position Self-Calibration from Reverberant Speech Input

F. Jacob, J. Schmalenstroeer, R. Haeb-Umbach, in: International Workshop on Acoustic Signal Enhancement (IWAENC 2012), 2012

In this paper we propose an approach to retrieve the geometry of an acoustic sensor network consisting of spatially distributed microphone arrays from unconstrained speech input. The calibration relies on Direction of Arrival (DoA) measurements which do not require a clock synchronization among the sensor nodes. The calibration problem is formulated as a cost function optimization task, which minimizes the squared differences between measured and predicted observations and additionally avoids the existence of minima that correspond to mirrored versions of the actual sensor orientations. Further, outlier measurements caused by reverberation are mitigated by a Random Sample Consensus (RANSAC) approach. The experimental results show a mean positioning error of at most 25 cm even in highly reverberant environments.


Unsupervised learning of acoustic events using dynamic time warping and hierarchical K-means++ clustering

J. Schmalenstroeer, M. Bartek, R. Haeb-Umbach, in: Interspeech 2011, 2011

In this paper we propose to jointly consider Segmental Dynamic Time Warping and distance clustering for the unsupervised learning of acoustic events. As a result, the computational complexity increases only linearly with the dababase size compared to a quadratic increase in a sequential setup, where all pairwise SDTW distances between segments are computed prior to clustering. Further, we discuss options for seed value selection for clustering and show that drawing seeds with a probability proportional to the distance from the already drawn seeds, known as K-means++ clustering, results in a significantly higher probability of finding representatives of each of the underlying classes, compared to the commonly used draws from a uniform distribution. Experiments are performed on an acoustic event classification and an isolated digit recognition task, where on the latter the final word accuracy approaches that of supervised training.

Unsupervised Geometry Calibration of Acoustic Sensor Networks Using Source Correspondences

J. Schmalenstroeer, F. Jacob, R. Haeb-Umbach, M. Hennecke, G.A. Fink, in: Interspeech 2011, 2011

In this paper we propose a procedure for estimating the geometric configuration of an arbitrary acoustic sensor placement. It determines the position and the orientation of microphone arrays in 2D while locating a source by direction-of-arrival (DoA) estimation. Neither artificial calibration signals nor unnatural user activity are required. The problem of scale indeterminacy inherent to DoA-only observations is solved by adding time difference of arrival (TDOA) measurements. The geometry calibration method is numerically stable and delivers precise results in moderately reverberated rooms. Simulation results are confirmed by laboratory experiments.

Investigations into Features for Robust Classification into Broad Acoustic Categories

J. Schmalenstroeer, M. Bartek, R. Haeb-Umbach, in: 37. Deutsche Jahrestagung fuer Akustik (DAGA 2011), 2011

In this paper we present our experimental results about classifying audio data into broad acoustic categories. The reverberated sound samples from indoor recordings are grouped into four classes, namely speech, music, acoustic events and noise. We investigated a total of 188 acoustic features and achieved for the best configuration a classification accuracy better than 98\%. This was achieved by a 42-dimensional feature vector consisting of Mel-Frequency Cepstral Coefficients, an autocorrelation feature and so-called track features that measure the length of ''traces'' of high energy in the spectrogram. We also found a 4-feature configuration with a classification rate of about 90\% allowing for broad acoustic category classification with low computational effort.


Online Diarization of Streaming Audio-Visual Data for Smart Environments

J. Schmalenstroeer, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal Processing (2010), 4(5), pp. 845-856

For an environment to be perceived as being smart, contextual information has to be gathered to adapt the system's behavior and its interface towards the user. Being a rich source of context information speech can be acquired unobtrusively by microphone arrays and then processed to extract information about the user and his environment. In this paper, a system for joint temporal segmentation, speaker localization, and identification is presented, which is supported by face identification from video data obtained from a steerable camera. Special attention is paid to latency aspects and online processing capabilities, as they are important for the application under investigation, namely ambient communication. It describes the vision of terminal-less, session-less and multi-modal telecommunication with remote partners, where the user can move freely within his home while the communication follows him. The speaker diarization serves as a context source, which has been integrated in a service-oriented middleware architecture and provided to the application to select the most appropriate I/O device and to steer the camera towards the speaker during ambient communication.


Audio-Visual Data Processing for Ambient Communication

J. Schmalenstroeer, V. Leutnant, R. Haeb-Umbach, in: 1st International Workshop on Distributed Computing in Ambient Environments within 32nd Annual Conference on Artificial Intelligence, 2009

A hierarchical approach to unsupervised shape calibration of microphone array networks

M. Hennecke, T. Ploetz, G.A. Fink, J. Schmalenstroeer, R. Haeb-Umbach, in: IEEE/SP 15th Workshop on Statistical Signal Processing (SSP 2009), 2009, pp. 257-260

Microphone arrays represent the basis for many challenging acoustic sensing tasks. The accuracy of techniques like beamforming directly depends on a precise knowledge of the relative positions of the sensors used. Unfortunately, for certain use cases manually measuring the geometry of an array is not feasible due to practical constraints. In this paper we present an approach to unsupervised shape calibration of microphone array networks. We developed a hierarchical procedure that first performs local shape calibration based on coherence analysis and then employs SRP-PHAT in a network calibration method. Practical experiments demonstrate the effectiveness of our approach especially for highly reverberant acoustic environments.

Fusing Audio and Video Information for Online Speaker Diarization

J. Schmalenstroeer, M. Kelling, V. Leutnant, R. Haeb-Umbach, in: Interspeech 2009, 2009

In this paper we present a system for identifying and localizingspeakers using distant microphone arrays and a steerablepan-tilt-zoom camera. Audio and video streams are processedin real-time to obtain the diarization information {grqq}who speakswhen and where'' with low latency to be used in advanced videoconferencing systems or user-adaptive interfaces. A key featureof the proposed system is to first glean information about thespeaker{\rq}s location and identity from the audio and visual datastreams separately and then to fuse these data in a probabilisticframework employing the Viterbi algorithm. Here, visual evidenceof a person is utilized through a priori state probabilities,while location and speaker change information are employedvia time-variant transition probablities. Experiments show thatvideo information yields a substantial improvement comparedto pure audio-based diarization.


Amigo Context Management Service with Applications in Ambient Communication Scenarios

J. Schmalenstroeer, V. Leutnant, R. Haeb-Umbach, in: AMI-07 - European Conference on Ambient Intelligence, 2007

Projekt Amigo - Sprachsignalverarbeitung im vernetzten Haus

J. Schmalenstroeer, E. Warsitz, R. Haeb-Umbach, in: 33. Deutsche Jahrestagung fuer Akustik (DAGA 2007), 2007

Zweistufige Sprache/Pause-Detektion in stark gestoerter Umgebung

E. Warsitz, R. Haeb-Umbach, J. Schmalenstroeer, in: 33. Deutsche Jahrestagung fuer Akustik (DAGA 2007), 2007


Online Speaker Change Detection by Combining BIC with Microphone Array Beamforming

J. Schmalenstroeer, R. Haeb-Umbach, in: Interspeech 2006, 2006

In this paper we consider the problem of detecting speaker changes in audio signals recorded by distant microphones. It is shown that the possibility to exploit the spatial separation of speakers more than makes up the degradation in detection accuracy due to the increased source-to-sensor distance compared to close-talking microphones. Speaker direction information is derived from the filter coefficients of an adaptive Filter-and-Sum Beamformer and is combined with BIC analysis. The experimental results reveal significant improvements compared to BIC-only change detection, be it with the distant or close-talking microphone.


Open list in Research Information System

The University for the Information Society