Course #50
Digital Speech Transmission - Enhancement, Coding and Error Concealment
October 25-28, 2010. Barcelona, Spain
INSTRUCTOR
Professor Peter Vary, RWTH Aachen University of Technology, Germany
TECHNOLOGY FOCUS
Digital signal processing technologies play an important role in the success of speech communication devices, be it GSM and UMTS mobile telephones, digital hearing aids, or human-machine interfaces. As these systems evolve, improving speech quality will remain one of the most important objectives.
The focus of this tutorial course is on advanced signal processing algorithms, which help to mitigate physical and technological limitations. A comprehensive understanding of the fundamental issues, standards, and trends is provided, taking into account the specific conditions due to audio-bandwidth limitation, acoustic background noise, interfering acoustic echo signals, bit-rate restrictions, and residual transmission errors from the radio channel.
COURSE CONTENT
The course covers the theory and practice of signal processing algorithms, including recent trends such as the conversion of networks and terminals to 7 kHz wideband transmission, not only by using true wideband coding but also by artificial bandwidth extension of telephone speech:
- Speech Coding: Overview of old and new speech coding standards (ETSI, 3GPP, ITU)
- Error Concealment: Soft decoding and iterative source channel decoding
- Bandwidth Extension: With and without side information
- Noise Reduction: Single- and Multi-Microphone Techniques
- Acoustic Echo Cancellation: Time- and frequency-domain, adaptive postfiltering
The course is based on the book "Digital Speech Transmission: Enhancement, Coding and Error Concealment" by P. Vary and R. Martin, Wiley, 2006. The coding and processing techniques are also demonstrated with many audio examples.
Monday
SPEECH CODING
The first part of the course deals with speech encoding. State-of-the-art concepts are discussed. The signal processing aspects of quantization, differential waveform coding, linear prediction, and especially the concepts of Code Excited Linear Prediction (CELP) are explained.
Speech Production Model
- Speech Production
- Digital Filter Structures for Speech Production
- Psycho-Acoustics
Linear Prediction
- Vocal Tract Models and Short-Term Prediction
- Optimum Prediction
- Spectral Flatness Measure
- Block-Adaptive Linear Prediction
- Levinson/Durbin Algorithm
- Long-Term Prediction
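As a small illustration of the Levinson/Durbin topic listed above, the following is a minimal sketch in Python, not material taken from the course. It assumes the autocorrelation values r[0..order] of a speech frame are already available and that r[0] > 0; the function name and argument names are chosen here for illustration only.

```python
def levinson_durbin(r, order):
    """Compute predictor coefficients from autocorrelation values r[0..order].

    Returns (a, err), where the prediction is
        x_hat[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[order-1]*x[n-order]
    and err is the power of the remaining prediction error.
    Assumes r[0] > 0 (illustrative sketch, no special-case handling).
    """
    a = []          # predictor coefficients of the current order
    err = r[0]      # prediction error power for order 0
    for m in range(1, order + 1):
        # Reflection coefficient for stage m
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Order update: a_j(m) = a_j(m-1) - k_m * a_(m-j)(m-1), plus the new last tap
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        # The error power shrinks by the factor (1 - k_m^2)
        err *= (1.0 - k_m * k_m)
    return a, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process with correlation 0.5
    coeffs, residual_power = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, residual_power)
```

The recursion solves the normal equations in order-recursive form, so the coefficients of every lower predictor order are obtained along the way.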
Quantization
- Uniform and Non-uniform Quantization
- Optimal Quantization
- Adaptive Quantization
- Vector Quantization
Speech Coding
- Model-Based Predictive Coding
- Adaptive Differential Pulse Code Modulation (ADPCM)
- Noise Shaping, Open-Loop and Closed-Loop Prediction
- Code Excited Linear Prediction (CELP)
- Quantization and Line Spectral Frequencies (LSF)
- Audio Examples: Quantization, Coding, Noise Shaping
Tuesday
Post Filtering
- Short-term Post Filter
- Long-term Post Filter
- Tilt Compensation
Speech Quality Assessment
- Mean Opinion Score (MOS)
- Modulated Noise Reference Unit (MNRU, CCITT)
- Objective Quality Measures (PESQ)
Speech Coding Standards
The most relevant speech codec standards are discussed and demonstrated by audio examples. Recent developments are explained, such as the Adaptive Multi-Rate (AMR) codecs (narrowband, wideband, and wideband+) for GSM and UMTS, and variable-rate coding for Internet telephony.
- ITU: G.721/G.726, G.722, G.723, G.728, G.729, G.729.1, G.711.1, G.718 (EV-VBR)
- GSM & UMTS: Full Rate, Half Rate, Enhanced Full Rate, AMR, AMR-WB, AMR-WB+
ERROR CONCEALMENT and SOFT DECISION SOURCE DECODING
Wireless speech transmission systems usually include channel coding for error protection. However, residual bit errors remain quite frequently when channel conditions are temporarily adverse. The negative effects of these errors can be reduced by error concealment, which exploits both residual source redundancy and information about the instantaneous quality of the transmission channel:
- Frame Substitution and Standard Solutions (GSM)
- Soft Bits and Log-Likelihood Values
- Soft-Decision Speech Decoding by Parameter Estimation
- A Priori Knowledge and A Posteriori Probabilities
- Graceful Degradation by Soft Decoding
- Joint and Iterative Source-Channel (De-)Coding
- Audio Examples: PCM, ADPCM, GSM
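To illustrate the idea of soft bits and log-likelihood values named in the list above, here is a minimal sketch in Python, not material taken from the course. It assumes a simplified channel that is not specified in this outline: binary antipodal transmission (bit 0 sent as +1, bit 1 sent as -1) over additive white Gaussian noise with known variance and equally likely bits. All function names are hypothetical.

```python
import math


def llr_bpsk_awgn(y, noise_var):
    """Log-likelihood ratio ln(P(+1 | y) / P(-1 | y)) for a received value y.

    Assumption: symbols +1/-1 are equally likely and the noise is Gaussian
    with variance noise_var.
    """
    return 2.0 * y / noise_var


def bit_error_probability(llr):
    """Probability that the hard decision sign(llr) is wrong."""
    z = math.exp(-abs(llr))     # numerically safe for large |llr|
    return z / (1.0 + z)


def soft_bit(llr):
    """Expected value of the transmitted symbol given the observation."""
    return math.tanh(llr / 2.0)


if __name__ == "__main__":
    for y in (1.2, 0.1, -0.8):
        L = llr_bpsk_awgn(y, noise_var=0.5)
        print(y, L, bit_error_probability(L), soft_bit(L))
```

A received value close to zero yields a soft bit near zero and an error probability near 0.5, which is the kind of reliability information a soft-decision decoder can combine with a priori knowledge about the codec parameters.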
Wednesday
BANDWIDTH EXTENSION
In the long run, the audio bandwidth of telephone networks and terminals will be extended to 7 kHz wideband transmission. This will require new codecs at both ends of the transmission link. During the transition period, which will probably be very long, many terminals will not yet be equipped with wideband capability. In this situation, the quality of the received narrowband speech (3.4 kHz) may be improved by means of artificial bandwidth extension:
- Source Filter Model
- Extension of the Excitation Signal
- Extension of the Spectral Envelope
- Statistical Estimation Based on a Markov State Model
- Bandwidth Extension with Side Information
- Implementation and Performance Evaluation
NOISE REDUCTION AND BEAMFORMING
If the signal is degraded by acoustic background noise and a loudspeaker signal, various speech enhancement methods can be applied prior to the speech encoding and transmission. We will discuss state-of-the-art algorithms for noise reduction and evaluate new proposals such as super-Gaussian speech models and psycho-acoustic aspects. Noise suppression schemes using only one single microphone or several microphones are presented.
Single and Dual Channel Noise Reduction
- Wiener Filter
- Speech Enhancement in the DFT Domain
- Noise Estimation Techniques and Minimum Statistics
- More Sophisticated Suppression Rules
- Conditional MMSE- and MAP-Estimation
- Estimation of Complex DFT-Coefficients
- Estimation of Real Valued DFT-Amplitudes
- Estimation with Super-Gaussian Models
- Noise Suppression Exploiting Masking
- Soft Weighting
- Dual Channel Noise Cancellation
- Coherence Function and Theoretical Limits
- Dual Channel Noise Suppression
- Noise Suppression Using Small Microphone Arrays
- Audio Examples
Thursday
Multi-channel Noise Reduction
- Spatial Sampling of Sound Fields
- Beamforming
- Performance Measures
- Fixed Beamformers
- Multi-channel Wiener Filter and Postfilter
- Adaptive Beamformers
- Generalized Side-lobe Canceller
ACOUSTIC ECHO CONTROL
The key algorithms for acoustic echo control used for hands-free communication are explained, especially for the combination of echo cancellation with adaptive post-filtering.
- LMS and NLMS Time Domain Cancellation
- Convergence Analysis and Control
- Echo Cancellation and Postfiltering
- Joint Residual Echo Cancellation and Noise Reduction
- Frequency Domain Method and Block Processing
- Additional Measures
- Stereophonic Acoustic Echo Control
- Audio Examples
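As an illustration of the NLMS time-domain cancellation topic listed above, the following is a minimal sketch in Python, not material taken from the course. It assumes a single loudspeaker and microphone, no near-end speech or noise, and a short echo path; the function name, the parameter defaults, and the synthetic echo path in the usage part are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adapt an FIR echo-path estimate with the NLMS algorithm.

    far_end: loudspeaker samples, mic: microphone samples of equal length.
    Returns (residual, w): the echo-compensated signal and the final filter.
    """
    w = [0.0] * num_taps        # estimated echo path
    buf = [0.0] * num_taps      # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est                        # residual echo
        norm = sum(b * b for b in buf) + eps    # input power normalization
        w = [wi + step * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


if __name__ == "__main__":
    random.seed(0)
    echo_path = [0.5, -0.3, 0.1]                # hypothetical room response
    x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
         for n in range(len(x))]
    e, w = nlms_echo_canceller(x, d, num_taps=4)
    print([round(c, 4) for c in w], abs(e[-1]))
```

In this idealized setting the filter converges to the echo path. With near-end speech or background noise present, step-size control and a postfilter of the kind named in the list above become necessary.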

Figure: Digital Speech Transmission and Enhancement