CEI-Europe Advanced Science and Technology Education

Course #50

Digital Speech Transmission - Enhancement, Coding and Error Concealment

October 25-28, 2010. Barcelona, Spain 

INSTRUCTOR
Professor Peter Vary, RWTH, Aachen University of Technology, Germany


TECHNOLOGY FOCUS 
Digital signal processing technologies play an important role in the success of speech communication devices, be it GSM and UMTS mobile telephones, digital hearing aids, or human-machine interfaces. As these systems evolve, improving speech quality will remain one of the most important objectives.

The focus of this tutorial course is on advanced signal processing algorithms, which help to mitigate physical and technological limitations. A comprehensive understanding of the fundamental issues, standards, and trends is provided, taking into account the specific conditions due to audio-bandwidth limitation, acoustic background noise, interfering acoustic echo signals, bit-rate restrictions, and residual transmission errors from the radio channel. 

COURSE CONTENT 
The course covers the theory and practice of signal processing algorithms, including recent trends such as the conversion of networks and terminals to 7 kHz wideband transmission, not only by true wideband coding but also by artificial bandwidth extension of telephone speech:

  • Speech Coding: Overview of old and new speech coding standards (ETSI, 3GPP, ITU)
  • Error Concealment: Soft decoding and iterative source channel decoding 
  • Bandwidth Extension: With and without side information 
  • Noise Reduction: Single- and Multi-Microphone Techniques 
  • Acoustic Echo Cancellation: Time- and frequency-domain, adaptive postfiltering 

The course is based on the book "Digital Speech Transmission: Enhancement, Coding and Error Concealment" by P. Vary and R. Martin, Wiley, 2006. In addition, the coding and processing techniques are demonstrated with many audio examples.

Monday 

SPEECH CODING 
The first part of the course deals with speech encoding. State-of-the-art concepts are discussed. The signal processing aspects of quantization, differential waveform coding, linear prediction, and especially the concepts of Code Excited Linear Prediction (CELP) are explained.

Speech Production Model 

  • Speech Production
  • Digital Filter Structures for Speech Production
  • Psycho-Acoustics 

Linear Prediction 

  • Vocal Tract Models and Short-Term Prediction
  • Optimum Prediction
  • Spectral Flatness Measure 
  • Block-Adaptive Linear Prediction
  • Levinson/Durbin Algorithm 
  • Long-Term Prediction
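
As an illustration of the Levinson/Durbin topic listed above, the following is a minimal sketch (not taken from the course material) of the recursion that solves the normal equations of block-adaptive linear prediction. The function name `levinson_durbin`, the sign convention (prediction as a weighted sum of past samples), and the example autocorrelation values are assumptions chosen for illustration.

```python
def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations recursively.

    r     : autocorrelation values r[0], ..., r[order]
    order : predictor order p
    Returns (a, k, err), where the prediction of x[n] is
    sum(a[i] * x[n - 1 - i] for i in range(order)), k holds the
    reflection coefficients, and err is the residual energy.
    """
    a = [0.0] * order
    k = []
    err = r[0]
    for m in range(order):
        # Reflection coefficient for the step from order m to m + 1.
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        km = acc / err
        k.append(km)
        # Update the lower-order coefficients and append the new one.
        new_a = a[:]
        new_a[m] = km
        for i in range(m):
            new_a[i] = a[i] - km * a[m - 1 - i]
        a = new_a
        # The residual energy shrinks by (1 - km^2) at every step.
        err *= 1.0 - km * km
    return a, k, err


# Hypothetical example: autocorrelation of a first-order
# autoregressive model with correlation coefficient 0.9.
r = [1.0, 0.9, 0.81]
a, k, err = levinson_durbin(r, order=2)
print(a, k, err)
```

For this example the second coefficient comes out as approximately zero and the first as approximately 0.9, with a residual energy of about 0.19, as expected for a first-order model.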

Quantization 

  • Uniform and Non-uniform Quantization
  • Optimal Quantization 
  • Adaptive Quantization
  • Vector Quantization 

Speech Coding

  • Model-Based Predictive Coding
  • Adaptive Differential Pulse Code Modulation (ADPCM)
  • Noise Shaping, Open-Loop and Closed-Loop Prediction
  • Code Excited Linear Prediction (CELP) 
  • Quantization and Line Spectral Frequencies (LSF) 
  • Audio Examples: Quantization, Coding, Noise Shaping


Tuesday 

Post Filtering 

  • Short-term Post Filter
  • Long-term Post Filter
  • Tilt Compensation 

Speech Quality Assessment 

  • Mean Opinion Score (MOS)
  • Modulated Noise Reference Unit (MNRU, CCITT)
  • Objective Quality Measures (PESQ)

Speech Coding Standards 
The most relevant speech codec standards are discussed and demonstrated by audio examples. Recent developments such as the Adaptive Multi-Rate (AMR) codec family (narrowband, wideband, and wideband+) for GSM and UMTS, or variable-rate coding for Internet telephony, are explained.
ITU: G.721/G.726, G.722, G.723, G.728, G.729, G.729.1, G.711.1, G.718 (EV-VBR)
GSM & UMTS: Full Rate, Half Rate, Enhanced Full Rate, AMR, AMR-WB, AMR-WB+ 

ERROR CONCEALMENT and SOFT DECISION SOURCE DECODING
Wireless speech transmission systems usually include channel coding for error protection. However, due to temporarily adverse channel conditions, residual bit errors frequently remain. The negative effects of these errors can be reduced by error concealment, exploiting both residual source redundancy and information about the instantaneous quality of the transmission channel:

  • Frame Substitution and Standard Solutions (GSM)
  • Soft Bits and Log-Likelihood Values
  • Soft-Decision Speech Decoding by Parameter Estimation
  • A Priori Knowledge and A Posteriori Probabilities 
  • Graceful Degradation by Soft Decoding
  • Joint and Iterative Source-Channel (De-)Coding
  • Audio Examples: PCM, ADPCM, GSM 


Wednesday 

BANDWIDTH EXTENSION 
In the long run, the audio bandwidth of telephone networks and terminals will be extended to 7 kHz wideband transmission. This will require new codecs at both ends of the transmission link. During the transition period, which will probably be very long, many terminals will not yet be equipped with wideband capability. In this situation, the quality of the received narrowband speech (3.4 kHz) can be improved by means of artificial bandwidth extension:

  • Source Filter Model 
  • Extension of the Excitation Signal
  • Extension of the Spectral Envelope
  • Statistical Estimation Based on a Markov State Model 
  • Bandwidth Extension with Side Information 
  • Implementation and Performance Evaluation 

NOISE REDUCTION AND BEAMFORMING
If the signal is degraded by acoustic background noise and a loudspeaker signal, various speech enhancement methods can be applied prior to speech encoding and transmission. We will discuss state-of-the-art algorithms for noise reduction and evaluate new proposals such as super-Gaussian speech models and psycho-acoustic aspects. Noise suppression schemes using a single microphone or several microphones are presented.

Single and Dual Channel Noise Reduction

  • Wiener Filter
  • Speech Enhancement in the DFT Domain
  • Noise Estimation Techniques and Minimum Statistics
  • More Sophisticated Suppression Rules 
  • Conditional MMSE- and MAP-Estimation
  • Estimation of Complex DFT-Coefficients
  • Estimation of Real Valued DFT-Amplitudes 
  • Estimation with Super-Gaussian Models
  • Noise Suppression Exploiting Masking
  • Soft Weighting
  • Dual Channel Noise Cancellation
  • Coherence Function and Theoretical Limits
  • Dual Channel Noise Suppression
  • Noise Suppression Using Small Microphone Arrays 
  • Audio Examples


Thursday 

Multi-channel Noise Reduction

  • Spatial Sampling of Sound Fields 
  • Beamforming
  • Performance Measures
  • Fixed Beamformers
  • Multi-channel Wiener Filter and Postfilter
  • Adaptive Beamformers 
  • Generalized Side-lobe Canceller

ACOUSTIC ECHO CONTROL 
The key algorithms for acoustic echo control used for hands-free communication are explained, especially for the combination of echo cancellation with adaptive post-filtering.

  • LMS and NLMS Time Domain Cancellation
  • Convergence Analysis and Control
  • Echo Cancellation and Postfiltering
  • Joint Residual Echo Cancellation and Noise Reduction
  • Frequency Domain Method and Block Processing 
  • Additional Measures
  • Stereophonic Acoustic Echo Control
  • Audio Examples
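
To make the time-domain cancellation topic above more concrete, here is a minimal sketch (not taken from the course material) of an NLMS adaptive filter identifying a short echo path. The function name `nlms`, the step size `mu`, the regularization constant `eps`, and the three-tap echo path `h` are hypothetical values chosen for illustration; real acoustic echo paths are far longer and the microphone signal also contains noise and near-end speech.

```python
import random


def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt w so that the filtered x tracks d.

    x : far-end (loudspeaker) samples
    d : microphone samples containing the echo of x
    Returns (w, e): final filter weights and the error signal,
    i.e. the microphone signal after echo cancellation.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    e_out = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        e = dn - y                                  # residual echo
        norm = sum(b * b for b in buf) + eps        # input energy
        g = mu * e / norm
        w = [wi + g * bi for wi, bi in zip(w, buf)]
        e_out.append(e)
    return w, e_out


# Hypothetical noise-free setup: white excitation, 3-tap echo path.
random.seed(0)
h = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]

w, e = nlms(x, d, num_taps=len(h))
print(w)
```

In this idealized setting the weights converge to the assumed echo path. Dividing the update by the input energy is what distinguishes NLMS from plain LMS and makes the convergence speed largely independent of the signal level.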


Figure: Digital Speech Transmission and Enhancement

Course Rate:  4-day course

Regular Course Fee: EUR 2490

Early Registration Course Fee: EUR 2240
This applies to firm registrations received 2 months before course start. 

University Student and Faculty Rate:
Two university participants are welcome to attend for one course fee if payment is to be made from university funds.

Deliverables:
The course fee covers tuition, course material, and the day conference packages (morning/afternoon refreshments, lunches etc.) paid on your behalf to the course venue. Accommodation is not included.

Payment should be made before course start.