Optimization of a blind upmix system for the reproduction of TV and movie sound
The diploma thesis is locked until 16 December 2016.
With consumer acceptance of the Digital Versatile Disc (DVD), launched in 1995, surround sound systems have become widespread in private households.
However, the majority of today's music is not produced in a multichannel format. TV content such as television series and old films is available only in two-channel stereo audio.
To make two-channel audio usable on surround sound systems, a blind upmix system focusing on the playback of music content was developed at Fraunhofer IIS.
This thesis discusses how the upmix system was adapted for the playback of TV and movie content. An important design criterion was to achieve pristine playback of speech from the center channel.
The basic concept of the proposed approach is to vary the sound parameters of the upmixer over time. A speech detection system determines when the input signal of the upmixer contains speech. Based on the detected speech segments, a fade between two sound settings is performed. One setting is tuned to the playback of speech, the other to the playback of music and ambient sound.
First, a pattern recognition system was adapted specifically for the detection of speech in TV and movie audio. The developed additions comprise pre-processing of the signals, which aims to reduce noise by means of spectral weighting. In addition, stereo features were defined that exploit the interchannel coherence and interchannel level differences of the signal. A post-processing stage was designed that uses an additional classifier trained at runtime. Finally, envelope segmentation with adaptive background level calculation was designed and implemented for post-processing the estimated speech segments.
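The two stereo features named above can be illustrated with a minimal sketch. It assumes broadband, time-domain frames and a zero-lag coherence measure; the thesis may well compute these per frequency band or with a different normalization. The function names and the `eps` regularization are hypothetical and chosen only for this illustration.

```python
import math


def interchannel_level_difference(left, right, eps=1e-12):
    """Return the level difference in dB between the left and right frame.

    Assumption: computed from broadband frame energies. Positive values
    mean the left channel is louder.
    """
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def interchannel_coherence(left, right, eps=1e-12):
    """Return the magnitude of the zero-lag normalized cross-correlation.

    Assumption: zero lag only. The result lies in [0, 1]; values near 1
    indicate strongly correlated channels, as for a center-panned source.
    """
    cross = sum(l * r for l, r in zip(left, right))
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return abs(cross) / (math.sqrt(e_left * e_right) + eps)
```

For a center-panned source, where both channels carry the same signal, this sketch yields a level difference close to 0 dB and a coherence close to 1, which is the pattern one would expect for dialogue mixed to the phantom center.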
Several algorithms for computing a control function for the upmixer's sound parameters were implemented and tested.
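As an illustration of what such a control function might look like, the following sketch smooths per-frame speech decisions into a fade curve and interpolates between two parameter sets. It is not one of the algorithms from the thesis: the one-pole smoothing, the `attack` and `release` coefficients, the linear interpolation, and the parameter name `center_gain` are all assumptions made for this example.

```python
def control_function(speech_flags, attack=0.5, release=0.1):
    """Smooth binary per-frame speech decisions into a curve in [0, 1].

    Assumption: one-pole smoothing with a faster coefficient when moving
    toward the speech setting (attack) than when moving away (release).
    """
    curve = []
    value = 0.0  # 0.0 = music/ambience setting, 1.0 = speech setting
    for flag in speech_flags:
        target = 1.0 if flag else 0.0
        coeff = attack if target > value else release
        value += coeff * (target - value)
        curve.append(value)
    return curve


def blend_settings(music_setting, speech_setting, c):
    """Linearly interpolate each parameter between the two settings."""
    return {
        name: (1.0 - c) * music_setting[name] + c * speech_setting[name]
        for name in music_setting
    }


# Hypothetical parameter sets; real upmixer parameters are not given in the text.
music = {"center_gain": 0.0}
speech = {"center_gain": 1.0}
curve = control_function([0, 0, 1, 1, 1, 0, 0])
per_frame_settings = [blend_settings(music, speech, c) for c in curve]
```

The asymmetric coefficients reflect one plausible design choice: switching quickly to the speech setting when dialogue starts, and returning slowly so that short pauses between words do not cause audible parameter changes.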
Listening tests showed that the developed additions significantly improved the quality of speech playback. Furthermore, they showed that fading between two sound settings, instead of remaining in a single static setting, can significantly improve both the positioning of sound sources and the sound quality of speech.
Several experienced listeners did not perceive the fading between the two sound settings. Listening tests were also carried out with experienced sound engineers, who either did not perceive the fading at all or perceived it as only minimally annoying in most cases.