Ermittlung eines robusten Feature-Sets zur Klassifikation von Sprache unter Stressbedingungen (Determination of a Robust Feature Set for the Classification of Speech under Stress Conditions)
Diploma thesis
The objective of this diploma thesis is the selection and evaluation of appropriate low-level features and derived feature characteristics for the automated recognition and classification of speech under varying emotions and mental stress levels. Emphasis is placed on obtaining results that apply to a broad spectrum of stress types and are independent of the language spoken. For this purpose, speech data from three corpora are analyzed: an English database of speech under stress (SUSAS), a German database of emotional speech (Emo-DB), and an English corpus of non-prompted air traffic control speech (ATCOSIM).
Basic features are extracted using the speech analysis software Praat. They include pitch, intensity, F1/F2 frequency and bandwidth, harmonicity, MFCCs, and properties of the glottal source spectrum. Further processing steps, implemented in MATLAB, comprise phoneme boundary and class detection, followed by feature extraction that uses the phoneme grid as a new time base. These additional features include phoneme durations and a feature based on the nonlinear Teager Energy Operator (TEO).
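As an illustration of the TEO-based feature, the following is a minimal Python sketch of the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], averaged over segments. It is not the thesis's MATLAB implementation: the function names, the segment boundaries standing in for a phoneme grid, and the choice of the per-segment mean as the derived characteristic are all assumptions made for this example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns len(x) - 2 values, because the two boundary samples are dropped.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def mean_teo_per_segment(x, boundaries):
    """Average TEO inside each segment [start, end) given as sample indices.

    `boundaries` is a hypothetical stand-in for a detected phoneme grid.
    """
    psi = teager_energy(x)
    means = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        # psi[k] belongs to sample k + 1, so shift and clip the indices.
        lo = max(start - 1, 0)
        hi = min(end - 1, len(psi))
        seg = psi[lo:hi]
        means.append(sum(seg) / len(seg) if seg else float("nan"))
    return means


# Sanity check on a pure tone: for A*cos(omega*n) the operator is constant
# and equals A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)
segment_means = mean_teo_per_segment(tone, [0, 50, 120, 200])
```

Because the operator needs only three neighbouring samples, it reacts to rapid changes in amplitude and frequency, which is presumably why it is evaluated on short, phoneme-sized segments here.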
The discriminative power of single features is estimated by means of appropriate statistical tests on the derived characteristics. This results in a feature ranking list for a selected combination of two emotional classes, from which the best performing set of features is then determined iteratively. Using this feature set, a supervised classification method (k-nearest neighbours) is employed in a cross-validation process. Its outcome is the percentage of correctly assigned emotional classes, which is taken as a measure of performance. Finally, a "shared" feature set is found by intersecting optimum feature sets of individual experiments.
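The selection procedure described above can be sketched in Python as follows. This is a hedged illustration on synthetic data, not the method as implemented in the thesis: the Welch t statistic stands in for the unspecified statistical tests, the greedy walk through the ranking is one possible reading of "determined iteratively", leave-one-out is an assumed form of cross-validation, and all function names, k = 3, and the data generator are hypothetical.

```python
import math
import random
from collections import Counter


def welch_t(a, b):
    """Absolute Welch t statistic for two samples of one feature."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)


def rank_features(X, y):
    """Feature indices sorted by decreasing two-class separation."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        scores.append((welch_t(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)]


def knn_loo_ccr(X, y, feats, k=3):
    """Leave-one-out correct classification rate of k-NN on chosen features."""
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in feats), y[o])
            for o, xo in enumerate(X) if o != i
        )
        vote = Counter(label for _, label in dists[:k]).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def select_features(X, y, k=3):
    """Walk down the ranking, keeping a feature only if it raises the CCR."""
    chosen, best = [], 0.0
    for j in rank_features(X, y):
        ccr = knn_loo_ccr(X, y, chosen + [j], k)
        if ccr > best:
            chosen, best = chosen + [j], ccr
    return set(chosen), best


def make_data(seed, weak_feature, n=40, n_feats=4):
    """Synthetic two-class data (assumption: unit-variance Gaussian features)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        c = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_feats)]
        row[0] += 6.0 * c             # strongly discriminative in both experiments
        row[weak_feature] += 1.0 * c  # weakly discriminative, experiment-specific
        X.append(row)
        y.append(c)
    return X, y


# Two "experiments" with individual optimum sets, then the shared set.
set_a, ccr_a = select_features(*make_data(1, weak_feature=1))
set_b, ccr_b = select_features(*make_data(2, weak_feature=2))
shared = set_a & set_b
```

The sketch leaves the features unscaled, which is acceptable only because the synthetic features share one variance; real features such as pitch and MFCCs would need normalization before the distance computation.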
For acted emotional speech, individual feature sets achieve correct classification rates (CCR) of up to 98%, which degrade by no more than 12% when the shared feature set is used for classification instead. Workload level classification reaches up to 70% CCR with individual feature sets and likewise degrades by at most 12% with the shared set, which leaves only moderate classification rates of around 60% CCR.