
Automatic Sentence Segmentation of Speech Using Prosodic Features

Florian Pausch

Segmenting speech into sentences is an important first step in several speech processing fields. Automatic speech recognition (ASR) algorithms mostly produce an unstructured stream of words without detecting the hidden structure of spoken language. However, natural language processing systems often need sentence-like units to work properly. Moreover, labeling large amounts of speech data by hand is very time-consuming. It is therefore necessary to develop an algorithm that analyzes broadcast speech corpora (e.g. Aix-MARSEC) and outputs sentence boundaries using prosodic features.

The algorithm can be described as follows. First, an adaptive, energy-based voice activity detector (VAD) finds all active regions and calculates the pause lengths and intensity as first features. These active regions are then used as input for a pitch estimation algorithm. To assess tendencies at the region boundaries, an optimal (in the least-squares sense) piecewise polynomial approximation of the pitch contour is calculated, from which further prosodic features are derived (f0 rise/fall, f0 gradient, ...). Finally, the extracted features are combined in a decision tree to determine the sentence boundaries.
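
The processing chain described above can be illustrated with a minimal Python sketch. It is not the project's implementation: all function names, the 15 dB margin, the 0.4 s pause threshold and the hand-written decision rule are assumptions chosen for illustration. The noise-floor-plus-margin threshold is only a simple stand-in for an adaptive VAD, a single straight-line least-squares fit replaces the piecewise polynomial approximation, and the pitch estimation step is skipped (f0 values are taken as given).

```python
import math

def frame_energies_db(samples, frame_len):
    """Short-time energy in dB for consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # offset avoids log(0)
    return energies

def active_regions(energies_db, margin_db=15.0):
    """Return (start, end) frame indices (end exclusive) of active regions.

    Assumption: the threshold adapts to the recording by taking the lowest
    frame energy as the noise floor and adding a fixed margin.
    """
    threshold = min(energies_db) + margin_db
    regions, start = [], None
    for i, energy in enumerate(energies_db):
        if energy > threshold and start is None:
            start = i                      # region begins
        elif energy <= threshold and start is not None:
            regions.append((start, i))     # region ends
            start = None
    if start is not None:
        regions.append((start, len(energies_db)))
    return regions

def pause_lengths(regions, frame_dur):
    """Pause duration in seconds between each pair of neighbouring regions."""
    return [(nxt[0] - cur[1]) * frame_dur
            for cur, nxt in zip(regions, regions[1:])]

def f0_gradient(f0_values, frame_dur):
    """Slope (Hz/s) of the least-squares straight line through an f0 contour."""
    n = len(f0_values)
    if n < 2:
        return 0.0
    times = [i * frame_dur for i in range(n)]
    t_mean = sum(times) / n
    f_mean = sum(f0_values) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0_values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def is_boundary(pause_s, gradient_hz_per_s, min_pause_s=0.4):
    """Hypothetical two-level rule standing in for a trained decision tree."""
    if pause_s < min_pause_s:
        return False
    return gradient_hz_per_s < 0.0  # falling pitch before a long pause
```

In this sketch a candidate boundary is accepted only when a sufficiently long pause follows a region whose pitch falls towards its end. A trained decision tree would learn such thresholds from labeled data and could combine more features than the two used here.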

Florian Pausch    type: TI-Project    state: running     Date: 12.10.2010

Last modified 29.06.2011