
Prosody-based automatic segmentation of speech into sentences

Florian Pausch

Segmenting speech into sentences is an important first step in several areas of speech processing. Automatic speech recognition (ASR) algorithms mostly produce an unstructured stream of words without detecting the hidden structure of spoken language. However, natural language processing systems often need sentence-like units to work properly. In addition, labeling large amounts of speech data by hand is very time-consuming. It is therefore necessary to develop an algorithm that analyzes broadcast speech corpora (e.g. Aix-MARSEC) and outputs sentence boundaries using prosodic features.

The algorithm works as follows. First, an adaptive, energy-based voice activity detector (VAD) finds all active regions and computes the pause lengths and intensity as first features. These active regions then serve as input to a pitch estimation algorithm. To assess tendencies at the region boundaries, an optimal (in the least-squares sense) piecewise polynomial approximation of the pitch contour is computed, from which further prosodic features are derived (f0 rise/fall, f0 gradient, ...). Finally, the extracted features are combined in a decision tree to determine the sentence boundaries.
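
The following is a minimal Python sketch of that pipeline, not the project's implementation. All function names, the fixed relative energy threshold (standing in for the adaptive VAD), the single straight-line fit (standing in for the piecewise polynomial approximation), and the two thresholds in the hand-written decision rule are assumptions made purely for illustration. Pitch estimation is omitted; the f0 values are assumed to come from some external estimator.

```python
def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def active_regions(energies, ratio=0.1):
    """Return (start, end) frame indices (end exclusive) of active regions.

    Assumption: the threshold is a fixed fraction of the maximum frame
    energy, a crude stand-in for a truly adaptive threshold.
    """
    threshold = ratio * max(energies)
    regions, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # region begins
        elif energy <= threshold and start is not None:
            regions.append((start, i))     # region ends
            start = None
    if start is not None:                  # signal ends while still active
        regions.append((start, len(energies)))
    return regions


def pause_lengths(regions, frame_dur):
    """Duration in seconds of the pause between consecutive active regions."""
    return [
        (regions[k + 1][0] - regions[k][1]) * frame_dur
        for k in range(len(regions) - 1)
    ]


def ls_slope(times, values):
    """Slope of the least-squares straight line through (times, values).

    Used here as a simple f0-gradient feature near a region boundary.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def is_sentence_boundary(pause, f0_slope, min_pause=0.3, max_slope=0.0):
    """Hand-written two-level decision rule (illustrative thresholds only):
    a long pause preceded by a falling f0 contour is marked as a boundary."""
    if pause < min_pause:
        return False
    return f0_slope < max_slope
```

With a signal made of two bursts separated by silence, `active_regions` yields two regions and `pause_lengths` the gap between them; `ls_slope` applied to the last few f0 values of the first region gives the gradient that `is_sentence_boundary` combines with the pause length. In a real system the rule would be replaced by a decision tree trained on labeled data.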

Florian Pausch    Type: TI project    Status: ongoing    Date: 12.10.2010

Last modified: 29.06.2011