Audio coding through musical thumbnailing
Diploma Thesis (2.041 KB)
As the volume of information and the number of information sources grow rapidly, keeping track of it all becomes increasingly difficult. New information retrieval methods must be developed to make this task easier, and with them new coding methods. The MPEG-7 format describes scenarios that emphasise ‘feature extraction’ and ‘labelling’ in audio coding. One goal of MPEG-7 is audio coding based on a few short musical sequences (thumbnails) that represent a piece of music.
Decoding reassembles the basic elements, which are stored in a table, to recompose the original signal. The aim is to find out how music can be divided into small pieces that reduce the data flow when the (coded) music signal is transmitted from one place to another, e.g., via the internet.
Music is an event-based phenomenon that is subject to constant change, whether through a new instrument setting or simply a new note being played, so we can assume that such a change marks the beginning of a new structural part in the music. The first step should therefore be to find all such transitions in the musical piece. This is done by a procedure called onset detection, where an onset is the beginning of a musical note or of a new musical part. Onset detection can be done in several ways, based on changes in either the temporal or the spectral features of the signal.
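To illustrate the temporal variant, the following is a minimal sketch of an energy-based onset detector. It is only one of many possible approaches and not necessarily the one used in the thesis; the function names, the window length of 64 samples, and the energy ratio of 2.0 are assumptions chosen for the example.

```python
import math

def window_energies(signal, window_len):
    """Short-time energy of consecutive, non-overlapping analysis windows."""
    return [
        sum(x * x for x in signal[i:i + window_len])
        for i in range(0, len(signal) - window_len + 1, window_len)
    ]

def detect_onsets(signal, window_len=64, ratio=2.0, floor=1e-6):
    """Return sample indices of windows whose energy exceeds `ratio` times
    the energy of the preceding window (a crude temporal-feature detector)."""
    energies = window_energies(signal, window_len)
    onsets = []
    for k in range(1, len(energies)):
        previous, current = energies[k - 1], energies[k]
        # `floor` avoids reporting onsets inside near-silent passages.
        if current > floor and current > ratio * max(previous, floor):
            onsets.append(k * window_len)
    return onsets

# Hypothetical test signal: 256 samples of silence, then a 440 Hz tone
# (assumed sample rate: 8000 Hz).
sample_rate = 8000
signal = [0.0] * 256 + [
    math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(512)
]
onsets = detect_onsets(signal)
```

With this test signal, the only reported onset is the start of the tone at sample 256. A spectral variant would compare magnitude spectra of neighbouring windows instead of their energies.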
Once the onsets, which form a rough rhythmical division of the audio signal, are found, a frame is defined between each pair of consecutive onsets. The music is thus divided into a set of frames of varying length. For each of these frames, features are extracted in an attempt to find repeated parts of the song (e.g., verse/chorus), which are considered the parts that will contribute most to reducing the data flow.
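The division into frames can be sketched as follows. The helper name is hypothetical, and treating the start and the end of the signal as additional frame boundaries is an assumption made here so that the frames cover the whole signal.

```python
def frames_from_onsets(onsets, signal_length):
    """Turn onset positions into (start, end) frames covering the signal.

    Assumption: the signal start (0) and end (signal_length) act as
    extra boundaries, so material before the first onset is kept.
    """
    inner = {o for o in onsets if 0 < o < signal_length}
    bounds = sorted(inner | {0, signal_length})
    return list(zip(bounds[:-1], bounds[1:]))

# Hypothetical onsets (in samples) for a signal of 768 samples.
frames = frames_from_onsets([256, 400], 768)
frame_lengths = [end - start for start, end in frames]
```

Each resulting pair marks the sample range from which the features of one frame would be extracted.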
For this purpose, three approaches will be considered:
• Repetitions of certain rhythmical patterns (e.g., a half-time groove in the verse and normal time in the chorus) can be identified by comparing the timing of the onsets, allowing for some inaccuracy in the onset detection function, without considering the features of the frames. Parts with similar rhythmical content are considered related structural parts.
• Another approach is to search for similarities in the timbre of the musical piece. An example of this approach is the use of Mel-Frequency Cepstral Coefficients (MFCC). They are calculated for each frame and yield a vector of perceptually based spectral features per frame. These vectors are correlated with each other to obtain the so-called similarity matrix, from which similar, and therefore related, structural parts can be identified.
• The third approach finds within-piece similarities by analysing musical properties such as harmony and pitch. Such a search can be done, for example, by extracting the chroma feature, which captures the cyclic attribute of pitch. That is, each tone (or pitch) in Western music lies in a certain octave, and within this octave on one of twelve semitones. The energy of all notes belonging to the same chroma (i.e., the same tone over all octaves) is summed to obtain the overall energy of that tone. In this way chords, for example, can be identified, and similarity of chords across several frames indicates related structural parts.
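The three approaches above can be sketched in a few lines each. This is an illustration under stated assumptions, not the thesis implementation: all function names are hypothetical, the rhythm comparison uses inter-onset intervals with a fixed tolerance, notes are given as MIDI note numbers, and cosine similarity stands in for the correlation measure. The same similarity matrix could be built from MFCC vectors, which are not computed here.

```python
import math

def similar_rhythm(onsets_a, onsets_b, tolerance=0.05):
    """Compare two onset lists (in seconds) by their inter-onset intervals,
    allowing `tolerance` seconds of deviation per interval."""
    intervals_a = [b - a for a, b in zip(onsets_a, onsets_a[1:])]
    intervals_b = [b - a for a, b in zip(onsets_b, onsets_b[1:])]
    return len(intervals_a) == len(intervals_b) and all(
        abs(x - y) <= tolerance for x, y in zip(intervals_a, intervals_b)
    )

def chroma_vector(note_energies):
    """Fold {MIDI note number: energy} into 12 chroma bins (C = index 0),
    summing the energy of the same tone over all octaves."""
    chroma = [0.0] * 12
    for midi_note, energy in note_energies.items():
        chroma[midi_note % 12] += energy
    return chroma

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if one is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_matrix(vectors):
    """Pairwise similarity of all per-frame feature vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Rhythm: the same pattern played twice, then at half time.
same_groove = similar_rhythm([0.0, 0.5, 1.0, 1.5], [8.0, 8.52, 9.01, 9.5])
half_time = similar_rhythm([0.0, 0.5, 1.0, 1.5], [0.0, 1.0, 2.0, 3.0])

# Harmony: three hypothetical frames with unit energy per sounding note.
frame_notes = [
    {60: 1.0, 64: 1.0, 67: 1.0},  # C major (C4, E4, G4)
    {55: 1.0, 59: 1.0, 62: 1.0},  # G major (G3, B3, D4)
    {72: 1.0, 76: 1.0, 79: 1.0},  # C major, one octave higher
]
matrix = similarity_matrix([chroma_vector(notes) for notes in frame_notes])
```

In this example the first and third frames have a similarity of about 1, because the chroma feature ignores the octave, while the C major and G major frames share only one tone and score about 1/3.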
If enough identical parts are found in the musical piece, compression can be achieved by sending a repeated part just once and, while the repetition is playing, already sending the next new part or transmitting more detailed data.
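The idea of sending each repeated part only once can be sketched as a table of unique parts plus a sequence of references into that table, which the decoder reassembles. The names are hypothetical, and the sketch assumes that repeated parts are exactly equal and represented by hashable values (plain labels here); a real system would have to decide by a similarity threshold which parts count as repetitions.

```python
def encode(parts):
    """Replace repeated parts by references into a table of unique parts."""
    table, index_of, sequence = [], {}, []
    for part in parts:
        if part not in index_of:
            # First occurrence: this part has to be transmitted.
            index_of[part] = len(table)
            table.append(part)
        # Every occurrence, new or repeated, costs only one reference.
        sequence.append(index_of[part])
    return table, sequence

def decode(table, sequence):
    """Reassemble the original order of parts from table and references."""
    return [table[i] for i in sequence]

# Hypothetical song structure; labels stand in for coded audio parts.
song = ["intro", "verse", "chorus", "verse", "chorus", "chorus"]
table, sequence = encode(song)
restored = decode(table, sequence)
```

Here six parts are represented by three transmitted parts and six small references, and decoding restores the original order.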