Automatic Melody Transcription

Amir Rahimzadeh

A monophonic music transcription system is to be implemented in MATLAB. In short, the planned system works as follows:

1) The system reads a PCM-coded audio waveform selected by the user

2) The audio waveform is analyzed in several ways (in the time, frequency, and musical domains)

3) The system generates a musical score representation of the analyzed audio signal

The analysis stage consists of two blocks: the onset detection stage and the F0-estimator. The onset detection stage searches the waveform for transient regions that might correspond to note beginnings. The F0-estimator determines the fundamental frequency (F0) of a played note.

Both blocks use time-domain techniques (to find periodicities) as well as spectral methods (to find spectral novelty) to analyze and describe the content of musical audio signals. The resulting information shall be combined in multiple ways to make the system more robust. The final, challenging step will be to apply some basic musical knowledge to the musical score (generated from the output of the analysis stage) in order to detect very unlikely tone progressions and exclude them automatically.
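The last step can be illustrated with a minimal sketch, written here in Python rather than MATLAB. It maps F0 estimates to MIDI note numbers using the standard equal-temperament relation (A4 = 440 Hz = MIDI note 69) and then removes isolated notes that leap far away from both neighbours. The function names and the leap limit of 12 semitones are assumptions made for this illustration only; the proposal does not specify which musical rules will be applied.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def f0_to_midi(f0_hz):
    """Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))

def midi_to_name(midi):
    """Note name with octave number, e.g. 60 -> 'C4'."""
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def drop_unlikely_notes(midi_notes, max_leap=12):
    """Remove interior notes that lie more than max_leap semitones
    away from both the preceding and the following note.
    The rule and the default limit are illustrative assumptions."""
    kept = []
    for i, note in enumerate(midi_notes):
        is_interior = 0 < i < len(midi_notes) - 1
        if (is_interior
                and abs(note - midi_notes[i - 1]) > max_leap
                and abs(note - midi_notes[i + 1]) > max_leap):
            continue  # isolated outlier, most likely an estimation error
        kept.append(note)
    return kept

# Hypothetical note sequence with one implausible jump (88).
melody = [60, 62, 88, 64, 65]
print([midi_to_name(n) for n in drop_unlikely_notes(melody)])
```

A rule of this kind only catches isolated outliers; a real system would probably need key or interval statistics as well.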

In the following, the analysis is described in more detail. The onset detection stage will work on metadata from which redundancy has been removed, also called features, which are calculated every 2-5 ms. Features may be any combination of spectral or temporal information (e.g. spectral flux, envelope of the signal), but they have to be chosen carefully so that they reflect the desired musical information contained in the audio waveform. In other words, the ideal feature for onset detection has a small value most of the time and reaches large values only at note beginnings. Onset times can then be extracted by simple peak picking.
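The idea of a feature followed by peak picking can be sketched as follows, in Python rather than MATLAB. The feature used here is the frame-to-frame rise of the short-time energy, which is only one simple temporal choice and not necessarily the feature the final system will use. The function names, the frame length, the fixed threshold and the synthetic test signal are all assumptions of this sketch.

```python
import math

def energy_rise_feature(signal, frame_len, hop):
    """Half-wave rectified rise of the short-time energy, one value per hop."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    feature = [0.0]
    for prev, cur in zip(energies, energies[1:]):
        # Keep increases only: decays are not note beginnings.
        feature.append(max(0.0, cur - prev))
    return feature

def pick_peaks(feature, threshold):
    """Indices of local maxima of the feature that exceed the threshold."""
    peaks = []
    for i in range(1, len(feature) - 1):
        if (feature[i] > threshold
                and feature[i] >= feature[i - 1]
                and feature[i] > feature[i + 1]):
            peaks.append(i)
    return peaks

# Synthetic example: 0.1 s of silence followed by 0.1 s of a 440 Hz tone.
sample_rate = 8000
hop = 32  # 4 ms at 8 kHz, inside the 2-5 ms range mentioned above
signal = [0.0] * 800 + [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
                        for n in range(800)]
feature = energy_rise_feature(signal, frame_len=64, hop=hop)
onset_times = [i * hop / sample_rate for i in pick_peaks(feature, threshold=0.02)]
print(onset_times)  # a single onset close to 0.1 s
```

A fixed threshold is enough for this clean test signal; for real recordings an adaptive threshold would probably be needed.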

The F0-estimator will be based on auto-difference functions, which are closely related to the auto-correlation function but are reported to be less sensitive to amplitude changes. These functions are used to detect periodicities inherent in the signal, which are characteristic of harmonic sounds. The main difference between the two is that the auto-correlation function shows a peak at the lags of highest correlation, while the auto-difference function shows a valley. Moreover, the F0-estimator is expected to improve the detection of soft or tonal onsets, which are likely to be missed by the onset detection stage.
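A minimal sketch of such an estimator, in Python rather than MATLAB, is given below. It uses the squared-difference form d(tau) = sum over n of (x[n] - x[n + tau])^2 and reports the first sufficiently deep valley inside the allowed lag range. The function names, the search range and the relative threshold of 0.1 are assumptions of this sketch; the exact auto-difference variant to be used in the system is not fixed here.

```python
import math

def auto_difference(frame, max_lag):
    """d[tau] = sum_n (x[n] - x[n + tau])^2 for tau = 0 .. max_lag."""
    n = len(frame) - max_lag  # same number of terms for every lag
    return [sum((frame[i] - frame[i + tau]) ** 2 for i in range(n))
            for tau in range(max_lag + 1)]

def estimate_f0(frame, sample_rate, f_min, f_max, rel_threshold=0.1):
    """Return the F0 in Hz of the first deep valley of the
    auto-difference function, or None if no valley is found."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = int(sample_rate / f_min)
    d = auto_difference(frame, max_lag)
    limit = rel_threshold * max(d)  # assumed depth criterion for a valley
    for tau in range(min_lag, max_lag):
        if d[tau] < limit and d[tau] <= d[tau - 1] and d[tau] <= d[tau + 1]:
            # Taking the first valley avoids picking a multiple of the period.
            return sample_rate / tau
    return None

# Synthetic example: a 200 Hz tone sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(estimate_f0(tone, sample_rate, f_min=100, f_max=500))
```

The resolution of this sketch is limited to whole-sample lags; interpolating around the valley would refine the estimate for periods that are not an integer number of samples.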