Applications of a constant-Q transform in music processing
External Supervisor: Dr. Anssi Klapuri, Centre for Digital Music, Queen Mary University of London
The constant-Q transform (CQT) refers to a time-frequency representation where the frequency bins are geometrically (logarithmically) spaced and the Q-factors (ratios of the center frequencies to bandwidths) of all bins are equal.
In effect, this means that the frequency resolution is better for low frequencies and the time resolution is better for high frequencies. The CQT is essentially a wavelet transform, but the term CQT is preferred since it underlines the fact that transforms with relatively high Q-factors, equivalent to 12–96 bins per octave, are considered. This renders many of the conventional wavelet transform techniques inadequate; for example, methods based on iterated filterbanks would require filtering the input signal hundreds of times.
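As a minimal sketch of the definition above, the following Python snippet computes geometrically spaced center frequencies and the resulting Q-factor for a given number of bins per octave. It assumes that the bandwidth of each bin equals the distance to the next center frequency, which is one common convention and not necessarily the one used in this project. The function names and the example lowest frequency of 55 Hz are hypothetical.

```python
import math


def cqt_bins(f_min, bins_per_octave, n_bins):
    """Return (center_frequency, bandwidth) pairs for geometrically spaced bins.

    Assumption: the bandwidth of bin k is the distance to the center
    frequency of bin k + 1.
    """
    ratio = 2.0 ** (1.0 / bins_per_octave)  # frequency ratio between adjacent bins
    bins = []
    for k in range(n_bins):
        f_k = f_min * ratio ** k            # geometric (logarithmic) spacing
        bandwidth = f_k * (ratio - 1.0)     # grows in proportion to f_k
        bins.append((f_k, bandwidth))
    return bins


def q_factor(bins_per_octave):
    """Q = center frequency / bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


if __name__ == "__main__":
    # Q depends only on the number of bins per octave, not on the bin index.
    for b in (12, 24, 48, 96):
        print(f"{b:3d} bins/octave -> Q = {q_factor(b):.1f}")

    # Bandwidth doubles every octave while f / bandwidth stays the same.
    for f_k, bw in cqt_bins(f_min=55.0, bins_per_octave=12, n_bins=25)[::12]:
        print(f"f = {f_k:8.2f} Hz, bandwidth = {bw:6.2f} Hz, Q = {f_k / bw:.2f}")
```

Under this convention the bandwidth doubles with every octave, which is why low bins resolve frequency finely and high bins resolve time finely.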
From a musical and perceptual viewpoint, a time-frequency transform with a logarithmic frequency bin spacing is very advantageous. This is in sharp contrast with the conventional discrete Fourier transform (DFT), which has linearly spaced frequency bins and therefore cannot satisfy the varying time and frequency resolution requirements over the wide range of audible frequencies.
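The effect of linear bin spacing can be illustrated with a short calculation. The sketch below counts how many DFT bins fall into a low and a high octave. The sample rate of 44100 Hz, the transform length of 4096 samples, and the function name are assumed purely for illustration.

```python
import math


def dft_bins_in_octave(f_low, sample_rate, n_fft):
    """Count DFT bins whose frequency lies in the octave [f_low, 2 * f_low)."""
    spacing = sample_rate / n_fft               # constant spacing in Hz
    first = math.ceil(f_low / spacing)          # first bin at or above f_low
    end = math.ceil(2.0 * f_low / spacing)      # first bin at or above 2 * f_low
    return end - first


if __name__ == "__main__":
    sample_rate = 44100  # assumed example value
    n_fft = 4096         # assumed example value
    for f_low in (55.0, 440.0, 3520.0):
        n = dft_bins_in_octave(f_low, sample_rate, n_fft)
        print(f"octave starting at {f_low:7.1f} Hz: {n} DFT bins")
```

With any fixed transform length, the number of DFT bins per octave doubles from one octave to the next, whereas a CQT assigns the same number of bins to every octave.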
The main reasons why the CQT has not widely replaced the DFT in audio signal processing are that it is computationally more intensive and that it lacks an inverse transform. Both of these problems were addressed in a preceding work, which marks the starting point for this thesis.
The aims of this project are
• to evaluate the sparsity of the CQT representation as opposed to the sparsity of the DFT representation for audio signals
• to implement pitch shifting in the CQT domain
• to implement sinusoid+noise modeling using the CQT
• to evaluate the applicability of the CQT in audio signal processing in general
• to further improve the quality of the inverse transform of the CQT