IEM-Report 01/98

A Non-Linear Functional Model of the Spectral Analysis Performed in the Peripheral Auditory System

M. Pflueger, R. Hoeldrich, W. Riedler

Institute of Electronic Music, Jakoministrasse 3-5, A-8010 Graz, Austria
Dept. of Communications and Wave Propagation, Inffeldgasse 12, A-8010 Graz, Austria


Abstract

To perform the initial frequency analysis in models of human hearing (e.g. preprocessor for speech recognition systems, loudness models etc.), a non-linear model of the peripheral auditory system is proposed.

1. Introduction

The proposed model consists of two main stages: the outer-middle ear filter and the level-dependent spectral analysis. The magnitude response of the outer-middle ear filter is estimated from the equal loudness contours [1]. The spectral analysis is calculated with a bank of overlapping bandpass filters. These so-called 'auditory filters' provide a functional representation of the operation performed by the inner ear.
The most popular method for estimating the shapes of auditory filters is the notched-noise method. This method is based on the assumptions of the power spectrum model of masking [2].
To approximate the auditory filters, Patterson [3] suggested a family of simple functions called 'roex' (rounded exponentials). The simplest of these expressions is the roex(p) filter shape:

The parameter p determines the bandwidth and the slopes of the skirts. For asymmetric filters, the parameter p has different values for the lower (pl) and upper (pu) branches.
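As an illustration, the roex(p) shape can be evaluated numerically. The sketch below assumes the form usually given in the literature, W(g) = (1 + p·g)·exp(-p·g) with the normalized frequency deviation g = |f - fc| / fc; the function names are hypothetical and the parameter values are left to the caller.

```python
import math

def roex_p(f, fc, p_lower, p_upper):
    """Power weight of a roex(p) filter centered at fc, evaluated at f (Hz).

    Assumed form: W(g) = (1 + p*g) * exp(-p*g), g = |f - fc| / fc.
    """
    g = abs(f - fc) / fc                 # normalized deviation from fc
    p = p_lower if f < fc else p_upper   # one slope parameter per branch
    return (1.0 + p * g) * math.exp(-p * g)

def to_db(weight):
    """Convert a power weight to decibels."""
    return 10.0 * math.log10(weight)
```

For example, `to_db(roex_p(900.0, 1000.0, 30.0, 30.0))` gives the attenuation of a symmetric filter 100 Hz below a 1 kHz center frequency; a smaller `p_lower` yields a shallower low-frequency skirt.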
Glasberg and Moore [4] proposed the parameters pl and pu as follows:

These auditory filters are approximately symmetric on a linear frequency scale when the input level of the filters corresponds to L = 51 dB/ERB at 1 kHz (pl(fc) = pu(fc) = p(fc)). The low-frequency skirts rise, i.e. become shallower, with increasing level (L > 51 dB/ERB) and become steeper with decreasing level (L < 51 dB/ERB). The upper-frequency skirts are level-independent (cf. Figure 1).
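The level dependence described here can be sketched as follows. The sketch assumes the equivalent rectangular bandwidth ERB(fc) = 24.7·(4.37·fc/1000 + 1) Hz and the relation p = 4·fc/ERB at 51 dB/ERB. The linear level dependence of the lower branch and its slope constant (0.38) are the values commonly quoted for [4] and should be treated as assumptions here, as are all function names.

```python
def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz at center frequency fc (Hz)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def p_at_51(fc):
    """Slope parameter of the symmetric filter at L = 51 dB/ERB."""
    return 4.0 * fc / erb_hz(fc)

def p_lower(fc, level_db, slope=0.38):
    """Lower-branch parameter; decreases (shallower skirt) as the level rises.

    The slope value is an assumption, not taken from this report.
    """
    p51 = p_at_51(fc)
    return p51 - slope * (p51 / p_at_51(1000.0)) * (level_db - 51.0)

def p_upper(fc):
    """Upper-branch parameter; level-independent."""
    return p_at_51(fc)
```

At L = 51 dB/ERB both branches share the same value, which reproduces the symmetric case; above that level `p_lower` falls below `p_upper`.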

The magnitude responses of gammatone filters (GTF) with order 4 are comparable with the auditory filters given in Eq. (1) and (2) for L = 51 dB/ERB. A gammatone filter is described by its impulse response, which is called the gammatone function g(t):

The parameter B mainly determines the bandwidth of the filter. The order n mainly determines the slope of the skirts. The center frequency of the filter is fc [5].
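A minimal sketch of the gammatone function, assuming the usual form g(t) = a·t^(n-1)·exp(-2πBt)·cos(2π·fc·t + φ) for t ≥ 0. The amplitude a, the phase φ, and the function names are illustrative assumptions.

```python
import math

def gammatone(t, fc, b, n=4, amplitude=1.0, phase=0.0):
    """Gammatone function at time t (s); zero for t < 0."""
    if t < 0.0:
        return 0.0
    # Gamma-shaped envelope: B sets the decay (bandwidth), n the skirt slope.
    envelope = amplitude * t ** (n - 1) * math.exp(-2.0 * math.pi * b * t)
    # Carrier tone at the center frequency fc.
    return envelope * math.cos(2.0 * math.pi * fc * t + phase)

def sample_gammatone(fc, b, fs, duration, n=4):
    """Sampled impulse response of the given duration (s) at rate fs (Hz)."""
    count = int(round(duration * fs))
    return [gammatone(i / fs, fc, b, n) for i in range(count)]
```

The envelope reaches its maximum at t = (n - 1) / (2πB), so a larger B gives both a wider passband and a shorter impulse response.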
The gammatone function is similar to impulse responses obtained from cat experiments using the revcor technique [6]. For that reason, the gammatone filter can be considered a link between physiological and psychoacoustic data. Detailed descriptions of the motivation for the gammatone filter are given in [5], [7], [8], [9] and [10].

2. Outer-middle ear filter

Above 1 kHz, the magnitude response of the outer-middle ear filter is similar to the inverted absolute threshold curve. This is based on the assumption that the inner ear is equally sensitive to all frequencies above 1 kHz. The absolute threshold curve is probably influenced by the low-frequency internal noise of the inner ear and therefore does not reflect the transmission through the outer and middle ear below 1 kHz. It is assumed that the transmission through the outer and middle ear below 1 kHz is reflected in the inverted shape of an equal-loudness contour at a high loudness level [1].
Several authors have found considerable deviations from the equal-loudness level contours described in ISO 226 [11]. These results led us to develop a modified phon contour at a loudness level of 100 phon. This modification is an approximation of recent re-examinations [12]. The outer-middle ear filter is parameterized and can easily be adapted to further re-examinations.

3. Spectral analysis

The implementation of the non-linear auditory filters is derived from an 8th order linear gammatone filter consisting of 4 biquad sections with 4 identical complex conjugate pole pair positions [13]:

This linear gammatone filter becomes a non-linear filter by moving the pole locations in the transfer functions H2(z) - H4(z) as a function of the control level L (dB/ERB). Two different non-linear filters are investigated, All-Pole (nlAPGTF) and One-Zero (nlOZGTF) gammatone filters. The One-Zero gammatone filter has one zero at z = 1 and the All-Pole gammatone filter is a pure resonance filter.
The complex conjugate pole position of a single linear filter section is (cf. Eq. (5)):

For the level-dependent displacement, the radius of the poles is held constant (rLin = const.). As a result, only the parameters a1,i of the transfer functions H2(z) - H4(z) must be modified. The distance between the poles and the unit circle becomes larger with increasing pole angles φLin = ±2πfcT, and the transfer function's sensitivity to pole-angle shifts decreases. Therefore, the pole-angle shift should increase proportionally to φLin.
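The pole geometry of a linear section can be sketched numerically. The code assumes a pole pair at rLin·exp(±j·2π·fc·T) and a denominator written as 1 + a1·z^-1 + a2·z^-2. The pole radius exp(-2πBT) with B = 1.019·ERB(fc) is a common gammatone convention and an assumption here, as are the function names.

```python
import cmath
import math

def pole_radius(fc, fs, bandwidth_scale=1.019):
    """Assumed pole radius exp(-2*pi*B*T), with B = bandwidth_scale * ERB(fc)."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    return math.exp(-2.0 * math.pi * bandwidth_scale * erb / fs)

def linear_pole(fc, fs, r_lin):
    """Upper-half-plane pole and its angle phi_lin = 2*pi*fc*T."""
    phi_lin = 2.0 * math.pi * fc / fs
    return cmath.rect(r_lin, phi_lin), phi_lin

def denominator_coeffs(r, phi):
    """(a1, a2) of 1 + a1*z^-1 + a2*z^-2 for poles at r*exp(+/-j*phi)."""
    return -2.0 * r * math.cos(phi), r * r
```

Under these assumptions the radius shrinks as fc grows, so the poles of high-frequency channels lie farther from the unit circle. Only a1 depends on the pole angle, which is why a2 stays fixed when the radius is held constant.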

To obtain the proper pole-angle shift, φLin is multiplied by a level-dependent factor K:

The factor K is obtained from the level-dependent parameter kφ and the radius rLin of the corresponding pole (Eq. (7b)). This equation causes an additional shift of the pole angle at very high frequencies (fc >> 1 kHz with fs = 44.1 kHz). For rLin ≈ 1, K ≈ kφ.

Combining Eq. (6) and (7a,b), the pole positions of the non-linear filter sections become:

The level dependence of the parameter kφ is approximated with arctan functions (Eq. (9) and (10); the indices 1, 2, 3, 4 refer to the four filter sections).

    (9)
  • kφ for nlAPGTF:
    kφ1(L) = 1
    kφ2(L) = 0.9898 - 0.0066 arctan(0.6 (L - 85))
    kφ3(L) = 0.893 - 0.075 arctan(0.1 (L - 70))
    kφ4(L) = 1.0749 + 0.0165 arctan(0.4 (L - 55))

    (10)
  • kφ for nlOZGTF:
    kφ1(L) = 1
    kφ2(L) = 0.9789 - 0.014 arctan(0.2 (L - 80))
    kφ3(L) = 0.8673 - 0.093 arctan(0.1 (L - 70))
    kφ4(L) = 1.0606 + 0.025 arctan(0.1 (L - 68))

The parameter kφ is independent of the filter center frequency. For low control levels L, the pole locations are approximately the same as for linear gammatone filters.
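The arctan approximations of Eq. (9) and (10) translate directly into code. The sketch below tabulates the constants as printed and evaluates the angle factor of each section. It uses the approximation K ≈ angle factor, which the text states for rLin ≈ 1, because the exact relation of Eq. (7b) is not reproduced here. All names are hypothetical.

```python
import math

# (offset, gain, slope, midpoint in dB/ERB) for sections 2, 3, 4.
K_PHI_PARAMS = {
    "nlAPGTF": [(0.9898, -0.0066, 0.6, 85.0),
                (0.893, -0.075, 0.1, 70.0),
                (1.0749, 0.0165, 0.4, 55.0)],
    "nlOZGTF": [(0.9789, -0.014, 0.2, 80.0),
                (0.8673, -0.093, 0.1, 70.0),
                (1.0606, 0.025, 0.1, 68.0)],
}

def k_phi(filter_type, level_db):
    """Angle factors of the four sections at control level level_db."""
    factors = [1.0]  # section 1 is not shifted
    for offset, gain, slope, midpoint in K_PHI_PARAMS[filter_type]:
        factors.append(offset + gain * math.atan(slope * (level_db - midpoint)))
    return factors

def shifted_a1(filter_type, level_db, r_lin, phi_lin):
    """Coefficients a1 of the four sections, assuming K is close to k_phi."""
    return [-2.0 * r_lin * math.cos(k * phi_lin)
            for k in k_phi(filter_type, level_db)]
```

Calling `k_phi("nlOZGTF", 70.0)` returns the four factors at 70 dB/ERB; the arctan term of the third section vanishes at that level, leaving its offset of 0.8673.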
The coefficients of the numerator are not affected by the pole-shifts:

(11)
nlAPGTF:
b0,i=1, b1,i=0, b2,i=0
i = 1,2,3,4

(12)
nlOZGTF:
b0,1=1, b1,1=-1, b2,1=0
b0,i=1, b1,i=0, b2,i=0
i = 2,3,4

The denominator coefficients a2,i are also level-independent:

The coefficients a1,i contain the level-dependent parameters kφ,i:
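For orientation, expanding the denominator of a biquad section with a complex conjugate pole pair suggests the form of both coefficient sets. The expressions below assume a denominator written as 1 + a1·z^-1 + a2·z^-2 and use the approximation K_i ≈ k_φ,i stated above for rLin ≈ 1; they are a sketch under these assumptions rather than the report's own equations.

```latex
\begin{aligned}
\left(1 - r e^{j\varphi} z^{-1}\right)\left(1 - r e^{-j\varphi} z^{-1}\right)
  &= 1 - 2 r \cos(\varphi)\, z^{-1} + r^{2} z^{-2} \\
a_{2,i} &= r_{\mathrm{Lin}}^{2} \\
a_{1,i} &= -2\, r_{\mathrm{Lin}} \cos\!\left(K_i\, \varphi_{\mathrm{Lin}}\right),
  \qquad K_i \approx k_{\varphi,i}(L) \ \text{for}\ r_{\mathrm{Lin}} \approx 1
\end{aligned}
```

Because the radius is held constant, a2,i does not change with level, and the whole level dependence enters through the cosine term of a1,i.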

The magnitude response of the first filter section H1(z) has to be normalized to 0 dB at fc, because its output might be used for the determination of the control level L. This gain normalization is achieved with the amplification factor V1.

In contrast to H1(z), the filter sections 2,3,4 cannot be separately normalized to 0 dB at fc, because the maxima of the magnitude responses are moved with the pole-shifts. Therefore, an amplification factor Vcom (Eq. (16)) is obtained from the combined transfer function Hcom(z) = H2(z) * H3(z) * H4(z). V1 and Vcom normalize the entire filter cascade to 0 dB at fc.
Vcom can be used for both filter types because the transfer functions H2(z)-H4(z) are identical for nlAPGTF and nlOZGTF.
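The two-stage normalization can be sketched by evaluating the sections on the unit circle at fc. The code assumes biquads of the form (b0 + b1·z^-1 + b2·z^-2) / (1 + a1·z^-1 + a2·z^-2) with a1 = -2·rLin·cos(K·φLin) and a2 = rLin², and takes the four angle factors as an input list. Function names and the example values are illustrative assumptions.

```python
import cmath
import math

def biquad_response(b, a1, a2, omega):
    """Complex response of one biquad at normalized angular frequency omega."""
    z1 = cmath.exp(-1j * omega)  # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)

def build_sections(fc, fs, r_lin, k_phi, one_zero=False):
    """Four sections as (b, a1, a2); the optional zero at z = 1 sits in H1."""
    phi_lin = 2.0 * math.pi * fc / fs
    sections = []
    for i, k in enumerate(k_phi):
        b = (1.0, -1.0, 0.0) if (one_zero and i == 0) else (1.0, 0.0, 0.0)
        sections.append((b, -2.0 * r_lin * math.cos(k * phi_lin), r_lin * r_lin))
    return sections, phi_lin

def normalization_gains(fc, fs, r_lin, k_phi, one_zero=False):
    """V1 normalizes H1 alone; Vcom normalizes the product H2*H3*H4 at fc."""
    sections, phi_lin = build_sections(fc, fs, r_lin, k_phi, one_zero)
    h1 = biquad_response(*sections[0], phi_lin)
    h_com = 1.0 + 0.0j
    for section in sections[1:]:
        h_com *= biquad_response(*section, phi_lin)
    return 1.0 / abs(h1), 1.0 / abs(h_com)

def cascade_level_db(fc, fs, r_lin, k_phi, one_zero=False):
    """Level of the normalized cascade at fc in dB (0 dB by construction)."""
    sections, phi_lin = build_sections(fc, fs, r_lin, k_phi, one_zero)
    v1, v_com = normalization_gains(fc, fs, r_lin, k_phi, one_zero)
    h = v1 * v_com
    for section in sections:
        h *= biquad_response(*section, phi_lin)
    return 20.0 * math.log10(abs(h))
```

Since the zero only appears in the first section, `normalization_gains` returns the same Vcom for both filter types, while V1 differs.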

Figure 1 shows roex(p) auditory filter shapes according to Eq. (1) and (2). For comparison, the magnitude responses of the nlAPGTF and the nlOZGTF are shown in Figures 2 and 3.


Figure 1: Roex(p) filter shapes according to Eq. (1) and (2) with fc=100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB


Figure 2: nlAPGTF magnitude responses with fc = 100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB


Figure 3: nlOZGTF magnitude responses with fc = 100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB

4. Conclusion

The proposed non-linear filter offers a basic concept for a level-dependent calculation of the time-frequency analysis performed in the peripheral auditory system. The filter order is level independent, which simplifies its implementation.
The output of the first filter section can be used to determine the control level for an input-level dependent control because H1(z) is not affected by the parameterization. This is very efficient because an additional signal path can be avoided.
The group delay time at the resonance frequency decreases with increasing control levels. This corresponds to the auditory system because high-level signals are perceived earlier than low-level signals. Nevertheless, this effect is too small to model psychoacoustic data.
Using a dynamic compression/expansion within the calculation of the control level, different control levels can be assigned to the shown magnitude responses.
Due to the 'missing' zero at z = 1, the low-frequency branch of the nlAPGTF flattens towards a horizontal line with increasing distance from the center frequency. For that reason, the nlOZGTF appears to be a better candidate for modelling auditory filters.

Acknowledgements

This work was funded by the Austrian Science Foundation (FWF, project number: P11159-TEC).

References

[1] Moore B.C.J., Glasberg B.R.: 'A Revision of Zwicker's Loudness Model', Acustica - acta acustica, Vol. 82, 1996, 335-345
[2] Patterson R.D., Moore B.C.J.: 'Auditory filters and excitation patterns as representations of frequency resolution', In: Moore B.C.J. (Ed.), Frequency Selectivity in Hearing, Academic, London, 1986, 123-177
[3] Patterson R.D., Nimmo-Smith I., Weber D.L., Milroy R.: 'The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold', J. Acoust. Soc. Am. 72 (6), 1982, 1788-1803
[4] Glasberg B.R., Moore B.C.J.: 'Derivation of auditory filter shapes from notched-noise data', Hear. Res. 47, 1990, 103-138
[5] Patterson R.D.: 'The sound of a sinusoid: Spectral models', J. Acoust. Soc. Am. 96 (3), 1994, 1409-1418
[6] Carney L.H., Yin C.T.: 'Temporal coding of resonances by low-frequency auditory nerve fibres: single fibre responses and a population model', J. Neurophysiology, 60, 1988, 1653-1677
[7] Patterson R.D., Nimmo-Smith I., Holdsworth J., Rice P.: 'An efficient auditory filterbank based on the gammatone function', meeting of the IOC Speech Group on Auditory Modelling at RSRE, 1987
[8] Patterson R.D., Holdsworth J.: 'A functional model of neural activity patterns and auditory images', In: Advances in Speech, Hearing and Language Processing (W.A. Ainsworth, Ed.), Vol. 3, JAI Press, London
[9] Cooke M.: 'Modelling Auditory Processing and Organisation', Dissertation, University Sheffield, 1991, Cambridge University Press, 1993
[10] Lyon R.F.: 'The All-Pole Gammatone Filter and Auditory Models', Internet, temporary materials for 'Computational Models of Signal Processing in the Auditory System', Forum Acusticum 1996, Antwerp
[11] 'Acoustics - Normal Equal-Loudness Level Contours', In: ISO 226 (E), 1987, 20-27
[12] Suzuki Y., Sone T.: 'Frequency Characteristics of Loudness Perception: Principles and Applications', In: Sixth Oldenburg Symposium on Psychological Acoustics, 1993, 193-221
[13] Slaney M.: 'An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank', Apple Computer Technical Report #35, Apple Computer Inc., 1993

© 2000, last modified on 26 January 2000.


Last modified 10.04.2003