A Non-Linear Functional Model of the Spectral Analysis Performed in the Peripheral Auditory System
Institute of Electronic Music, Jakoministrasse 3-5, A-8010 Graz, Austria
Dept. of Communications and Wave Propagation, Inffeldgasse 12, A-8010 Graz, Austria
Abstract
To perform the initial frequency analysis in models of human hearing (e.g. preprocessors for speech recognition systems, loudness models, etc.), a non-linear model of the peripheral auditory system is proposed.
1. Introduction
The proposed model consists of two main
stages: the outer-middle ear filter and the
level-dependent spectral analysis. The
magnitude response of the outer-middle ear
filter is estimated from the equal loudness
contours [1]. The spectral analysis is
calculated with a bank of overlapping
bandpass filters. These so-called 'auditory
filters' provide a functional representation of
the operation performed by the inner ear.
The most popular method for estimating the
shapes of auditory filters is the notched-noise
method. This method is based on the
assumptions of the power spectrum model of
masking [2].
To approximate the auditory filters, Patterson
[3] suggested a family of simple functions
called 'roex' (rounded exponentials). The
simplest of these expressions is called the
roex(p) filter shape:
The parameter p determines the bandwidth
and the slopes of the skirts. For asymmetric
filters, the parameter p has different values for
the lower (pl) and upper (pu) branches.
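The roex(p) shape can be evaluated numerically. The following minimal sketch assumes the form commonly given in the roex literature, W(g) = (1 + p g) exp(-p g), where g = |f - fc| / fc is the normalized deviation from the center frequency. The function and argument names are illustrative and not taken from the paper.

```python
import math

def roex_p(f, fc, p_lower, p_upper):
    """Power weighting of a roex(p) filter at frequency f (Hz).

    Assumed form: W(g) = (1 + p*g) * exp(-p*g), g = |f - fc| / fc.
    p_lower is used below fc, p_upper at and above fc.
    """
    g = abs(f - fc) / fc
    p = p_lower if f < fc else p_upper
    return (1.0 + p * g) * math.exp(-p * g)

# Example: attenuation (dB) of a symmetric filter with p = 30,
# evaluated 10 % below a center frequency of 1 kHz.
level_db = 10.0 * math.log10(roex_p(900.0, 1000.0, 30.0, 30.0))
```

A larger p gives a narrower filter with steeper skirts, so an asymmetric filter is obtained simply by passing different values for the two branches.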
Glasberg and Moore [4] proposed the
parameters pl and pu as follows:
These auditory filters are approximately symmetric on a linear frequency scale when the input level of the filter corresponds to L = 51 dB/ERB at 1 kHz (pl(fc) = pu(fc) = p(fc)). The low-frequency skirts become shallower with increasing level (L > 51 dB/ERB) and steeper with decreasing level (L < 51 dB/ERB). The upper-frequency skirts are level independent (cf. Figure 1).
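The level dependence described above can be illustrated with a short sketch. It assumes the ERB formula of Glasberg and Moore [4], ERB(fc) = 24.7 (4.37 fc / 1000 + 1) Hz, and the roex(p) relation p(fc) = 4 fc / ERB(fc) at 51 dB/ERB. The linear dependence of pl on level and the default slope value are assumptions made for illustration only; the exact expression is the one given in Eq. (2). All function names are hypothetical.

```python
def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def p_51(fc):
    """Symmetric slope parameter at the reference level of 51 dB/ERB."""
    return 4.0 * fc / erb_hz(fc)

def p_upper(fc):
    """Upper branch: level independent."""
    return p_51(fc)

def p_lower(fc, level_db, slope=0.35):
    """Lower branch: ASSUMED to fall linearly with level, scaled relative
    to the 1 kHz filter. The slope value is illustrative."""
    scale = p_51(fc) / p_51(1000.0)
    return p_51(fc) - slope * scale * (level_db - 51.0)
```

With these assumptions, pl equals pu at 51 dB/ERB, is smaller (shallower skirt) at higher levels, and is larger (steeper skirt) at lower levels.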
The magnitude responses of gammatone filters (GTF) of order 4 are comparable to those of the auditory filters given in Eq. (1) and (2) for L = 51 dB/ERB. A gammatone filter is described by its impulse response, which is called the gammatone function g(t):
The parameter B mainly determines the
bandwidth of the filter. The order n mainly
determines the slope of the skirts. The center
frequency of the filter is fc [5].
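As an illustration, the gammatone function can be sampled directly. The sketch below assumes the usual form from the gammatone literature, g(t) = a t^(n-1) exp(-2 pi B t) cos(2 pi fc t + phase) for t >= 0. The amplitude a, the phase argument, and the example value of B are assumptions chosen for the example, not values from the paper.

```python
import math

def gammatone(t, fc, B, n=4, a=1.0, phase=0.0):
    """Gammatone impulse response at time t (seconds).

    Assumed form: a * t**(n-1) * exp(-2*pi*B*t) * cos(2*pi*fc*t + phase),
    zero for t < 0. B (Hz) sets the bandwidth, n the skirt slope.
    """
    if t < 0.0:
        return 0.0
    envelope = a * t ** (n - 1) * math.exp(-2.0 * math.pi * B * t)
    return envelope * math.cos(2.0 * math.pi * fc * t + phase)

# Example: 1024 samples of a 4th-order gammatone at fc = 1 kHz,
# fs = 44.1 kHz; B = 135 Hz is an illustrative bandwidth value.
fs = 44100.0
g = [gammatone(k / fs, fc=1000.0, B=135.0) for k in range(1024)]
```

The envelope of this function peaks at t = (n - 1) / (2 pi B), so a larger B yields a shorter impulse response and a wider filter.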
The gammatone function is similar to impulse
responses obtained from cat experiments
using the revcor technique [6]. For that
reason, the gammatone filter can be
considered as a link between physiological and
psychoacoustic data. Detailed descriptions of
the motivation for the gammatone filter are
summarized in: [5], [7], [8], [9] and [10].
2. Outer-middle ear filter
Above 1 kHz, the magnitude response of the
outer-middle ear filter is similar to the
inverted absolute threshold curve. This is
based on the assumption that the inner ear is
equally sensitive to all frequencies above 1
kHz. The absolute threshold curve is probably
influenced by the low frequency internal noise
of the inner ear and therefore does not reflect
the transmission through the outer and middle
ear below 1 kHz. It is assumed that the
transmission through the outer and middle ear
below 1 kHz is reflected in the inverted shape
of an equal-loudness contour at a high
loudness level [1].
Several authors have found considerable
deviations from the equal-loudness level
contours described in ISO 226 [11]. These
results led us to develop a modified phon
contour at a loudness level of 100 phon. This
modification approximates the results of recent
re-examinations [12]. The outer-middle ear filter
is parameterized and can easily be adapted to
the results of other re-examinations.
3. Spectral analysis
The implementation of the non-linear auditory filters is derived from an 8th order linear gammatone filter consisting of 4 biquad sections with 4 identical complex conjugate pole pair positions [13]:
This linear gammatone filter becomes a non-linear filter by moving the pole locations in the
transfer functions H2(z) - H4(z) as a function
of the control level L (dB/ERB). Two
different non-linear filters are investigated:
the All-Pole (nlAPGTF) and the One-Zero
(nlOZGTF) gammatone filter. The One-Zero
gammatone filter has one zero at z = 1 and the
All-Pole gammatone filter is a pure resonance
filter.
The complex conjugate pole position of a
single linear filter section is (cf. Eq. (5)):
For the level-dependent displacement, the radius of the poles is held constant (rLin = const.). As a result, only the parameters a1,i of the transfer functions H2(z) - H4(z) must be modified. The distance between the poles and the unit circle becomes larger with increasing pole angles φLin = ±2πfcT, and the transfer function's sensitivity to pole-angle shifts decreases. Therefore, the pole-angle shift should increase proportionally to φLin.
To obtain the proper pole-angle shift, φLin is multiplied by a level-dependent factor K:
The factor K is obtained from the level-dependent parameter kφ and the radius rLin of the corresponding pole (Eq. (7b)). This equation causes an additional shift of the pole angle at very high frequencies (fc >> 1 kHz with fs = 44.1 kHz). rLin ≈ 1 results in K ≈ kφ.
Combining Eq. (6) and (7a,b), the pole positions of the non-linear filter sections become:
The level dependence of the parameter kφ is approximated with arctan functions (Eq. (9) and (10); the indices 1, 2, 3, 4 refer to the four filter sections).
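kφ for nlAPGTF:
$$
\begin{aligned}
k_{\varphi,1}(L) &= 1 \\
k_{\varphi,2}(L) &= 0.9898 - 0.0066\,\arctan\bigl(0.6\,(L - 85)\bigr) \\
k_{\varphi,3}(L) &= 0.893 - 0.075\,\arctan\bigl(0.1\,(L - 70)\bigr) \\
k_{\varphi,4}(L) &= 1.0749 + 0.0165\,\arctan\bigl(0.4\,(L - 55)\bigr)
\end{aligned}
\tag{9}
$$
kφ for nlOZGTF:
$$
\begin{aligned}
k_{\varphi,1}(L) &= 1 \\
k_{\varphi,2}(L) &= 0.9789 - 0.014\,\arctan\bigl(0.2\,(L - 80)\bigr) \\
k_{\varphi,3}(L) &= 0.8673 - 0.093\,\arctan\bigl(0.1\,(L - 70)\bigr) \\
k_{\varphi,4}(L) &= 1.0606 + 0.025\,\arctan\bigl(0.1\,(L - 68)\bigr)
\end{aligned}
\tag{10}
$$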
The parameter kφ is independent of the filter
center frequency. For low control levels L, the
pole locations are approximately the same as
for linear gammatone filters.
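The arctan approximations of Eq. (9) and (10) translate directly into a small lookup. In the sketch below the numerical constants are those listed above; the table layout (offset, scale, slope, level midpoint), the names, and the assumption that arctan is evaluated in radians with L in dB/ERB are illustrative choices.

```python
import math

# One tuple per filter section: (offset, scale, slope, level midpoint in dB/ERB),
# so that k = offset + scale * atan(slope * (L - midpoint)).
K_PHI_PARAMS = {
    "nlAPGTF": [
        (1.0, 0.0, 0.0, 0.0),
        (0.9898, -0.0066, 0.6, 85.0),
        (0.893, -0.075, 0.1, 70.0),
        (1.0749, 0.0165, 0.4, 55.0),
    ],
    "nlOZGTF": [
        (1.0, 0.0, 0.0, 0.0),
        (0.9789, -0.014, 0.2, 80.0),
        (0.8673, -0.093, 0.1, 70.0),
        (1.0606, 0.025, 0.1, 68.0),
    ],
}

def k_phi(filter_type, level_db):
    """Return the four level-dependent parameters for one filter type."""
    return [
        offset + scale * math.atan(slope * (level_db - midpoint))
        for offset, scale, slope, midpoint in K_PHI_PARAMS[filter_type]
    ]

# Example: parameters of the One-Zero filter at L = 70 dB/ERB.
k_oz_70 = k_phi("nlOZGTF", 70.0)
```

Because the parameters do not depend on the center frequency, one call per control level serves every channel of the filter bank.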
The coefficients of the numerator are not
affected by the pole-shifts:
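nlAPGTF:
$$
b_{0,i} = 1, \quad b_{1,i} = 0, \quad b_{2,i} = 0, \qquad i = 1, 2, 3, 4
\tag{11}
$$
nlOZGTF:
$$
\begin{aligned}
b_{0,1} &= 1, \quad b_{1,1} = -1, \quad b_{2,1} = 0 \\
b_{0,i} &= 1, \quad b_{1,i} = 0, \quad b_{2,i} = 0, \qquad i = 2, 3, 4
\end{aligned}
\tag{12}
$$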
The denominator coefficients a2,i are also level-independent:
The coefficients a1,i contain the level-dependent parameters kφ,i:
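The relation between pole position and denominator coefficients can be sketched as follows. For a biquad section with denominator 1 + a1 z^-1 + a2 z^-2 and complex conjugate poles at r exp(±jθ), a1 = -2 r cos θ and a2 = r^2, so shifting only the pole angle changes a1 and leaves a2 untouched. The sign convention of the denominator and the use of K and rLin as plain inputs (rather than values computed from the paper's equations) are assumptions of this example; the numerical values are illustrative.

```python
import cmath
import math

def denominator_coeffs(r_lin, phi_lin, K=1.0):
    """Coefficients (a1, a2) of 1 + a1*z^-1 + a2*z^-2 for poles at
    r_lin * exp(+-j * K * phi_lin). K = 1 gives the linear filter."""
    theta = K * phi_lin
    a1 = -2.0 * r_lin * math.cos(theta)
    a2 = r_lin ** 2
    return a1, a2

def poles(a1, a2):
    """Roots of z^2 + a1*z + a2, used here only to check the result."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

# Example: fc = 1 kHz at fs = 44.1 kHz, illustrative radius 0.98,
# pole angle reduced by an illustrative factor K = 0.9.
phi = 2.0 * math.pi * 1000.0 / 44100.0
a1, a2 = denominator_coeffs(0.98, phi, K=0.9)
```

Only a cosine has to be re-evaluated per section when the control level changes, which keeps the level-dependent update cheap.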
The magnitude response of the first filter section H1(z) has to be normalized to 0 dB at fc, because its output might be used for the determination of the control level L. This gain normalization is achieved with the amplification factor V1.
In contrast to H1(z), the filter sections 2, 3 and 4
cannot be separately normalized to 0 dB at fc,
because the maxima of the magnitude
responses are moved with the pole-shifts.
Therefore, an amplification factor Vcom (Eq.
(16)) is obtained from the combined transfer
function Hcom(z) = H2(z) * H3(z) * H4(z). V1
and Vcom normalize the entire filter cascade to
0 dB at fc.
Vcom can be used for both filter types because
the transfer functions H2(z)-H4(z) are identical
for nlAPGTF and nlOZGTF.
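The idea behind such a normalization can be illustrated numerically: evaluate the combined transfer function on the unit circle at the center frequency and take the reciprocal of its magnitude. This sketch is an assumption about how a gain like Vcom could be computed; it is not the closed-form expression of Eq. (16), and the coefficient values in the example are illustrative.

```python
import cmath
import math

def biquad_response(b, a, omega):
    """Complex response of (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2)
    at normalized angular frequency omega (rad/sample); a = (a1, a2)."""
    z1 = cmath.exp(-1j * omega)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = 1.0 + a[0] * z1 + a[1] * z1 * z1
    return num / den

def normalization_gain(sections, omega_c):
    """Gain that sets the magnitude of the cascaded sections to 1 (0 dB)
    at omega_c. `sections` is a list of (b, a) coefficient tuples."""
    h = 1.0 + 0.0j
    for b, a in sections:
        h *= biquad_response(b, a, omega_c)
    return 1.0 / abs(h)

# Example: three identical all-pole sections, fc = 1 kHz, fs = 44.1 kHz.
omega_c = 2.0 * math.pi * 1000.0 / 44100.0
section = ((1.0, 0.0, 0.0), (-1.9, 0.95))
v_com = normalization_gain([section] * 3, omega_c)
```

Since the pole positions depend on the control level, a gain obtained this way has to be recomputed (or tabulated) for each level.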
Figure 1 shows roex(p) auditory filter shapes according to Eq. (1) and (2). For comparison, the magnitude responses of the nlAPGTF and the nlOZGTF are shown in Figures 2 and 3.
Figure 1: Roex(p) filter shapes according to Eq. (1) and (2) with fc = 100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB
Figure 2: nlAPGTF magnitude responses with fc = 100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB
Figure 3: nlOZGTF magnitude responses with fc = 100, 300, 1k, 3k, 10k Hz and L = 50, 60, 70, 80, 90 dB/ERB
4. Conclusion
The proposed non-linear filter offers a basic
concept for a level-dependent calculation of
the time-frequency analysis performed in the
peripheral auditory system. The filter order is
level independent, which simplifies its
implementation.
The output of the first filter section can be
used to determine the control level for an
input-level dependent control because H1(z) is
not affected by the parameterization. This is
very efficient because an additional signal path
can be avoided.
The group delay time at the resonance
frequency decreases with increasing control
levels. This corresponds to the auditory
system because high-level signals are
perceived earlier than low-level signals.
Nevertheless, this effect is too small to model
psychoacoustic data.
By applying dynamic compression/expansion
in the calculation of the control level,
different control levels can be assigned to the
magnitude responses shown.
Due to the 'missing' zero at z = 1, the low-frequency
branch of the nlAPGTF flattens into a
horizontal line with increasing distance from the
center frequency. For that reason, the
nlOZGTF appears to be a better candidate for
modelling auditory filters.
Acknowledgements
This work was funded by the Austrian Science Foundation (FWF, project number: P11159-TEC).
References
[1] Moore B.C.J., Glasberg B.R.: 'A Revision
of Zwicker's Loudness Model', Acustica - acta
acustica, Vol. 82, 1996, 335-345
[2] Patterson R.D., Moore B.C.J.: 'Auditory Filters
and excitation patterns as representations of
frequency resolution', In: Moore B.C.J. (Ed.),
Frequency Selectivity in Hearing, Academic,
London, 1986, 123-177
[3] Patterson R.D., Nimmo-Smith I., Weber D.L.,
Milroy R.: 'The deterioration of hearing with
age: Frequency selectivity, the critical ratio,
the audiogram, and speech threshold', J.
Acoust. Soc. Am. 72 (6), 1982, 1788-1803
[4] Glasberg B.R., Moore B.C.J.: 'Derivation of
auditory filter shapes from notched-noise data',
Hear. Res. 47, 1990, 103-138
[5] Patterson R.D.: 'The sound of a sinusoid:
Spectral models', J. Acoust. Soc. Am. 96 (3),
1994, 1409-1418
[6] Carney L.H., Yin C.T.: 'Temporal coding of
resonances by low-frequency auditory nerve
fibres: single fibre responses and a population
model', J. Neurophysiology, 60, 1988, 1653-1677
[7] Patterson R.D., Nimmo-Smith I., Holdsworth
J., Rice P.: 'An efficient auditory filterbank
based on the gammatone function', meeting of
the IOC Speech Group on Auditory Modelling
at RSRE, 1987
[8] Patterson R.D., Holdsworth J.: 'A functional
model of neural activity patterns and auditory
images', In: Advances in Speech, Hearing and
Language Processing, (W.A. Ainsworth, ed.)
Vol 3. JAI Press, London
[9] Cooke M.: 'Modelling Auditory Processing and
Organisation', Dissertation, University of
Sheffield, 1991, Cambridge University Press,
1993
[10] Lyon R.F.: 'The All-Pole Gammatone Filter
and Auditory Models', Internet, preliminary
material for 'Computational Models of
Signal Processing in the Auditory System',
Forum Acusticum 1996, Antwerp
[11] 'Acoustics - Normal Equal-Loudness Level
Contours', In: ISO 226 (E), 1987, 20-27
[12] Suzuki Y., Sone T.: 'Frequency Characteristics
of Loudness Perception: Principles and
Applications', In: Sixth Oldenburg Symposium
on Psychological Acoustics, 1993, 193-221
[13] Slaney M.: 'An Efficient Implementation of the
Patterson-Holdsworth Auditory Filter Bank',
Apple Computer Technical Report #35, Apple
Computer Inc., 1993
© 2000, last modified on 26 January 2000.