Temporal Proximity Masking
As we have already seen, sounds separated by short time intervals have an
effect on each other. Forwards (post-) and backwards (pre-) masking occur for
two sounds in quick succession. A quiet sound immediately following or
preceding a loud sound can be masked even though it is clearly in a space of
its own. It is as if our brain is distracted by the dominant sound and forgets
about the lesser one. These aren’t quite the same going forwards and backwards
in time. Forwards masking persists for around 100ms to 200ms after the loud
sound, while backwards masking operates only within 30ms to 100ms before it.
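The asymmetry of these two windows can be sketched as a simple timing check. This is only an illustration using the window figures quoted above; real masking thresholds also depend on the relative levels and spectra of the two sounds, and the function name is invented for the example.

```python
# Illustrative temporal masking windows, taken from the figures in the text.
# Real thresholds vary with level and spectrum; these constants are a sketch.
FORWARD_WINDOW_MS = 200.0   # quiet sound occurring AFTER the loud one
BACKWARD_WINDOW_MS = 100.0  # quiet sound occurring BEFORE the loud one

def temporally_masked(loud_onset_ms, quiet_onset_ms):
    """Return True if a quiet sound at quiet_onset_ms falls inside the
    temporal masking window of a loud sound at loud_onset_ms."""
    delta = quiet_onset_ms - loud_onset_ms
    if delta >= 0:                       # quiet sound follows the loud one
        return delta <= FORWARD_WINDOW_MS
    return -delta <= BACKWARD_WINDOW_MS  # quiet sound precedes the loud one

print(temporally_masked(0.0, 150.0))   # forwards: within 200ms -> True
print(temporally_masked(0.0, -50.0))   # backwards: within 100ms -> True
print(temporally_masked(0.0, -150.0))  # backwards window exceeded -> False
```

Note how the backward window is much shorter than the forward one, matching the asymmetry described above.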
SECTION 6.3
Auditory Scene Analysis
How do we make sense of complex sound scenes? An example given by one of
the pioneers of auditory scene analysis, Albert Bregman (1990), is to imagine
hearing the sound of dinner plates sliding over each other and then falling to
the ground with some rolling around and some breaking. Afterwards you can
answer specific questions such as “How many plates were there?” “How far did
they fall?” “Did all the plates break?” “How big were the plates?” and so on.
Applications of auditory scene analysis might be fire alarms, where we can
augment heat and smoke detectors with microphones that listen for the sound
of fire; baby alarms that can discriminate distress from contented gurglings; or
intruder alarms that recognise human footsteps. This kind of work is a branch
of artificial intelligence (AI) sometimes called machine listening, with
interesting work being conducted at MIT Media Lab and Queen Mary College London.
The fire and baby alarms were project suggestions for my DSP students. As
sound designers we find auditory scene analysis valuable from a constructionist
point of view. Knowledge about how the human brain deconstructs sound can
be used in reverse to engineer sounds with the intended effects.
Segregation
Complex pressure waveforms arriving at the ears may have no obvious time
domain boundaries that indicate individual events or causes. To break a
complex sound apart we employ several strategies simultaneously. The first of
these is segregation, itself composed of several substrategies that attempt to
identify individual objects or events within a composite stream of information.
Several simultaneous sources, such as a car engine, speaking voices, and
background music, will all have frequency components that overlap in time. The
frequencies themselves are not constant, but move in trajectories or gestures.
A trajectory is a motion in a high-dimensional space but can be thought of in
a simpler way as a path in lower dimensions, say, as a squiggly line in 3D.
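One consequence of treating frequencies as trajectories is that their direction can be used to follow them through interference, for instance when two glides cross. The following is a minimal sketch of that principle, not a real partial-tracking algorithm; all names are invented, and a track is reduced to just its endpoints for simplicity.

```python
# Sketch: resolve two crossing frequency trajectories by direction (slope)
# continuity. Illustrative only; not a production partial-tracking algorithm.

def continue_tracks(track_a, track_b, branch_1, branch_2):
    """Each track is a list of (time, freq) points ending at a crossing;
    each branch is a list of points leaving it. Pair tracks with branches
    so that the total change of slope across the crossing is smallest."""
    def slope(points):
        (t0, f0), (t1, f1) = points[0], points[-1]
        return (f1 - f0) / (t1 - t0)

    # Cost of a pairing = total slope discontinuity across the crossing.
    straight = abs(slope(track_a) - slope(branch_1)) + \
               abs(slope(track_b) - slope(branch_2))
    swapped  = abs(slope(track_a) - slope(branch_2)) + \
               abs(slope(track_b) - slope(branch_1))
    if straight <= swapped:
        return (track_a + branch_1, track_b + branch_2)
    return (track_a + branch_2, track_b + branch_1)

# A rising and a falling glide that cross at t=2, f=440:
rising   = [(0, 400), (2, 440)]
falling  = [(0, 480), (2, 440)]
up_out   = [(2, 440), (4, 480)]   # branch that continues rising
down_out = [(2, 440), (4, 400)]   # branch that continues falling

a, b = continue_tracks(rising, falling, down_out, up_out)
print(a[-1])  # (4, 480): the rising glide keeps rising through the crossing
```

The pairing whose slopes change least wins, which is exactly the "look at the direction of the lines before and after crossing" heuristic.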
Although a trajectory may intersect with other trajectories from other
sounds, it is usually obvious from looking at the direction of the lines before
and after crossing which one is which. In computer vision we first perform
edge detection to find the boundaries of objects. However, some objects will
be behind others, so to overcome this loss of information resulting from partial
occlusion the computer must "connect the dots" and make inferences about
lines that are implicit in the pattern. In auditory scene analysis we call this
guessing or interpolation of lost features closure. Of course we naturally do
this as humans, and similarly our auditory faculties are able to piece together