A naturally ordered geometric model of sound
inspired by colour theory
Stephen Barrass
CSIRO Division of Information Technology, PO Box 664, Canberra, ACT 2601
Telephone (06) 275 0938, Fax (06) 257 1052, Email stephen.barrass@csis.dit.csiro.au
Abstract
The hue, saturation, lightness (HSL) colour model is an intuitive interface for interaction with the
millions of colours which are available in high quality colour processes. This interface makes it possible
to specify colours in a natural manner, to manipulate the subjective dimensions of colour independently,
to use geometric paths to specify ordered sequences of colours, and to use a system which does not
depend on a particular colour output device.
This paper proposes a naturally ordered geometric model of sound, derived from considerations of the
form and properties of the HSL colour model. The recognition that both hue and timbre are categorical
perceptions is the link which allows the creation of a cycle of timbres which mimics the hue circle. There
is no consistent ordering of timbre but it is hypothesised that a subset of timbres may be arranged in a
circle by a pair of underlying perceptual axes. The Timbre Circle, in which the most similar timbres are
adjacent, and the most dissimilar lie diagonally opposite, is investigated by implementing several such
sequences based on dimensions of timbre identified in the psychoacoustic research studies of von
Bismarck, Slawson and Grey. A complex timbre attribute consisting of a spectral and temporal
component is proposed as the basis for an extension of the Timbre Circle with an independent naturally
ordered radial component to form the Timbre Wheel. Brightness is identified in many timbre spaces as
the most important aspect, and has been shown to have a perceptual metric. An example of a Timbre
Wheel in which brightness is the radial component is implemented and various sequences such as a
diagonal and a spiral are used to confirm perceptible ordering and smooth transition through the centre
which supports the polar geometry. The sound model is completed by choosing pitch over loudness as a
vertical axis, analogous with the lightness axis in the colour model. The result is called the Timbre,
Brightness, Pitch (TBP) sound model. The generality of the model is demonstrated by fitting Gaver’s
“everyday” synthesis algorithm within the framework.
The TBP sound model is illustrated with a SoundChooser graphical user interface similar to the colour
chooser found in many colour applications.
Keywords Colour, Sound, Perception, Synaesthesia, User Interface
45. Introduction
Humans can distinguish millions of different colours. To choose a particular colour, or sequence of
colours, from this multitude would seem to be a daunting task. However there is a compass which
enables intuitive orientation and navigation through the realm of colour, called the HSL colour
model [40]. The properties of the HSL colour model derive from its basis in the human perception of colour.
The model presents colours as a geometric solid which has axes aligned with the subjective attributes hue, saturation and lightness [40]. Interaction with the colours is natural because the ordering is
observable and systematic. Independent changes may be made in each subjective attribute. Lines,
spirals, slices, and volumes can be taken from the solid to create ordered colour sequences. The HSL
coordinates are a transportable way of specifying colours without reference to the physical mechanics of
producing the colour.
Humans can also distinguish millions of different sounds. The standard interface to sound in computer
and electronic music systems consists of a palette of around 200 verbally described timbres (e.g. muted
trumpet, pizzicato strings), and a number of parameters which allow local variations in the timbre (e.g.
reverb depth, vibrato rate). The sound palette is a useful interface because it provides access to a limited
range of timbres which may be manipulated in constrained ways. However, the strengths of the palette
method are also its weaknesses. There are many areas of science and music where there is a need to access
a much wider diversity of sounds. Unfortunately there is not a comprehensive model of sound perception
which can provide the basis for a natural sound ordering system. The specification of sounds is ad-hoc
and device specific. Sounds are often specified and compared in terms of physical parameters which do
not have predictable effects on what is heard. Structure represented by progressions along device or
physical dimensions can be distorted or obfuscated by perceptual interactions and non-linearities. The
lack of a systematic and observable organisation for timbres makes it difficult to select ordered
sequences. These are serious impediments to the application of high quality sound in the
human-computer interface.
This paper proposes a perceptual model of sound which will address the issues of natural specification,
the manipulation of subjective dimensions, a geometric interface, and transportable coordinates. The
sound model is derived from considerations of the form and properties of the HSL colour model.
46. The HSL Colour Model
Colour perception has attracted considerable research effort because of its importance in arts, crafts and
industries such as painting, dyeing, lighting, printing, television and computing. The HSL colour model is
part of a theory of colour which has been extensively developed through psychological experiments on
human perception. The subjective dimensions of colour are hue, saturation and lightness which covary
with the physical attributes of wavelength, spectral content and intensity of the visible electromagnetic
spectrum. The arrangement of colours is called natural because changes in colours are systematic and
observable due to the alignment of the axes of the model with human perception. The appearance of a
colour may be varied independently in each dimension without affecting the other dimensions, so for
example a green hue may be made lighter or darker without altering its green-ness. Arranging the
model in a geometric way allows colours to be compared in terms of position - the further two colours are
apart the greater the difference in their appearance. It also enables the use of geometric paths for
specifying ordered sequences of colours, such as lines, circles and spirals. The polar cylindrical geometry
of the HSL colour model is shown in Figure 79. This geometric system consists of a chromatic wheel,
made up of a hue angle (H) and a saturation radius (S), fixed upon a vertical lightness axis (L).
Figure 79. The HSL colour model (diagram labels: grey axis, Lightness (L), Saturation (S), Hue (H), chromatic wheel)
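The use of a geometric path to specify an ordered colour sequence can be illustrated with a short sketch. It samples a spiral on the chromatic wheel at a fixed lightness and converts each point to RGB with Python's standard colorsys module. The function name hsl_spiral and its parameters are illustrative assumptions, not part of the HSL model itself.

```python
import colorsys

def hsl_spiral(n_steps, turns=2.0, lightness=0.5):
    """Sample a spiral on the chromatic wheel at a fixed lightness.

    Returns a list of (r, g, b) tuples with components in 0..1.
    The spiral starts on the grey axis and winds outward.
    """
    colours = []
    for i in range(n_steps):
        t = i / (n_steps - 1) if n_steps > 1 else 0.0
        hue = (turns * t) % 1.0   # hue angle as a fraction of a full turn
        saturation = t            # radius grows from the grey axis outward
        # Note: colorsys takes its arguments in the order (h, l, s).
        colours.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return colours

sequence = hsl_spiral(16)
```

Because the path is defined in HSL coordinates, the same sequence can be rendered on any device that provides a conversion from HSL to its own primaries.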
47. Sound Perception
The subjective attributes of sounds are loudness, pitch, timbre and duration [42]. At first glance it seems
that a simple sound model may be formed from a combination of these four dimensions. However
“timbre” is a catch-all label for the many variations in sounds which differentiate a guitar from a piano,
or identify a person from their voice, or allow the recognition that an object is made of metal rather than
wood when it is hit. The perception of timbre covaries with amplitude and frequency modulation, attack
time, spectral content and many other objective aspects of acoustic events. These variations are heard in
subjective changes described by words such as roughness, brightness, or thickness. Because timbre is a
multidimensional perception, timbres cannot be mixed from a small number of primary components
in the way that colours can. This is a significant problem because timbre will not collapse neatly into a low
dimensional space. However it is possible to systematise a subset of timbres in a way that is observable,
and this system can provide a foundation for a sound model.
47.1 Timbre models
There have been a variety of research efforts which have attempted to more clearly define the subjective
components of timbre, and to extract the most salient dimensions.
47.1.1 Spectral Timbre
Early studies in timbre perception were of steady sounds which had spectral components that did not
change over time. Ramps, blocks, trapezoids and humps were among the spectral envelopes that Von
Bismarck [45] applied to harmonic and white noise sources. Subjects were asked to rate the resulting
sounds against 30 verbal scales consisting of opposite meaning pairs such as hard-soft, sharp-dull,
violent-gentle, dark-light, rough-smooth, coarse-fine, dirty-clean, thin-thick, compact-scattered,
empty-full, solid-hollow. Four main dimensions which spanned 90% of the variation were found by factor
analysis. Sharpness was the most dominant factor, followed by compactness. The compactness
dimension showed a clear discrimination between sounds with harmonic sources and those with noise
sources. Sharpness was related to the centre of gravity of the spectral envelope, with a progression from
sounds with dominant low harmonics to sounds where the upper harmonics were emphasised. In further
experiments Von Bismarck demonstrated that sharpness can be doubled and halved in a similar fashion
to loudness and pitch [46].
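The relation between sharpness and the centre of gravity of the spectral envelope can be made concrete with a small sketch. It computes an amplitude-weighted mean frequency for a set of harmonics. This is a simplification offered as an assumption: it uses raw amplitudes on a linear frequency scale, which is not the psychoacoustic sharpness measure itself, and the function name is hypothetical.

```python
def spectral_centroid(f0, amplitudes):
    """Amplitude-weighted mean frequency (Hz) of harmonics 1..N of f0.

    amplitudes[k] is the amplitude of harmonic k + 1.
    """
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    weighted = sum((k + 1) * f0 * a for k, a in enumerate(amplitudes))
    return weighted / total

# A spectrum dominated by low harmonics, and the same values reversed
# so that the upper harmonics are emphasised.
low_heavy = [1.0 / (k + 1) for k in range(10)]
high_heavy = list(reversed(low_heavy))
```

With these two spectra the centroid of high_heavy is higher than that of low_heavy, matching the described progression from dull to sharp.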
The shape of localised regions of the spectrum can be described by parameterised formants. Formants
provide a means to specify and manipulate complex spectra more simply than operating on individual
spectral components. Pollard and Jansson proposed a tristimulus formant model [37] of timbre in which
the three vertices of a triangle represent three frequency regions - the fundamental, the mid frequency
partials (2,3,4) and the residual upper partials. Spoken vowels fall into unique locations in this space
which indicates that it may be perceptually significant. Slawson [43] proposed that a two dimensional
F1-F2 plane based on the two most significant formants is fundamental to timbre perception, and that
timbres are naturally ordered in this space. He also attempted to show that a path representing a timbre
sequence in this space can be moved or transposed whilst maintaining its perceptual order.
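The tristimulus description above can be sketched as three proportions that sum to one, which is what allows a timbre to be plotted inside a triangle. The sketch below weights the three regions by plain harmonic amplitude; that weighting and the function name are assumptions made for illustration rather than the published formulation.

```python
def tristimulus(amplitudes):
    """Return (t1, t2, t3) for harmonic amplitudes, where amplitudes[0]
    is the fundamental.

    t1: share of the fundamental
    t2: share of partials 2, 3 and 4
    t3: share of the remaining upper partials
    """
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    t1 = amplitudes[0] / total
    t2 = sum(amplitudes[1:4]) / total
    t3 = sum(amplitudes[4:]) / total
    return t1, t2, t3
```

A pure sinusoid at the fundamental sits at the (1, 0, 0) vertex, and a spectrum with all its energy above the fourth partial sits at (0, 0, 1).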
The Multidimensional Scaling (MDS) technique allows comparison scales to be used to construct spatial
structures where quantitative distance is a measure of qualitative variables. This technique was applied
by Plomp [42] to find the dimensions of timbre. His results showed that three orthogonal axes accounted
for 90.4% of the variation in the 9 steady state musical instrument timbres used in the experiment. A
difficulty with the MDS method is that the dimensions found may not have a direct physical
interpretation. Plomp used a principal components analysis of the band rate spectra of the sounds as a
means of ascertaining the meaning of the MDS results. One of Plomp's conclusions was that sharpness is
the main attribute of timbre, and that it is related to the centre of gravity of loudness on a frequency
scale in which critical bandwidths have equal lengths.
47.1.2 Temporal timbre
In many sounds the spectral components are not constant over time, but change in number, amplitude
and relationship to each other over the course of the event. The importance of the temporal changes was
shown by Wedin and Goude [9] who found 33% successful identification of a musical instrument from a
sample versus 20% from just the steady state portion of the sample. Saldanha and Corso [9] found 45%
correct identification of a sampled instrument from the attack transient plus part of the steady state
versus 32% for steady state plus decay transient.
An MDS study of complete musical instrument samples was made by Wessel [39] who found a temporal
and a spectral dimension to be the most important. The temporal dimension was grouped by instrument
family: trumpet-trombone-French horn, oboe-bassoon-clarinet, and violin-viola-cello. In the spectral
dimension sounds with most energy in the low harmonics were at one extreme, and those with energy
concentrated in the upper harmonics at the other. In another MDS study Grey [39] equalised the
loudness, pitch and duration of each timbre by re-synthesising 16 musical instruments. He found that a
three-dimensional space was required to explain the results. The Cartesian space consisted of a temporal
plane defined by two orthogonal temporal dimensions, and a vertical spectral dimension. Graphic 3D
visualisations of the results were used to show where the instruments were positioned relative to each
other in the space. Grey analysed the results with respect to the amplitude, frequency, time
spectrograms of the data points. His conclusion was that the Y axis relates to spectral energy
distribution, whilst the X and Z axes relate to temporal properties of timbre, covarying with
synchronicity in the development of upper harmonics and the presence of low-energy high frequency
noise during the attack segment.
The development of the spectrum over time is used as a cue to the material type in the
“everyday” synthesis developed by Gaver [38]. This algorithm is an attempt to control sounds along the
dimensions of materials and events, and is modelled upon a mallet hitting a bar made of wood or metal.
The rate of decay of the upper partials defines the material - wood is damped and these partials decay
rapidly whereas in metal they continue to ring much longer. The spectral brightness of the sound
conveys the hardness of the mallet, and the fundamental frequency indicates the length of the bar.
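The roles of damping and mallet hardness described above can be illustrated with a toy model of decaying partials. This is a minimal sketch under stated assumptions, not Gaver's published algorithm: the exponential decay law, the power-law brightness weighting and all parameter names are hypothetical choices made only to show the direction of each effect.

```python
import math

def partial_amplitudes(t, n_partials, damping, hardness):
    """Amplitudes of partials 1..n_partials at time t seconds after a strike.

    damping:  larger values make upper partials die away faster (wood-like);
              smaller values let them ring (metal-like).
    hardness: 0..1, larger values put more initial energy in upper partials.
    """
    amps = []
    for k in range(1, n_partials + 1):
        initial = hardness ** (k - 1)        # assumed brightness weighting
        decay = math.exp(-damping * k * t)   # assumed: higher partials decay faster
        amps.append(initial * decay)
    return amps

wood = partial_amplitudes(0.1, 5, damping=20.0, hardness=0.8)
metal = partial_amplitudes(0.1, 5, damping=2.0, hardness=0.8)
```

At the same instant after the strike the fifth partial of the metal-like setting is stronger than that of the wood-like setting, while a harder mallet raises the initial level of the upper partials for either material.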
48. From Colour Model to Sound Model
In this section a model of sound is derived by considering the components of the HSL colour model as a
framework. A perceptually organised cycle of timbres is constructed using the hue circle as a template. A
naturally ordered and separable radial component of timbre is added which has a similar role to the
saturation component of the colour model. The cylindrical polar geometric model is completed by a
vertical axis which is analogous with the lightness axis in colour.
48.1 The Hue Circle and the Timbre Circle
The central role of the hue circle in the colour model leads to the consideration of a similar structure as
the basis for the sound model. The hue circle consists of a continuous cycle of hues designated by an
angle from 0 to 360 degrees. The arrangement, shown in Figure 80, has complementary hues
diametrically opposite each other, and this is achieved by positioning the hues according to the
orthogonal red-green and blue-yellow axes of colour opponent theory.
Figure 80. The hue circle (diagram labels: hue angle H, saturation radius S; hues Red, Orange, Yellow, Yellow-Green, Green, Cyan, Blue, Magenta)
Although the hue circle is continuous, the perception of hue is categorical. This is important in the
application of colour to represent data in maps, graphs and visualisations where hue is used to separate
distinct regions or display classification data rather than for smooth gradients or continuous variables.
Humans do not have a consistent intuition of a natural ordering for hue, and the ordering of hue
sequences must be explicitly remembered. Like hue, timbre is also a categorical quality which has no
natural order. The categorical nature of hue and timbre perception leads to the proposal that timbre may
assume a similar role to hue in perceptual presentations, and a circle of timbres, analogous with the hue
circle, may provide the foundation for a sound model. This arrangement, which we call the Timbre
Circle, has the most similar timbres adjacent to each other and the most dissimilar timbres
diametrically opposite each other. To create a cycle the sequence should be seamless - there should be no
apparent beginning or end when it is continuously repeated. A Timbre Circle which meets these
requirements can be constructed using an underlying orthogonal basis for ordering, and the salient
dimensions identified in timbre research provide a number of possible options. Timbre circles based on
the research of Von Bismarck, Slawson and Grey are shown in Figure 81. The Von Bismarck and
Slawson Timbre Circles are spectrally based, the Grey Timbre Circle is temporally based.
Figure 81. Timbre Circles ordered by underlying orthogonal axes (three panels, each with timbre angle T: Von Bismarck, axes Dull to Sharp and Compact to Scattered; Slawson, axes F1 low to F1 high and F2 low to F2 high; Grey, axes Synchronous attack to Spread attack and No onset noise to Onset noise)
48.2 Is timbre really circular?
To ascertain whether timbre circularity is a real phenomenon the Timbre Circles shown in Figure 81
were synthesised. A circular path was traced through each coordinate system and a judgement of
whether the sequence really did sound cyclic was made by listening for smoothness and seamlessness as
it was repeated. The effectiveness of the underlying axes as a basis for perceptual ordering was judged
by similarity of adjacent timbres and dissimilarity of opposite timbres.
The equipment used was a Sun Sparcstation 10 unix workstation which includes 16 bit, 44.8 kHz audio
as standard hardware. Csound [44] was used for sound synthesis and processing. An “instrument” was
implemented for each of the timbre spaces. Each instrument had an interface controlled by X,Y
parameters denoting a position in a Cartesian coordinate system whose axes correspond with the timbre bases.
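A circular path through an X,Y coordinate system of this kind can be sketched as follows. The function name and the convention of a unit circle centred on the origin are assumptions for illustration; the original instruments may have scaled their coordinates differently.

```python
import math

def timbre_circle_path(n_steps, radius=1.0):
    """Return n_steps (x, y) points evenly spaced around a circle
    centred on the origin, starting at angle 0."""
    path = []
    for i in range(n_steps):
        angle = 2.0 * math.pi * i / n_steps
        path.append((radius * math.cos(angle), radius * math.sin(angle)))
    return path

path = timbre_circle_path(8)
```

Because the last point is one step away from the first, the sequence can be repeated without a jump, and points half a cycle apart are diametrically opposite.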
The author found that repetitions of the resulting sequences did sound cyclic in each case, supporting the
viability of the Timbre Circle. It was possible to hear “complementary” timbres as distinctly different
sounds lying opposite each other at angles all around the circle, not just at the 0 and 90 degree positions
aligned with the underlying axes. The implementation of each instrument is briefly described below to
allow the reader to confirm the phenomenon.
48.2.1 Von Bismarck
The ordering axes for the Von Bismarck Timbre Circle are compact/scattered and dull/bright. The
instrument is implemented as a source/filter. The source consists of bands of noise centred at the first 20
harmonic frequencies. The X axis linearly controls the width of each noise band in the range 0.1 Hz to
f0/2 Hz, so that at the compact end the trend is toward a pulse train and at the scattered end it is a band
of noise. The brightness of the source is adjusted by the Y axis which linearly controls the centre
frequency, in the range f0 to f20, of a 2nd order bandpass filter with bandwidth bw = 5f0.
48.2.2 Slawson
The ordering axes for the Slawson Timbre Circle are based on the F1-F2 formant space. The X,Y
coordinates each control the centre frequency of a spectral formant region. The X axis linearly controls
the position of the spectral peak in the formant region f0 to f5. The Y axis linearly controls the position of
the spectral peak in the formant region f6 to f10.
48.2.3 Grey
The ordering axes for the Grey Timbre Circle come from the temporal dimensions identified in Grey’s
MDS study of timbre. The X axis has endpoints synchronous/spread and linearly controls the rise times
of the upper harmonics in the range 0 to 0.3 seconds. The Y axis linearly controls the intensity of 0.1
seconds of an inharmonic high frequency onset noise which is mixed with the sound.
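The three coordinate mappings described in this section are all linear, and can be summarised in one sketch. Several details are assumptions not stated in the text: X and Y are taken to be normalised to the range 0 to 1, fn is read as n times f0, the zero end of each axis is assigned arbitrarily, and the function and key names are hypothetical.

```python
def lerp(lo, hi, u):
    """Map u in 0..1 linearly onto lo..hi."""
    return lo + (hi - lo) * u

def von_bismarck_params(x, y, f0):
    return {
        "noise_bandwidth_hz": lerp(0.1, f0 / 2.0, x),   # compact to scattered
        "filter_centre_hz": lerp(f0, 20.0 * f0, y),     # dull to sharp
        "filter_bandwidth_hz": 5.0 * f0,
    }

def slawson_params(x, y, f0):
    # Assumes "f0 to f5" means f0..5*f0 and "f6 to f10" means 6*f0..10*f0.
    return {
        "formant1_hz": lerp(f0, 5.0 * f0, x),
        "formant2_hz": lerp(6.0 * f0, 10.0 * f0, y),
    }

def grey_params(x, y):
    return {
        "upper_harmonic_rise_s": lerp(0.0, 0.3, x),     # synchronous to spread
        "onset_noise_level": y,                         # relative intensity, 0..1
        "onset_noise_duration_s": 0.1,
    }
```

Keeping each mapping as a pure function of (x, y) means the same circular path can be played through any of the three instruments.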
48.3 The Chromatic Wheel and the Timbre Wheel
Colour chromaticness is a complex attribute consisting of hue and saturation. The chromatic wheel is
made up of the hue circle filled in by the saturation moderator which radiates outward from the grey
centre. Saturation has a natural order from grey through pastel shades to the most colourful hue. A
similar scheme for sound can be created by identifying a timbre moderator which has a natural order
and which is independent from the axes which order the Timbre Circle. This radial component should
cause a smooth and seamless transition through the centre of the wheel in any direction, so that all
radial sequences tend toward a single point at the centre. We consider the spectral and temporal
dimensions of timbre as orthogonal components which make up a complex attribute analogous to
chromaticness of colour. If one of these attributes is the basis for the timbre circle the other provides the
independent radial modifier. The two possible forms of the Timbre Wheel are shown in Figure 82.
Figure 82. Two forms of the Timbre Wheel based on a complex timbre attribute (two panels: a temporal timbre circle T with a spectral radius S, and a spectral timbre circle S with a temporal radius T)
The temporal Grey Timbre Circle and a spectral radial modifier will now be further developed to explore
the viability of the Timbre Wheel. The spectral attribute which is orthogonal to the temporal plane in
Grey’s timbre space corresponds with Bregman’s [36] rough definition of brightness as “the balance of
low and high partials in the spectrum”. Brightness satisfies the requirement for a perceptual order as
shown by the ordering of timbres along the spectral dimension in Grey’s space, and the psychoacoustic
sharpness scales constructed by Von Bismarck [46]. Brightness satisfies the requirement for a smooth
transition across the centre of the wheel because the dullest sound is defined by a sinusoid at the
fundamental and this is an atomic component of most spectra. There is an analogy between the “dull”
point in this Timbre Wheel and the “grey” point in the chromatic wheel.
The Grey Timbre Wheel was put to the test by extending the Csound implementation of the Grey Timbre
Circle with a radial Brightness parameter which controlled a low pass filter and so attenuated the upper
harmonics and shifted the balance point in the spectrum. The phenomenon was examined using
sequences shown in Figure 83. A spiral path was created which rotated through the timbres and
smoothly decreased in perceived brightness. Diagonal paths at several different angles demonstrated a
smooth transition across the central axis of the wheel.
Figure 83. Timbre sequences used to confirm the properties of the Timbre Wheel (two panels, each labelled with timbre angle T and brightness radius B)
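The spiral and diagonal sequences can be sketched in polar coordinates as lists of (timbre angle, brightness) pairs. This is an illustrative sketch: brightness is assumed to be normalised to the range 0 to 1, and the function names and step counts are hypothetical.

```python
def spiral_path(n_steps, turns=3.0):
    """(angle_deg, brightness) pairs: the angle rotates through the
    timbres while brightness falls steadily from 1 to 0.
    Requires n_steps >= 2."""
    path = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        path.append(((360.0 * turns * t) % 360.0, 1.0 - t))
    return path

def diagonal_path(angle_deg, n_steps):
    """Straight line across the wheel through its centre, from the rim at
    angle_deg to the rim at angle_deg + 180. Requires n_steps >= 2."""
    path = []
    for i in range(n_steps):
        s = -1.0 + 2.0 * i / (n_steps - 1)   # signed position along the diameter
        if s <= 0.0:
            path.append((angle_deg % 360.0, abs(s)))
        else:
            path.append(((angle_deg + 180.0) % 360.0, s))
    return path
```

On the diagonal the angle jumps by 180 degrees only at the centre, where brightness is zero and every timbre reduces to the same dull point, which is why the transition can be smooth.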
48.4 A naturally ordered geometric model of sound
Having established a Timbre Wheel the sound model can be completed by choosing an auditory
counterpart to the lightness axis in the HSL colour model. The lightness axis is sometimes called the
grey axis because all the shades of grey from black to white lie on it. In the sound model this axis will
consist of all the dull points, which makes for a synaesthetically consistent analogy between greyness
and dullness. The vertical axis must be separable from timbre and must have a natural order.
Loudness and pitch are both potential candidates with low/high and down/up height associations
commonly used to describe relative values. Pitch was chosen to complete the system because calibrated
pitch scales are common on sound output devices, and because loudness has the drawbacks that it can be
physiologically damaging, it is context sensitive, it can override other aspects of a sound, and most
devices have a manual loudness adjustment which can affect the dynamic range.
The TBP sound model, shown in Figure 84, consists of a complex Timbre Wheel made up of a temporal
timbre angle (T), a spectral brightness radius (B), which is fixed upon a vertical dull axis ordered by
pitch height (P).
Figure 84. The TBP sound model (diagram labels: Pitch (P), dull axis, Brightness (B), Timbre (T), Timbre Wheel)
The generality of this framework is demonstrated by applying it to represent the timbre space defined by
Gaver’s everyday synthesis algorithm. The source timbres are metal and wood and these are temporally
defined by the rate of decay of the harmonics [38]. These two materials can be assigned opposite
positions in a temporal Timbre Circle which has rate of decay as an underlying axis of arrangement. The
hardness of the mallet corresponds with spectral brightness and hence the radial Brightness component,
whilst the length of the bar correlates with the Pitch dimension. Thus Gaver’s algorithm is represented
by a vertical slice through the TBP model.
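The fit between Gaver's parameters and the TBP coordinates can be sketched as a simple mapping. The TBP class, the choice of 0 and 180 degrees for the two materials, and the normalised hardness value are all assumptions made for illustration. The relation between bar length and pitch is left out, and the pitch is passed in directly.

```python
from dataclasses import dataclass

@dataclass
class TBP:
    timbre_deg: float   # angle on the Timbre Circle, in degrees
    brightness: float   # radial component, 0 (dull axis) to 1 (rim)
    pitch_hz: float     # position on the vertical axis

# Wood and metal take opposite positions on a circle ordered by rate of
# decay. Which material sits at which angle is an arbitrary choice here.
MATERIAL_ANGLE = {"wood": 0.0, "metal": 180.0}

def gaver_to_tbp(material, mallet_hardness, pitch_hz):
    """Map a material, a mallet hardness in 0..1 and a pitch to TBP."""
    brightness = min(1.0, max(0.0, mallet_hardness))
    return TBP(MATERIAL_ANGLE[material], brightness, pitch_hz)
```

Since only two angles are used, every sound the mapping can produce lies in a single vertical plane through the dull axis, which is the vertical slice referred to above.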
49. The SoundChooser
The SoundChooser is a graphical user interface to the TBP sound model which is designed to parallel the
HSL colour chooser used for picking colours in many computer applications. The widget is illustrated in
Figure 85, and consists of a dial with an arm which rotates through 360 degrees. This arm can be
directly manipulated to select a timbre angle, or can be set specifically with a numeric entry box. On the
dial arm is a bead which represents the radial brightness. This bead may be directly manipulated or set
using a slider or a numeric entry. Pitch height is controlled with a vertical slider, or a numeric entry. A
“play” button activates the current sound and a “cycle” button causes the dial to rotate and generate a
timbre circle sequence. Different timbre circles can be selected with the numeric entry widget at the top
of the panel, for example the Von Bismarck Timbre Circle is 1, the Slawson Timbre Circle is 2 and the
Grey Timbre Circle is 3.
The graphical user interface enables spatial interactions with the TBP sound model. The sounds are
arranged in an intuitive manner because the axes are perceptually aligned. This arrangement makes it
easy to search for particular sounds, to remember where sounds are, and to compare sounds. The user
interface of the SoundChooser was implemented with Tcl/Tk [41] and the coordinates were sent to the
Csound instrument through a Unix pipe.
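Passing coordinates from a user interface to a synthesis process over a pipe can be sketched as writing one text line per sound. The sketch below formats each selection as a Csound score "i" statement, in which the first three fields are the instrument number, start time and duration. The use of the following three fields for timbre angle, brightness and pitch is a hypothetical layout, and the original implementation was written in Tcl/Tk rather than Python.

```python
import io

def format_event(timbre_deg, brightness, pitch_hz, instrument=1, duration=1.0):
    """Format one score line. The layout of the last three fields
    (timbre angle, brightness, pitch) is an assumption."""
    return (f"i{instrument} 0 {duration:g} "
            f"{timbre_deg:g} {brightness:g} {pitch_hz:g}\n")

def send_event(stream, timbre_deg, brightness, pitch_hz):
    """Write one event to any writable text stream and flush it so the
    reader at the other end of a pipe sees it immediately."""
    stream.write(format_event(timbre_deg, brightness, pitch_hz))
    stream.flush()

# An in-memory stream stands in for the pipe in this example.
buffer = io.StringIO()
send_event(buffer, 90, 0.5, 440)
```

In a live setting the stream would be the write end of a pipe to the synthesis process. Flushing after every event matters because a buffered pipe would otherwise delay the sound until the buffer filled.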
Figure 85. SoundChooser graphical interface to the TBP sound model
50. Conclusions
The Timbre, Brightness, Pitch (TBP) sound model was developed to have properties similar to those of
the HSL colour model. The sound model was founded on the recognition of a link between the categorical
nature of timbre and hue perception. The Timbre Circle was defined and the concept was confirmed by
generating circular timbre sequences using Csound instruments. The Timbre Circle was extended with a
radial Brightness modifier to create the Timbre Wheel, and this was tested by creating a timbre spiral
sequence. Pitch was chosen over loudness as the vertical axis and a SoundChooser widget was
implemented as an interface for selecting sounds.
The advantages of the TBP sound model are:
• Natural specification, comparison and matching - Timbre, Brightness, and Pitch are
perceptually separable attributes of sounds.
• Natural order - the Timbre Circle is ordered by an underlying perceptually orthogonal
basis which arranges complementary timbres diametrically opposite each other. The
Brightness and Pitch axes both have a natural order.
• Independent control of perceptually aligned parameters - Timbre, Brightness, and Pitch
can be changed independently in the TBP model.
• Geometric interface - the 3d sound solid provides the opportunity for spatial interaction
with sounds.
• Transportability - the TBP model may be used to specify sounds in natural terms rather
than device coordinates.
50.1 Limitations
The colour model provides access to the entire range of perceptible colours. This is not the case with the
TBP sound model. This limitation is due to the multidimensional nature of timbre. The selection of the
axes which underlie the Timbre Circle constrains the range of timbres to those which can be described in
terms of those axes. Access to the greatest possible range of sounds can be enabled by using the most
perceptually salient axes. The value of the Timbre Circle is that it allows timbres which are spanned by
the nominated axes to be ordered in terms of those axes, irrespective of how they vary in other aspects.
50.2 Further work
To present structure in sound it is necessary to preserve relationships between points in the sound
space. We intend to use the TBP sound model as the basis for a perceptually linearised sound space
which will provide a framework for mapping scientific data to sound.
51. Acknowledgements
This work was funded by a postgraduate research scholarship from the Commonwealth Scientific and
Industrial Research Organisation, Division of Information Technology, Australia. I would like to thank
Dr. Phil Robertson of the CSIRO for his support, guidance, and expertise. I would like to thank Mr.
David Worrall of the Australian Centre for Arts and Technology for his perspective and advice.
52. References
[36] Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organisation of Sound, The MIT Press, Cambridge.
[37] Fletcher, N.H. and Rossing, T.D. (1991). The Physics of Musical Instruments, Springer-Verlag, New York.
[38] Gaver, W.W. (1993). Synthesizing Auditory Icons, Conference Proceedings of InterCHI '93, pp 228-235, ACM SigCHI 1993.
[39] Grey, J.M. (1975). Exploration of Musical Timbre, PhD Thesis, Report No. STAN-M-2, CCRMA Dept. of Music, Stanford University.
[40] Hunt, R.G.W. (1987). Measuring Colour, Ellis Horwood, Chichester.
[41] Ousterhout, J.K. (1994). Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass.
[42] Plomp, R. (1976). Aspects of Tone Sensation, Academic Press, London.
[43] Slawson, A.W. (1968). “Vowel quality and musical timbre as functions of spectrum envelope and fundamental frequency”, Journal of Acoustical Society of America, 43, pp 87-101.
[44] Vercoe, B. (1991). CSOUND, A Manual for the Audio Processing System and Supporting Programs, Media Laboratory, M.I.T, Cambridge, Mass.
[45] von Bismarck, G. (1974). “Timbre of Steady State Sounds: a factorial investigation of its verbal attributes”, Acustica 30, pp 146-159.
[46] von Bismarck, G. (1974). “Sharpness as an Attribute of the Timbre of Steady Sounds”, Acustica 30, pp 159.