A naturally ordered geometric model of sound inspired by colour theory

Stephen Barrass
CSIRO Division of Information Technology, PO Box 664, Canberra, ACT 2601
Telephone (06) 275 0938, Fax (06) 257 1052, Email stephen.barrass@csis.dit.csiro.au

Abstract

The hue, saturation, lightness (HSL) colour model is an intuitive interface for interaction with the millions of colours which are available in high quality colour processes. This interface makes it possible to specify colours in a natural manner, to manipulate the subjective dimensions of colour independently, to use geometric paths to specify ordered sequences of colours, and to use a system which does not depend on a particular colour output device. This paper proposes a naturally ordered geometric model of sound, derived from considerations of the form and properties of the HSL colour model. The recognition that both hue and timbre are categorical perceptions is the link which allows the creation of a cycle of timbres which mimics the hue circle. There is no consistent ordering of timbre, but it is hypothesised that a subset of timbres may be arranged in a circle by a pair of underlying perceptual axes. The Timbre Circle, in which the most similar timbres are adjacent, and the most dissimilar lie diagonally opposite, is investigated by implementing several such sequences based on dimensions of timbre identified in the psychoacoustic research studies of von Bismarck, Slawson and Grey. A complex timbre attribute consisting of a spectral and temporal component is proposed as the basis for an extension of the Timbre Circle with an independent naturally ordered radial component to form the Timbre Wheel. Brightness is identified in many timbre spaces as the most important aspect, and has been shown to have a perceptual metric.
An example of a Timbre Wheel in which brightness is the radial component is implemented and various sequences such as a diagonal and a spiral are used to confirm perceptible ordering and smooth transition through the centre which supports the polar geometry. The sound model is completed by choosing pitch over loudness as a vertical axis, analogous with the lightness axis in the colour model. The result is called the Timbre, Brightness, Pitch (TBP) sound model. The generality of the model is demonstrated by fitting Gaver’s “everyday” synthesis algorithm within the framework. The TBP sound model is illustrated with a SoundChooser graphical user interface similar to the colour chooser found in many colour applications.

Keywords

Colour, Sound, Perception, Synaesthesia, User Interface

45. Introduction

Humans can distinguish millions of different colours. To choose a particular colour, or sequence of colours, from this multitude would seem to be a daunting task. However, there is a compass which enables intuitive orientation and navigation through the realm of colour, called the HSL colour model [40].

The properties of the HSL colour model are due to a basis in the human perception of colour. The model presents colours as a geometric solid which has axes aligned with the subjective attributes hue, saturation and lightness [40]. Interaction with the colours is natural because the ordering is observable and systematic. Independent changes may be made in each subjective attribute. Lines, spirals, slices, and volumes can be taken from the solid to create ordered colour sequences. The HSL coordinates are a transportable way of specifying colours without reference to the physical mechanics of producing the colour.

Humans can also distinguish millions of different sounds. The standard interface to sound in computer and electronic music systems consists of a palette of around 200 verbally described timbres (e.g.
muted trumpet, pizzicato strings), and a number of parameters which allow local variations in the timbre (e.g. reverb depth, vibrato rate). The sound palette is a useful interface because it provides access to a limited range of timbres which may be manipulated in constrained ways. However, the strengths of the palette method are also its weaknesses. There are many areas of science and music where there is a need to access a much wider diversity of sounds.

Unfortunately there is not a comprehensive model of sound perception which can provide the basis for a natural sound ordering system. The specification of sounds is ad hoc and device specific. Sounds are often specified and compared in terms of physical parameters which do not have predictable effects on what is heard. Structure represented by progressions along device or physical dimensions can be distorted or obfuscated by perceptual interactions and non-linearities. The lack of a systematic and observable organisation for timbres makes it difficult to select ordered sequences. These are serious impediments to the application of high quality sound in the human-computer interface.

This paper proposes a perceptual model of sound which will address the issues of natural specification, the manipulation of subjective dimensions, a geometric interface, and transportable coordinates. The sound model is derived from considerations of the form and properties of the HSL colour model.

46. The HSL Colour Model

Colour perception has attracted considerable research effort because of its importance in arts, crafts and industries such as painting, dyeing, lighting, printing, television and computing. The HSL colour model is part of a theory of colour which has been extensively developed through psychological experiments on human perception.
The subjective dimensions of colour are hue, saturation and lightness, which covary with the physical attributes of wavelength, spectral content and intensity of the visible electromagnetic spectrum. The arrangement of colours is called natural because changes in colours are systematic and observable due to the alignment of the axes of the model with human perception. The appearance of a colour may be varied independently in each dimension without affecting the other dimensions, so for example a green hue may be made lighter or darker without altering its green-ness. Arranging the model in a geometric way allows colours to be compared in terms of position - the further two colours are apart the greater the difference in their appearance. It also enables the use of geometric paths for specifying ordered sequences of colours, such as lines, circles and spirals.

The polar cylindrical geometry of the HSL colour model is shown in Figure 79. This geometric system consists of a chromatic wheel, made up of a hue angle (H) and a saturation radius (S), fixed upon a vertical lightness axis (L).

Figure 79. The HSL colour model (labels: grey axis, Lightness (L), Saturation (S), Hue (H), chromatic wheel)

47. Sound Perception

The subjective attributes of sounds are loudness, pitch, timbre and duration [42]. At first glance it seems that a simple sound model may be formed from a combination of these four dimensions. However, “timbre” is a catch-all label for the many variations in sounds which differentiate a guitar from a piano, or identify a person from their voice, or allow the recognition that an object is made of metal rather than wood when it is hit. The perception of timbre covaries with amplitude and frequency modulation, attack time, spectral content and many other objective aspects of acoustic events. These variations are heard in subjective changes described by words such as roughness, brightness, or thickness.
Because timbre is a multidimensional perception all timbres cannot be mixed from a small number of primary components, as is the case with colour. This is a significant problem because timbre will not collapse neatly into a low dimensional space. However it is possible to systematise a subset of timbres in a way that is observable, and this system can provide a foundation for a sound model.

47.1 Timbre models

There have been a variety of research efforts which have attempted to more clearly define the subjective components of timbre, and to extract the most salient dimensions.

47.1.1 Spectral Timbre

Early studies in timbre perception were of steady sounds which had spectral components that did not change over time. Ramps, blocks, trapezoids and humps were among the spectral envelopes that Von Bismarck [45] applied to harmonic and white noise sources. Subjects were asked to rate the resulting sounds against 30 verbal scales consisting of opposite meaning pairs such as hard-soft, sharp-dull, violent-gentle, dark-light, rough-smooth, coarse-fine, dirty-clean, thin-thick, compact-scattered, empty-full, solid-hollow. Four main dimensions which spanned 90% of the variation were found by factor analysis. Sharpness was the most dominant factor, followed by compactness. The compactness dimension showed a clear discrimination between sounds with harmonic sources and those with noise sources. Sharpness was related to the centre of gravity of the spectral envelope, with a progression from sounds with dominant low harmonics to sounds where the upper harmonics were emphasised. In further experiments Von Bismarck demonstrated that sharpness can be doubled and halved in a similar fashion to loudness and pitch [46].

The shape of localised regions of the spectrum can be described by parameterised formants. Formants provide a means to specify and manipulate complex spectra more simply than operating on individual spectral components.
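
The relation between sharpness and the centre of gravity of the spectral envelope noted above can be illustrated with a small calculation. The following is a minimal sketch only: it uses an amplitude-weighted mean on a linear frequency scale, which is a simplification assumed here for illustration and is not the psychoacoustic sharpness scale of Von Bismarck. The function name, the fundamental of 220 Hz and the two example spectra are all hypothetical.

```python
def spectral_centroid(frequencies, amplitudes):
    """Amplitude-weighted mean frequency (Hz) of a set of partials.

    A crude stand-in for the "centre of gravity" of a spectrum; no
    loudness or critical-band weighting is applied (an assumption).
    """
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(frequencies, amplitudes)) / total


# Hypothetical example: ten harmonics of an arbitrary 220 Hz fundamental.
f0 = 220.0
harmonics = [f0 * k for k in range(1, 11)]

# Energy concentrated in the low harmonics versus the upper harmonics.
low_weighted = [1.0 / k ** 2 for k in range(1, 11)]
high_weighted = [float(k) for k in range(1, 11)]

print(spectral_centroid(harmonics, low_weighted))
print(spectral_centroid(harmonics, high_weighted))
```

The spectrum whose upper harmonics are emphasised yields the higher centroid, which is the direction of the dull-to-sharp progression described in the text.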
Pollard and Jansson proposed a tristimulus formant model [37] of timbre in which the three vertices of a triangle represent three frequency regions - the fundamental, the mid frequency partials (2,3,4) and the residual upper partials. Spoken vowels fall into unique locations in this space, which indicates that it may be perceptually significant. Slawson [43] proposed that a two dimensional F1-F2 plane based on the two most significant formants is fundamental to timbre perception, and that timbres are naturally ordered in this space. He also attempted to show that a path representing a timbre sequence in this space can be moved or transposed whilst maintaining its perceptual order.

The Multidimensional Scaling (MDS) technique allows comparison scales to be used to construct spatial structures where quantitative distance is a measure of qualitative variables. This technique was applied by Plomp [42] to find the dimensions of timbre. His results showed that three orthogonal axes accounted for 90.4% of the variation in the 9 steady state musical instrument timbres used in the experiment. A difficulty with the MDS method is that the dimensions found may not have a direct physical interpretation. Plomp used a principal components analysis of the band rate spectra of the sounds as a means of ascertaining the meaning of the MDS results. One of Plomp's conclusions was that sharpness is the main attribute of timbre, and that it is related to the centre of gravity of loudness on a frequency scale in which critical bandwidths have equal lengths.

47.1.2 Temporal timbre

In many sounds the spectral components are not constant over time, but change in number, amplitude and relationship to each other over the course of the event. The importance of the temporal changes was shown by Wedin and Goude [9] who found 33% successful identification of a musical instrument from a sample versus 20% from just the steady state portion of the sample.
Saldanha and Corso [9] found 45% correct identification of a sampled instrument from the attack transient plus part of the steady state versus 32% for steady state plus decay transient.

An MDS study of complete musical instrument samples was made by Wessel [39] who found a temporal and a spectral dimension to be the most important. The temporal dimension was grouped by instrument family: trumpet-trombone-French horn, oboe-bassoon-clarinet, and violin-viola-cello. In the spectral dimension sounds with most energy in the low harmonics were at one extreme, and those with energy concentrated in the upper harmonics at the other.

In another MDS study Grey [39] equalised the loudness, pitch and duration of each timbre by re-synthesising 16 musical instruments. He found that a 3 dimensional space was required to explain the results. The cartesian space consisted of a temporal plane defined by two orthogonal temporal dimensions, and a vertical spectral dimension. Graphic 3d visualisations of the results were used to show where the instruments were positioned relative to each other in the space. Grey analysed the results with respect to the amplitude, frequency, time spectrograms of the data points. His conclusion was that the Y axis related to spectral energy distribution, whilst the X and Z axes related to temporal properties of timbre, covarying with synchronicity in the development of upper harmonics and the presence of low-energy high frequency noise during the attack segment.

The development of the spectrum over time is used as a cue to the material type in the “everyday” synthesis developed by Gaver [38]. This algorithm is an attempt to control sounds along the dimensions of materials and events, and is modelled upon a mallet hitting a bar made of wood or metal. The rate of decay of the upper partials defines the material - wood is damped and these partials decay rapidly whereas in metal they continue to ring much longer.
The spectral brightness of the sound conveys the hardness of the mallet, and the fundamental frequency indicates the length of the bar.

48. From Colour Model to Sound Model

In this section a model of sound is derived by considering the components of the HSL colour model as a framework. A perceptually organised cycle of timbres is constructed using the hue circle as a template. A naturally ordered and separable radial component of timbre is added which has a similar role to the saturation component of the colour model. The cylindrical polar geometric model is completed by a vertical axis which is analogous with the lightness axis in colour.

48.1 The Hue Circle and the Timbre Circle

The central role of the hue circle in the colour model leads to the consideration of a similar structure as the basis for the sound model. The hue circle consists of a continuous cycle of hues designated by an angle from 0 to 360 degrees. The arrangement, shown in Figure 80, has complementary hues diametrically opposite each other, and this is achieved by positioning the hues according to the orthogonal red-green and blue-yellow axes of colour opponent theory.

Figure 80. The hue circle (labels: S, H, Yellow, Yellow-Green, Green, Orange, Red, Magenta, Cyan, Blue)

Although the hue circle is continuous, the perception of hue is categorical. This is important in the application of colour to represent data in maps, graphs and visualisations where hue is used to separate distinct regions or display classification data rather than for smooth gradients or continuous variables. Humans do not have a consistent intuition of a natural ordering for hue, and the ordering of hue sequences must be explicitly remembered. Like hue, timbre is also a categorical quality which has no natural order.
The categorical nature of hue and timbre perception leads to the proposal that timbre may assume a similar role to hue in perceptual presentations, and a circle of timbres, analogous with the hue circle, may provide the foundation for a sound model. This arrangement, which we call the Timbre Circle, has the most similar timbres adjacent to each other and the most dissimilar timbres diametrically opposite each other. To create a cycle the sequence should be seamless - there should be no apparent beginning or end when it is continuously repeated. A Timbre Circle which meets these requirements can be constructed using an underlying orthogonal basis for ordering, and the salient dimensions identified in timbre research provide a number of possible options. Timbre circles based on the research of Von Bismarck, Slawson and Grey are shown in Figure 81. The Von Bismarck and Slawson Timbre Circles are spectrally based, the Grey Timbre Circle is temporally based.

Figure 81. Timbre Circles ordered by underlying orthogonal axes (panels: Von Bismarck, with axes Sharp/Dull and Compact/Scattered; Slawson, with axes F1 low/F1 high and F2 low/F2 high; Grey, with axes Synchronous attack/Spread attack and Onset noise/No onset noise; the angle in each panel is labelled Timbre T)

48.2 Is timbre really circular?

To ascertain whether timbre circularity is a real phenomenon the Timbre Circles shown in Figure 81 were synthesised. A circular path was traced through each coordinate system and a judgement of whether the sequence really did sound cyclic was made by listening for smoothness and seamlessness as it was repeated. The effectiveness of the underlying axes as a basis for perceptual ordering was judged by similarity of adjacent timbres and dissimilarity of opposite timbres.

The equipment used was a Sun SPARCstation 10 Unix workstation which includes 16 bit, 44.8 kHz audio as standard hardware. Csound [44] was used for sound synthesis and processing. An “instrument” was implemented for each of the timbre spaces.
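
Tracing a circular path through a pair of orthogonal timbre axes can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name, the normalisation of both axes to the range 0 to 1, and the number of steps are hypothetical choices made here.

```python
import math


def circle_path(steps, radius=0.5, centre=(0.5, 0.5)):
    """Return `steps` (x, y) points evenly spaced around a circle.

    The axes are assumed (hypothetically) to be normalised to 0..1, so
    the default circle touches the extremes of both underlying axes.
    """
    cx, cy = centre
    points = []
    for i in range(steps):
        angle = 2.0 * math.pi * i / steps
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points


# Eight positions around the circle; repeating the list gives a cycle
# with no distinguished start or end point.
for x, y in circle_path(8):
    print(f"{x:.3f} {y:.3f}")
```

Points half a revolution apart are mirror images about the centre, which corresponds to the requirement that the most dissimilar timbres lie diametrically opposite each other.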
Each instrument had an interface controlled by X,Y parameters denoting a position in a cartesian coordinate system corresponding with timbre bases.

The author found that repetitions of the resulting sequences did sound cyclic in each case, supporting the viability of the Timbre Circle. It was possible to hear “complementary” timbres as distinctly different sounds lying opposite each other at angles all around the circle, not just at the 0 and 90 degree positions aligned with the underlying axes. The implementation of each instrument is briefly described below to allow the reader to confirm the phenomenon.

48.2.1 Von Bismarck

The ordering axes for the Von Bismarck Timbre Circle are compact/scattered and dull/bright. The instrument is implemented as a source/filter. The source consists of bands of noise centred at the first 20 harmonic frequencies. The X axis linearly controls the width of each noise band in the range 0.1 Hz to f0/2 Hz, so that at the compact end the trend is toward a pulse train and at the scattered end it is a band of noise. The brightness of the source is adjusted by the Y axis which linearly controls the centre frequency of a 2nd order bandpass filter with bandwidth bw = 5f0 in the range f0 to f20.

48.2.2 Slawson

The ordering axes for the Slawson Timbre Circle are based on the F1-F2 formant space. The X,Y coordinates each control the centre frequency of a spectral formant region. The X axis linearly controls the position of the spectral peak in the formant region f0 to f5. The Y axis linearly controls the position of the spectral peak in the formant region f6 to f10.

48.2.3 Grey

The ordering axes for the Grey Timbre Circle come from the temporal dimensions identified in Grey’s MDS study of timbre. The X axis has endpoints synchronous/spread and linearly controls the rise times of the upper harmonics in the range 0 to 0.3 seconds.
The Y axis linearly controls the intensity of 0.1 seconds of an inharmonic high frequency onset noise which is mixed with the sound.

48.3 The Chromatic Wheel and the Timbre Wheel

Colour chromaticness is a complex attribute consisting of hue and saturation. The chromatic wheel is made up of the hue circle filled in by the saturation moderator which radiates outward from the grey centre. Saturation has a natural order from grey through pastel shades to the most colourful hue. A similar scheme for sound can be created by identifying a timbre moderator which has a natural order and which is independent from the axes which order the Timbre Circle. This radial component should cause a smooth and seamless transition through the centre of the wheel in any direction, so that all radial sequences tend toward a single point at the centre.

We consider the spectral and temporal dimensions of timbre as orthogonal components which make up a complex attribute analogous to chromaticness of colour. If one of these attributes is the basis for the timbre circle the other provides the independent radial modifier. The two possible forms of the Timbre Wheel are shown in Figure 82.

Figure 82. Two forms of the Timbre Wheel based on a complex timbre attribute (labels: S Spectral, T Temporal)

The temporal Grey Timbre Circle and a spectral radial modifier will now be further developed to explore the viability of the Timbre Wheel. The spectral attribute which is orthogonal to the temporal plane in Grey’s timbre space corresponds with Bregman’s [36] rough definition of brightness as “the balance of low and high partials in the spectrum”. Brightness satisfies the requirement for a perceptual order as shown by the ordering of timbres along the spectral dimension in Grey’s space, and the psychoacoustic sharpness scales constructed by Von Bismarck [46].
Brightness satisfies the requirement for a smooth transition across the centre of the wheel because the dullest sound is defined by a sinusoid at the fundamental and this is an atomic component of most spectra. There is an analogy between the “dull” point in this Timbre Wheel and the “grey” point in the chromatic wheel.

The Grey Timbre Wheel was put to the test by extending the Csound implementation of the Grey Timbre Circle with a radial Brightness parameter which controlled a low pass filter and so attenuated the upper harmonics and shifted the balance point in the spectrum. The phenomenon was examined using sequences shown in Figure 83. A spiral path was created which rotated through the timbres and smoothly decreased in perceived brightness. Diagonal paths at several different angles demonstrated a smooth transition across the central axis of the wheel.

Figure 83. Timbre sequences used to confirm the properties of the Timbre Wheel (axes labelled B and T)

48.4 A naturally ordered geometric model of sound

Having established a Timbre Wheel the sound model can be completed by choosing an auditory counterpart to the lightness axis in the HSL colour model. The lightness axis is sometimes called the grey axis because all the shades of grey from black to white lie on it. In the sound model this axis will consist of all the dull points, which makes for a synaesthetically consistent analogy between greyness and dullness. The vertical axis must be separable from timbre and must have a natural order. Loudness and pitch are both potential candidates with low/high and down/up height associations commonly used to describe relative values. Pitch was chosen to complete the system because calibrated pitch scales are common on sound output devices, and because loudness has the drawbacks that it can be physiologically damaging, it is context sensitive, it can override other aspects of a sound, and most devices have a manual loudness adjustment which can affect the dynamic range.
The TBP sound model, shown in Figure 84, consists of a complex Timbre Wheel, made up of a temporal timbre angle (T) and a spectral brightness radius (B), which is fixed upon a vertical dull axis ordered by pitch height (P).

Figure 84. The TBP sound model (labels: Pitch (P), dull axis, Brightness (B), Timbre (T), Timbre Wheel)

The generality of this framework is demonstrated by applying it to represent the timbre space defined by Gaver’s everyday synthesis algorithm. The source timbres are metal and wood and these are temporally defined by the rate of decay of the harmonics [38]. These two materials can be assigned opposite positions in a temporal Timbre Circle which has rate of decay as an underlying axis of arrangement. The hardness of the mallet corresponds with spectral brightness and hence the radial Brightness component, whilst the length of the bar correlates with the Pitch dimension. Thus Gaver’s algorithm is represented by a vertical slice through the TBP model.

49. The SoundChooser

The SoundChooser is a graphical user interface to the TBP sound model which is designed to parallel the HSL colour chooser used for picking colours in many computer applications. The widget is illustrated in Figure 85, and consists of a dial with an arm which rotates through 360 degrees. This arm can be directly manipulated to select a timbre angle, or can be set specifically with a numeric entry box. On the dial arm is a bead which represents the radial brightness. This bead may be directly manipulated or set using a slider or a numeric entry. Pitch height is controlled with a vertical slider, or a numeric entry. A “play” button activates the current sound and a “cycle” button causes the dial to rotate and generate a timbre circle sequence. Different timbre circles can be selected with the numeric entry widget at the top of the panel, for example the Von Bismarck Timbre Circle is 1, the Slawson Timbre Circle is 2 and the Grey Timbre Circle is 3.
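
One way the dial, bead and slider settings might be turned into control values for a synthesis instrument is sketched below, together with the spiral and diagonal sequences described in Section 48.3. Everything here is an assumption made for illustration: the function names, the normalisation of X, Y and brightness to the range 0 to 1, and the default pitch value are hypothetical and are not taken from the SoundChooser implementation.

```python
import math


def tbp_to_controls(timbre_deg, brightness, pitch):
    """Map a TBP coordinate to hypothetical instrument controls.

    timbre_deg: angle on the Timbre Circle in degrees (wraps at 360)
    brightness: radius, 0.0 (dull centre) to 1.0 (rim of the wheel)
    pitch:      passed through unchanged, in whatever unit the device uses
    """
    if not 0.0 <= brightness <= 1.0:
        raise ValueError("brightness must lie between 0.0 and 1.0")
    angle = math.radians(timbre_deg % 360.0)
    return {
        # Position on the circle in the (assumed) normalised X,Y plane.
        "x": 0.5 + 0.5 * math.cos(angle),
        "y": 0.5 + 0.5 * math.sin(angle),
        # Brightness is kept as a separate radial control.
        "brightness": brightness,
        "pitch": pitch,
    }


def spiral_path(steps, turns=2.0, pitch=60.0):
    """Rotate through the timbres while brightness falls from 1 to 0."""
    if steps < 2:
        raise ValueError("a path needs at least two steps")
    path = []
    for i in range(steps):
        fraction = i / (steps - 1)
        path.append(tbp_to_controls(360.0 * turns * fraction,
                                    1.0 - fraction, pitch))
    return path


def diagonal_path(steps, timbre_deg, pitch=60.0):
    """Cross the wheel from the rim, through the dull centre, to the
    opposite rim along a diameter at the given timbre angle."""
    if steps < 2:
        raise ValueError("a path needs at least two steps")
    path = []
    for i in range(steps):
        s = -1.0 + 2.0 * i / (steps - 1)  # runs from -1 to +1
        angle = timbre_deg if s < 0 else timbre_deg + 180.0
        path.append(tbp_to_controls(angle, abs(s), pitch))
    return path


for controls in diagonal_path(5, 45.0):
    print(controls)
```

Keeping brightness as its own control, rather than folding it into X and Y, follows the description of the radial component as a modifier that is independent of the axes ordering the Timbre Circle.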
The graphical user interface enables spatial interactions with the TBP sound model. The sounds are arranged in an intuitive manner because the axes are perceptually aligned. This arrangement makes it easy to search for particular sounds, to remember where sounds are, and to compare sounds. The user interface of the SoundChooser was implemented with Tcl/Tk [41] and the coordinates were sent to the Csound instrument through a Unix pipe.

Figure 85. SoundChooser graphical interface to the TBP sound model

50. Conclusions

The Timbre, Brightness, Pitch (TBP) sound model was developed to have properties similar to those of the HSL colour model. The sound model was founded on the recognition of a link between the categorical nature of timbre and hue perception. The Timbre Circle was defined and the concept was confirmed by generating circular timbre sequences using Csound instruments. The Timbre Circle was extended with a radial Brightness modifier to create the Timbre Wheel, and this was tested by creating a timbre spiral sequence. Pitch was chosen over loudness as the vertical axis and a SoundChooser widget was implemented as an interface for selecting sounds. The advantages of the TBP sound model are:

• Natural specification, comparison and matching - Timbre, Brightness, and Pitch are perceptually separable attributes of sounds.
• Natural order - the Timbre Circle is ordered by an underlying perceptually orthogonal basis which arranges complementary timbres diametrically opposite each other. The Brightness and Pitch axes both have a natural order.
• Independent control of perceptually aligned parameters - Timbre, Brightness, and Pitch can be changed independently in the TBP model.
• Geometric interface - the 3d sound solid provides the opportunity for spatial interaction with sounds.
• Transportability - the TBP model may be used to specify sounds in natural terms rather than device coordinates.

50.1 Limitations

The colour model provides access to the entire range of perceptible colours. This is not the case with the TBP sound model. This limitation is due to the multidimensional nature of timbre. The selection of the axes which underlie the Timbre Circle constrains the range of timbres to those which can be described in terms of those axes. Access to the greatest possible range of sounds can be enabled by using the most perceptually salient axes. The value of the Timbre Circle is that it allows timbres which are spanned by the nominated axes to be ordered in terms of those axes, irrespective of how they vary in other aspects.

50.2 Further work

To present structure in sound it is necessary to preserve relationships between points in the sound space. We intend to use the TBP sound model as the basis for a perceptually linearised sound space which will provide a framework for mapping scientific data to sound.

51. Acknowledgements

This work was funded by a postgraduate research scholarship from the Commonwealth Scientific and Industrial Research Organisation, Division of Information Technology, Australia. I would like to thank Dr. Phil Robertson of the CSIRO for his support, guidance, and expertise. I would like to thank Mr. David Worrall of the Australian Centre for Arts and Technology for his perspective and advice.

52. References

[36] Bregman A.S. (1990). Auditory Scene Analysis: The Perceptual Organisation of Sound, The MIT Press, Cambridge.
[37] Fletcher N.H. and Rossing T.D. (1991). The Physics of Musical Instruments, Springer-Verlag, New York.
[38] Gaver W.W. (1993). Synthesizing Auditory Icons, Conference Proceedings of InterCHI '93, pp 228-235, ACM SigCHI 1993.
[39] Grey J.M. (1975). Exploration of Musical Timbre, PhD Thesis, Report No. STAN-M-2, CCRMA Dept. of Music, Stanford University.
[40] Hunt R.G.W. (1987). Measuring Colour, Ellis Horwood, Chichester.
[41] Ousterhout J.K. (1994). Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass.
[42] Plomp R. (1976). Aspects of Tone Sensation, Academic Press, London.
[43] Slawson A.W. (1968). “Vowel quality and musical timbre as functions of spectrum envelope and fundamental frequency”, Journal of the Acoustical Society of America, 43, pp 87-101.
[44] Vercoe B. (1991). CSOUND, A Manual for the Audio Processing System and Supporting Programs, Media Laboratory, M.I.T., Cambridge, Mass.
[45] von Bismarck G. (1974). “Timbre of Steady State Sounds: a factorial investigation of its verbal attributes”, Acustica 30, pp 146-159.
[46] von Bismarck G. (1974). “Sharpness as an Attribute of the Timbre of Steady Sounds”, Acustica 30, pp 159.