Synthesizing singing: what’s the buzz?
Sten Ternström* and David Howard**
*Dept of Speech, Music and Hearing, Kungliga Tekniska Högskolan, Stockholm,
Sweden
**Dept of Electronics, University of York, UK
The voice quality of synthesizers based on source-filter modeling is often perceived as too mechanical and lacking in naturalness. Some of this criticism can be ascribed to phonetic shortcomings, such as inappropriate prosody and improbable renderings of transitions between phonemes. Even on sustained vowels, however, source-filter formant synthesis is often found wanting, for example as regards appropriate perturbations of fundamental frequency (F0), realistic aspirative noise, and source spectrum control. In particular, vowels produced by source-filter synthesizers tend to sound buzzy and metallic, to a degree rarely heard from live speakers or singers. In this investigation, we attempt to identify some acoustic features that cue the perception of buzziness.
Guided by informal experimentation, we hypothesize the following: perceived buzziness will increase when (a) there is more energy in a frequency band between 5 and 8 kHz; and/or (b) the F0 is more stationary.
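The two quantities named in the hypothesis could be measured along the following lines. This is only an illustrative sketch, not the analysis used in the study: the function names are hypothetical, the 5 to 8 kHz band edges are taken from the hypothesis above, and expressing F0 stationarity as the standard deviation of an F0 track in cents is an assumption made here for illustration.

```python
import numpy as np


def band_energy_ratio_db(x, fs, lo_hz=5000.0, hi_hz=8000.0):
    """Energy in [lo_hz, hi_hz] relative to the total energy of x, in dB."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    eps = 1e-20  # guards against log of zero for silent or band-free input
    return 10.0 * np.log10((power[in_band].sum() + eps) / (power.sum() + eps))


def f0_spread_cents(f0_hz):
    """Standard deviation of an F0 track (Hz), expressed in cents.

    Assumed stationarity measure: a smaller value means a more stationary F0.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    return float(np.std(1200.0 * np.log2(f0)))
```

Under hypothesis (a), a higher band energy ratio would go with higher buzziness ratings; under hypothesis (b), a smaller F0 spread would.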
Listening tests are underway in which subjects rate the buzziness of a number of stimuli. The stimuli are both natural and synthetic, belonging to one of six categories, as follows:
For each category, filtered versions of the above stimulus tones were generated,
with sound level in a frequency band around 6 kHz being varied in five steps.
All stimuli were matched in equivalent level (Leq). To reduce listener
boredom and fatigue, additional redundant stimuli, with different pitches and
vowels, were interspersed among the test stimuli.
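The stimulus manipulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the filter is a crude brick-wall gain applied in the FFT domain, the band edges (5 to 8 kHz) are borrowed from the hypothesis, the five gain steps of -12 to +12 dB are invented placeholders because the abstract does not give the step sizes, and unweighted RMS over the whole tone stands in for equivalent level.

```python
import numpy as np


def apply_band_gain(x, fs, gain_db, lo_hz=5000.0, hi_hz=8000.0):
    """Scale the spectrum of x inside [lo_hz, hi_hz] by gain_db (brick-wall)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    spectrum[in_band] *= 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum, n=len(x))


def match_rms(x, target_rms):
    """Rescale x so that its RMS equals target_rms."""
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))


def make_variants(x, fs, gains_db=(-12.0, -6.0, 0.0, 6.0, 12.0)):
    """Return one level-matched variant of x per band gain step.

    The gain values are hypothetical; the source only says "five steps".
    """
    target = np.sqrt(np.mean(x ** 2))
    return [match_rms(apply_band_gain(x, fs, g), target) for g in gains_db]
```

Matching the level after filtering means that raising the band level lowers the rest of the spectrum slightly, so the variants differ in spectral balance rather than in overall level.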
The results will show to what extent high-frequency content is a predictor of
buzziness, and whether stationarity of the fundamental frequency interacts
significantly with it. The responses to stimuli in categories 1 and
2 may serve to indicate the possible relevance to buzziness of other factors
in addition to F0 stationarity.
Work supported by STINT, the Swedish Foundation for International
Cooperation in Research and Higher Education, contract IG2002-2049.