Sound and speech processing

by Koenraad De Smedt

Almost all sound that is recorded and played back today is at some point stored or transmitted in digital form. This holds for music as well as telephone speech. With the advent of digital radio, it may soon hold for broadcasting as well. The familiar audio CD format, with 16-bit encoding at a sampling rate of 44.1 kHz, was the first widespread standard for digital sound encoding. Newer formats offer better quality, better use of bandwidth, or both. MP3 (MPEG-1 Audio Layer III, often loosely called "MPEG-3"), which offers signal compression at around 12:1, is becoming popular for transmission on the Internet. Digitized sound signals on passive media are in themselves not especially interesting to humanities scholars. In contrast, computational methods for processing such signals and interpreting them as music or spoken language are clearly relevant to disciplines such as music and phonetics, which often fall within humanities faculties.
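To make these figures concrete, the following minimal Python sketch works out the bit rate implied by the CD parameters quoted above and the rate that results from compression at around 12:1. The sampling rate, sample size and compression ratio come from the text; the channel count of two (stereo) is an added assumption, and the function name is purely illustrative.

```python
# Back-of-the-envelope bit rates for the formats mentioned above.
SAMPLE_RATE_HZ = 44_100      # CD sampling rate (from the text)
BITS_PER_SAMPLE = 16         # CD sample size (from the text)
CHANNELS = 2                 # assumption: stereo, as on a standard audio CD
COMPRESSION_RATIO = 12       # approximate ratio quoted in the text


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
compressed_rate = cd_rate / COMPRESSION_RATIO

print(f"CD audio:        {cd_rate / 1000:.1f} kbit/s")
print(f"At roughly 12:1: {compressed_rate / 1000:.1f} kbit/s")
```

Under these assumptions the compressed stream needs about a twelfth of the bandwidth of CD audio, which is why such formats suit transmission over the Internet.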

Music and phonetics cannot be taught satisfactorily from textbooks alone. Computational techniques allow interactive and multimedia presentations, which are more useful than single-mode presentations. Consider, as a typical example, the McGurk effect: when a person hears "ba" while watching a face that says "ga", the combined signal is typically perceived as "da". The student can fully appreciate this effect only when it is heard and seen, not when it is read about in a book. Using an interactive video that demonstrates the effect, students can experiment at will, for instance by closing their eyes or turning off the sound.

Unfortunately, Java is not ideal for real-time sound processing due to current shortcomings in its mathematical operations. However, new sound handling systems, such as MATLAB, support powerful signal processing in a modern interface. Coupled with computer sound cards, such systems allow on-line demonstration of sound analysis and synthesis. In the speech and hearing community, efforts are being undertaken to exploit these conditions to develop modern educational tools aimed at students in music, phonetics and related disciplines. Examples of useful interactive demonstrations include the segregation of interleaved melodies, the effects of music quantization, and the perception and identification of concurrent vowels.
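As a rough illustration of what a demonstration of sound synthesis and analysis involves, the sketch below generates a pure tone and then recovers its frequency with a naive discrete Fourier transform. It is written in Python with only the standard library and is not taken from MATLAB or from any of the systems mentioned here. The sampling rate, tone frequency, signal length and function names are all assumptions chosen for illustration, and a practical tool would use a fast Fourier transform rather than this quadratic-time version.

```python
import cmath
import math


def synthesize_tone(freq_hz: float, sample_rate_hz: int, n_samples: int) -> list[float]:
    """Synthesis: generate n_samples of a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


def dft_magnitudes(signal: list[float]) -> list[float]:
    """Analysis: magnitudes of a naive DFT for bins 0 up to the Nyquist bin."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


SAMPLE_RATE_HZ = 8000   # assumption: a telephone-quality sampling rate
N_SAMPLES = 400         # 50 ms of signal, so each bin is 8000 / 400 = 20 Hz wide
TONE_HZ = 440.0         # assumption: concert pitch A

tone = synthesize_tone(TONE_HZ, SAMPLE_RATE_HZ, N_SAMPLES)
spectrum = dft_magnitudes(tone)

# The bin with the largest magnitude marks the dominant frequency component.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / N_SAMPLES
print(f"Strongest component near {peak_hz:.0f} Hz")
```

An interactive version would let the student change the tone frequency or mix several tones and watch the spectrum respond.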

The following sound demonstrations have been developed as pilots for training materials in projects sponsored by ELSNET:

(1) Models of Speech Perception by Cecile Fougeron and Francesco Cutugno http://www.unige.ch/fapse/PSY/persons/frauenfelder/SP/Model_speech.html

(2) The Linear Predictive Vocoder by Klaus Fellbaum http://www.kt.tu-cottbus.de/speech-analysis/

(3) Interactive demonstrations in speech and hearing by Martin Cooke http://www.dcs.shef.ac.uk/~martin

We refer also to the volumes by Bloothooft et al. (1997-1999) on the work in Speech Communication Sciences.

-----

Authorship attribution and stylistic studies

An important way of using the computer in textual studies is to apply quantitative linguistic methods. Cases in point are computational studies in authorship attribution and stylistics. Both involve refined ways of counting certain kinds of words in texts. Statistical comparisons of such counts are performed in the hope of discovering relevant differences or similarities between texts. In authorship attribution, the aim is to discover a kind of literary 'fingerprint' that distinguishes an author's language use from that of other authors. In stylistic studies, the aim is to classify texts according to relevant characteristics. In recent times, it has been convincingly demonstrated that such statistical methods may be objectively better than classical methods of text criticism.
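The counting and comparison described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a method taken from the studies referred to: the list of marker words, the three sample texts and the function names are invented, and the comparison is a plain mean absolute difference of relative frequencies rather than any established stylometric measure.

```python
import re
from collections import Counter

# Hypothetical marker words; real studies select these with great care.
MARKER_WORDS = ["the", "of", "and", "to", "in", "but", "upon"]


def relative_frequencies(text: str, markers: list[str]) -> dict[str, float]:
    """Return each marker word's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: counts[w] / total for w in markers}


def profile_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean absolute difference between two profiles (smaller = more similar)."""
    return sum(abs(a[w] - b[w]) for w in a) / len(a)


# Toy texts invented for illustration only.
disputed = "The weight of the evidence rests upon the testimony of the clerk."
candidate_a = "The burden of the proof falls upon the word of a witness."
candidate_b = "But we ran to the shore and swam, and we laughed and sang in the rain."

d = relative_frequencies(disputed, MARKER_WORDS)
dist_a = profile_distance(d, relative_frequencies(candidate_a, MARKER_WORDS))
dist_b = profile_distance(d, relative_frequencies(candidate_b, MARKER_WORDS))
print(f"distance to A: {dist_a:.3f}, distance to B: {dist_b:.3f}")
```

In this toy case the disputed text lies closer to candidate A. Texts this short prove nothing, of course; real attribution work relies on much larger samples and on proper statistical testing.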

However, computational authorship attribution and stylistic studies are so interdisciplinary that they may present difficulties in rigid educational structures. Scholars of literature are often reluctant to take on board statistical and linguistic methods. At higher education institutions across Europe, advanced computational methods are nearly absent from the literature curriculum. This suggests that efforts should be undertaken to break down disciplinary boundaries and stimulate methodological innovation in the text and literature fields.

-----