txt2sound :: a new audio interaction with the written words by e.g.ø

txt2sound
a new audio interaction with the written words based on the graphic forms of letters.

Introduction

The work submitted by egø for con|text is an algorithm that creates sound from written text (supplied as an image). The algorithm is based on the idea of the optophone¹, which was also developed by P.B.L. Meijer (see [4]). When egø wrote his algorithm, he did not know of the optophone or of Mr. Meijer's research; he nevertheless proposes the algorithm in this context, without any intention of stealing other people’s ideas. The words converted to sound are written in capital letters, because the forms of these letters derive from the oldest signs of our western culture (see [3]).

A work by e.g.ø for con|text at stasisfield.com.
May 2002.

How it works

It starts from an image of the written text: the algorithm creates a map of the image from the spatial-frequency domain to the audio-frequency domain (see figure below). The vertical axis is frequency (here from 0 to 11 kHz), while the horizontal axis is time.

Let's consider one column of the image: it is made of pixels with values from 0 to 1 (zero is a white pixel and one is a black pixel)². The lowest pixel in the column corresponds to the lowest frequency, and the highest pixel to the highest frequency. The algorithm sums the various frequencies (sinusoids), giving each sine wave the amplitude of the corresponding pixel (from zero to one). For example, in a vertical line | every pixel in the column is equal to one, so the result is a superimposition of all the frequencies, as in the formula:

that is, white noise (white noise is an uncorrelated superimposition of all frequencies). If, instead, the image contains a horizontal line _, only one frequency will be heard. The map between text and sound is linear (this follows from the properties of the Fourier transform; see [6]), so from the generated sound we are able to get back its original meaning, to recover the written words, with the use of a spectrograph³.
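As a rough illustration of the paragraphs above, the following Python sketch builds the sound of a single column by direct additive synthesis. It is not egø's implementation: the function name synthesize_column, the sample rate of 22050 Hz, the block length and the exact placement of the frequencies (evenly spaced up to 11 kHz, with the lowest pixel at the lowest one) are all assumptions made for the example, and NumPy is assumed to be available. The formula mentioned in the text is not reproduced here, so the sketch assumes the usual additive form x(t) = sum over k of a_k * sin(2*pi*f_k*t + phi_k), where a_k is the value of pixel k and phi_k is a random phase.

```python
import numpy as np

def synthesize_column(col, n_samples=2205, sample_rate=22050, f_max=11000.0, seed=0):
    """Sum one sinusoid per pixel of an image column (hypothetical helper).

    col holds pixel values in [0, 1], top row first (1 = black, 0 = white).
    """
    rng = np.random.default_rng(seed)
    amps = np.asarray(col, dtype=float)[::-1]        # bottom pixel first = lowest frequency
    h = amps.size
    freqs = (np.arange(h) + 1) * f_max / h           # assumed linear pixel-to-frequency map
    phases = 2 * np.pi * rng.random(h)               # random phases keep the sines uncorrelated
    t = np.arange(n_samples) / sample_rate           # time axis in seconds
    # Each row of the 2-D array is one sinusoid; summing over rows superimposes them.
    sines = np.sin(2 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
    return (amps[:, None] * sines).sum(axis=0)

# A vertical line: every pixel is black, so every frequency is present.
vertical = synthesize_column(np.ones(11))

# A horizontal line: one black pixel, so a single frequency is heard.
horizontal_col = np.zeros(11)
horizontal_col[8] = 1.0                              # third pixel from the bottom
horizontal = synthesize_column(horizontal_col)
```

With these assumed parameters the eleven pixels map to 1 kHz, 2 kHz, ... 11 kHz, so the horizontal line above produces a single 3 kHz tone, while the vertical line contains all eleven tones at equal amplitude.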

How the algorithm works.

The concept

Only the "graphical" aspect of words, the single letters, the sign, determines a different audio perception; this coding method becomes a sort of Morse code that is no longer limited to the pulsing of a single frequency, but extends to all frequencies, audible or not. Gadamer once said that "in written text it is asserted the separation of language from its effective being spoken" [1]; now words and letters are no longer related to the sounds of past culture [3]: there are no more "labial" sounds, "guttural" sounds, etc. The whole is shifted towards a new sound, one that Perniola would call "inorganic" ("sounds, spaces, objects, words: when they are removed from their usefulness, they acquire an undetermined and fresher aspect, more splendent" [2]): a sound that is "out of the person", reducing words to the superimposition of pure tones. Listening to words like MAMA [mp3], LOVE [mp3], SEX [mp3], ART [mp3], KILL [mp3] in their new sound can leave us dazed and puzzled; we are no longer able to distinguish those words, those sounds that we would immediately associate with the basic feelings they convey. Even mathematical signs full of symbolism, like "+" [mp3] or "-" [mp3], now become almost identical. Paradoxically, only the blank space, the absence of information, the nothingness, with its symbolism, stays unchanged, and so it is able to show us a way, to lead us in listening to these new sounds.

The algorithm

Now let's see how the algorithm works by using a pseudocode notation [5] (working implementations obviously change depending on the programming language used):



1. col[] = getColumnFromImage();      // col[] is a vector containing the pixel
                                      // values of the column.

2. a[] = [flip(col[]), col[]];        // the column is extended with even symmetry
                                      // so that it can be fed to the FFT algorithm.

3. a[] = randomComplexNumbers[]*a[];  // the elements of the vector for the FFT
                                      // computation must have random phases
                                      // (randomComplexNumbers[] is a vector of random
                                      // complex numbers whose absolute value is one).
                                      // Random phases are needed because we want the
                                      // different sines to be uncorrelated (otherwise,
                                      // for example, white noise can't be built).

4. x[] = iFFT(a[]);                   // inverse FFT algorithm (from frequency to time
                                      // domain); now x[] contains the audio signal.

...and these 4 steps are repeated for each column of the image.
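
The four steps above can be sketched in Python with NumPy (assumed to be available). This is a minimal sketch under stated assumptions, not the author's code. Instead of building the even-symmetric vector of step 2 by hand, it uses np.fft.irfft, which takes only the positive-frequency half of the spectrum and implies the conjugate symmetry needed for a real-valued signal; this substitution is made here so that the random phases of step 3 do not produce a complex output. The names column_to_block, image_to_sound and block_to_column are hypothetical, and leaving the DC and Nyquist bins empty is a choice made for the example.

```python
import numpy as np

def column_to_block(col, rng):
    """Steps 1-4 for one column (values in [0, 1], top row first)."""
    amps = np.asarray(col, dtype=float)[::-1]       # bottom pixel -> lowest frequency bin
    h = amps.size
    phases = np.exp(2j * np.pi * rng.random(h))     # random complex numbers of absolute value one
    spectrum = np.zeros(h + 2, dtype=complex)       # bin 0 (DC) and bin h+1 (Nyquist) stay zero
    spectrum[1:h + 1] = amps * phases               # step 3: amplitudes with random phases
    # Step 4: inverse FFT; irfft assumes the mirrored half of the spectrum
    # is the complex conjugate, so the result is a real audio block.
    return np.fft.irfft(spectrum, n=2 * (h + 1))

def image_to_sound(image, seed=0):
    """Repeat the steps for each column and join the blocks in time order."""
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    blocks = [column_to_block(image[:, c], rng) for c in range(image.shape[1])]
    return np.concatenate(blocks)

def block_to_column(block):
    """Recover the pixel values of one block from its magnitude spectrum."""
    mags = np.abs(np.fft.rfft(block))
    return mags[1:-1][::-1]                         # drop DC and Nyquist, top row first again

# A tiny 4x3 test image: a vertical line, a horizontal line, a grey pixel.
img = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.5],
                [1.0, 1.0, 0.0],
                [1.0, 0.0, 0.0]])
audio = image_to_sound(img)
```

Here block_to_column plays the role of the spectrograph: since each block is the inverse FFT of its column, the magnitude of the forward FFT gives the pixel values back, whatever the random phases were.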

Final work

Finally, a famous aphorism by the poet Paul Claudel summarizes and concludes this work by egø.

The poem is not made from these letters that I drive in like nails, but of the white which remains on the paper.

Paul Claudel [mp3]

(English translation found on the web of "O mon âme! Le poème n’est point fait de ces lettres que je plante comme des clous, mais du blanc qui reste sur le papier").

Bibliography

[1]. Hans-Georg Gadamer "Verità e metodo" ed. Studi Bompiani 1983 (pp. 441-490). [Available in English as "Truth and Method"]
[2]. Mario Perniola "Il Sex appeal dell’inorganico" ("the sex appeal of the inorganic") ed. Einaudi 1994, pp. 82-89 (pp. 162-168).
[3]. Adrian Frutiger "Segni & Simboli" ed. Stampa alternativa/Graffiti 1998 (pp. 121-133). [Available in English as "Signs and Symbols"]
[4]. P.B.L. Meijer, "An Experimental System for Auditory Image Representations", IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992. Available on the web here. Check also his project "the vOICe".
[5]. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest "Introduction to Algorithms" MIT Press 1990 (pages 1-20, 776-800).
[6]. Alan V. Oppenheim, Ronald W. Schafer "Discrete-Time Signal Processing" Prentice Hall 1999 (pages 541-575, 629-650).

All work by e.g.ø. 2002. Copyrighted as stated in the GNU General Public License, which in this case means:
no copyright, but please quote the source and the author when using this material!

¹ The original idea of the optophone (to show images through sounds) was born at the beginning of the 1900s. More can be found here.
² Usually an image is coded with 0 corresponding to black and 1 to white; here the coding was inverted for obvious reasons.
³ A spectrograph is able to show sound as in the image above. If you go here you can download a shareware spectrograph for MS Windows.