26 Feb 96
Steve Tripp

I'm not an expert, but I play one on TV, so I will comment on some of the language-related issues raised by SOUTH AFRICA.

[quoting Alessi's paper] In five to ten years we will have small, portable, and inexpensive (by which I mean a few hundred dollars) hand-held computers, much like a Newton or personal digital assistant, which will have built in scanning and voice synthesis and be able to read. Wave one in front of a newspaper or magazine and it will read the pages aloud for you. Flip the pages of a textbook and it will read you the book. If this comes to pass, or perhaps I should say when this comes to pass because even if it is not in five or ten years, it is inevitable, what are the implications for teaching children to read?

This technology is more or less available now. The weakest link is the scanning of various fonts. Voice synthesis is now available for your WWW browser. The problem of weird intonation is within the capability of articulatory phonetics to solve soon. However, the matter is not this simple. In many parts of the world, spoken language and written language differ considerably. I'm thinking of Arabic, Greek, and Swiss German. Nor is reading confined to texts. We read labels, road signs, schedules, maps, and posters. I doubt that handheld devices will be able to deal with these for a long time. Listening does have some advantages over reading. You can listen while you are doing other things, but you cannot easily read while composing text, as I am doing now. Many of you may be subscribed to TechBabble as I am, and I find that it is useful to play the news in the background while I'm doing other things. But if something is important or technical I always print it out, so I can digest it carefully.

What's next? In ten or 15 years that same cheap, hand-held computer will be able to take dictation and write. If so, what are the implications for learning to write?

According to my sources, this is overly optimistic. People have been predicting this for decades and we are hardly any closer to that goal. Speech recognition needs to deal with several parameters: voice differences, vocabulary, and speed. All of these are extraordinarily difficult problems. Let me illustrate. When I am speaking, you can infer numerous things besides the content of my speech: (1) I am an American. (2) I have a vaguely eastern accent, with some other mixtures. (3) My age. (4) My sex. (5) Maybe whether I'm gay or not. (6) Whether I am tired. (7) Whether I am sincere. (8) Whether I am in a hurry. All of this is noise to the computer trying to determine what I am saying. There is also variety in the way we speak. Many Americans say "the problem is, is that..." I say "the problem is that..." Many Americans say "All's you have to do..." Many people pronounce "something" as "sump'n." There is a general rule in English that you can omit the second of three consecutive consonants (ol' man, coas' guard, two mon's, fis' fight). "NPR" is pronounced "MPR." To put it mildly, this is a big problem. The computer will need to be as smart as you and I (or a five-year-old). Linguists working with computers are pessimistic that this problem can be solved in the near future.
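
As a rough illustration of the consonant-dropping rule mentioned above, here is a minimal Python sketch. It is not how any real recognizer works. The phoneme symbols are an informal, made-up ASCII transcription, the function name drop_middle_consonant is hypothetical, and the sketch assumes the rule applies blindly across word boundaries, which is a simplification. It does not attempt the "NPR" to "MPR" kind of change.

```python
# Informal vowel symbols used only for this illustration (an assumption,
# not a standard phoneme inventory).
VOWELS = {"aa", "ae", "ah", "ay", "ih", "ow", "uw"}


def drop_middle_consonant(phonemes):
    """Delete the second of any three consecutive consonants.

    `phonemes` is a list of symbols for a whole phrase, with word
    boundaries ignored. Anything not in VOWELS counts as a consonant.
    """
    out = []
    i = 0
    n = len(phonemes)
    while i < n:
        # The current symbol is "in the middle" when it and both of its
        # neighbours are consonants.
        is_middle = 0 < i < n - 1 and all(
            p not in VOWELS for p in phonemes[i - 1 : i + 2]
        )
        if is_middle:
            # Skip the middle consonant and keep the third one.
            out.append(phonemes[i + 1])
            i += 2
        else:
            out.append(phonemes[i])
            i += 1
    return out


# "old man" -> "ol' man"
old_man = ["ow", "l", "d", "m", "ae", "n"]
# "coast guard" -> "coas' guard"
coast_guard = ["k", "ow", "s", "t", "g", "aa", "r", "d"]
# "two months" -> "two mon's"
two_months = ["t", "uw", "m", "ah", "n", "th", "s"]

for phrase in (old_man, coast_guard, two_months):
    print(" ".join(phrase), "->", " ".join(drop_middle_consonant(phrase)))
```

The point for speech recognition is the reverse direction: the machine hears the reduced form and has to recover the full one, and nothing in the signal says which consonant was dropped, or whether one was dropped at all.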

I should also point out that some languages rely on the written form more than English does. Japanese, for example, creates new words by exploiting the two ways of pronouncing characters (on-yomi and kun-yomi). It is the single WRITTEN form which is the link between the two entirely different spoken forms. Not knowing the written language is unthinkable to speakers of languages like this.

Next? In 15 to 20 years those increasingly small and inexpensive computers will also translate between many of the world's languages.

Have you tried any of the translation software? I have tried Japanese-English translation software. To say it is unreliable is to be kind. Essentially, the problem of translation is similar to the problem of voice recognition. It requires a machine that is virtually as smart as a human. If you constrain the domain of discourse and don't require too much anaphoric reference, you can get some reliability. But free continuous text? That's a long way off. (Meaning nobody yet understands how it can be done.)

Steve Alessi and Mike Spector raise the issue of the effects of reading and writing vs. listening and speaking (and learning to write with two hands as opposed to one). Please note that this contradicts Clark's "mere vehicles" hypothesis. (Which doesn't bother me.)

Steven Tripp, Professor
Center for Language Research
University of Aizu
Tsuruga, Ikki-machi
Aizu-Wakamatsu City
965-80, Japan

Phone: +81-242-37-2584
Fax: +81-242-37-2599
E-mail: tripp@u-aizu.ac.jp