Teaching computers to 'talk'
Linguist adds human element to computer-generated voices
Some people are content to teach their dogs to sit or beg. Linguist John Goldsmith is teaching his computer to speak.
Goldsmith presses a button and his computer says, "Good morning, welcome to the University of Chicago" in a voice that sounds nearly human, with the flow and inflections of normal speech. At work on a two-part project for Microsoft Corp., Goldsmith is writing a computer program that will let computers "speak" in a way eerily similar to the human voice, while Microsoft engineers are developing voice-recognition software.
"I'm working on a project to get the computer to pronounce text -- any text -- in a natural way," said Goldsmith, Professor in Linguistics. "Usually, when computers sound good it's because someone recorded those words, and the computer is playing it back. But I want a human-sounding voice that is completely computer generated."
The key to this, Goldsmith said, is using linguistic rules. "Engineers who work on digital signal processing have clever ways of fashioning together the consonants and vowels of a computer's speech, but modulating computer sounds so that they sound like an intelligent, human speaker takes linguistic analysis."
Goldsmith, whose early work was one of the inspirations for AT&T's intonation system, has written a program that is able to figure out the natural "American English intonational pathways" for each sentence. To do this, he first had to figure out intonation rules for American English. "This is an area that has fascinated people for a long time," he said, "but we're just beginning to chart the waters now -- and much basic research remains to be done."
Over the winter quarter, Goldsmith hopes to refine the intonations. "For example, questions are really hard. If you use the wrong intonation for a question, it sounds funny. If I say, 'Where's the coffee?' with a falling intonation through the sentence but a sharp rise on the last syllable, it can sound quite natural.
"But if I use the same intonation on the question, 'Where are you going this summer?' it sounds quite odd; it would be more natural to have a sharp fall after 'going,' ending on three low-pitched syllables. We have to be able to give the computer explicit rules for deciding which intonation to use with any given question -- not an easy task, by any means," Goldsmith said. "This is the challenge -- not to just have a good reader of text, but to make people feel like the computer is actually having a conversation with them."
The program that Goldsmith developed takes a sentence supplied by the computer user or a programmer and feeds it to a Microsoft program that analyzes the grammar of the sentence. The grammar program then provides Goldsmith's program with a grammatical analysis of the sentence, along with the part of speech and the phonemes for each word. Phonemes are the basic units of sound in a language; "welcome," for instance, would be W-EH-L-K-AH0-M. Finally, Goldsmith's program tells the voice synthesizer what the phonemes are and what pitch and duration each should have.
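For readers curious how the three stages described above might fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny hypothetical `LEXICON` stands in for Microsoft's grammar program, the transcriptions are illustrative, and the pitch and duration rules (a slow downward drift in pitch, a boost on the stressed syllable of content words, a fall at the end of a statement) are simplified placeholders, not Goldsmith's actual intonation rules.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the grammar program's output: each word maps to a
# part-of-speech tag and an ARPAbet-style phoneme list (the digit on a vowel
# marks stress: 1 = primary, 0 = unstressed). Transcriptions are illustrative.
LEXICON = {
    "welcome": ("VERB", ["W", "EH1", "L", "K", "AH0", "M"]),
    "to": ("PREP", ["T", "UW0"]),
    "chicago": ("NOUN", ["SH", "IH0", "K", "AA1", "G", "OW0"]),
}

# Function words are left unaccented in this toy model (an assumption).
FUNCTION_TAGS = {"PREP", "DET"}


@dataclass
class Target:
    """One instruction for the synthesizer: a phoneme plus its pitch and length."""
    phoneme: str
    pitch_hz: float
    duration_ms: int


def analyze(sentence):
    """Stand-in for the grammar analysis step: (word, tag, phonemes) per word."""
    words = sentence.lower().rstrip(".?!").replace(",", "").split()
    return [(word, *LEXICON[word]) for word in words]


def assign_prosody(analysis, base_hz=120.0, declination=2.0, final_fall=20.0):
    """Toy prosody rules (hypothetical, not Goldsmith's actual rules):
    - pitch drifts slowly downward across the utterance (declination);
    - a primary-stressed vowel in a content word gets a pitch accent and extra length;
    - the final phoneme drops in pitch, as at the end of a plain statement.
    """
    targets = []
    index = 0
    for _word, tag, phonemes in analysis:
        for phoneme in phonemes:
            pitch = base_hz - declination * index
            duration = 70
            if phoneme.endswith("1") and tag not in FUNCTION_TAGS:
                pitch += 15.0
                duration = 110
            targets.append(Target(phoneme, pitch, duration))
            index += 1
    if targets:
        targets[-1].pitch_hz -= final_fall
    return targets


if __name__ == "__main__":
    # Print the per-phoneme instructions a synthesizer would receive.
    for t in assign_prosody(analyze("Welcome to Chicago.")):
        print(f"{t.phoneme:4} {t.pitch_hz:6.1f} Hz {t.duration_ms:4d} ms")
```

The point of the sketch is the division of labor the article describes: the grammar analysis supplies words, parts of speech and phonemes, and a separate set of linguistic rules turns that into pitch and duration targets. A real system would need far richer rules, such as the ones Goldsmith describes for choosing among question intonations.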
Virtually instantaneously, the computer "speaks" in rounded Midwestern tones.
Goldsmith became involved with Microsoft through his wife, Jessie Pinkham, who herself became involved through a University alumnus-turned-Microsoft researcher, Joseph Pentheroudakis (Ph.D.'77). When Microsoft was looking for someone to work on a program to parse French grammar, Pentheroudakis called on her. In turn, Pinkham recruited Jiang Zixin (Ph.D.'91), an alumna of the Chicago Linguistics Department, and Hisami Suzuki, a graduate student in the department who is writing her dissertation, to work on different projects. She also recruited her husband, Goldsmith.
Pinkham is now a full-time researcher at Microsoft in Seattle, so Goldsmith also considers Washington his home, spending three days a week there.
"Microsoft has a strong University of Chicago connection now," Goldsmith said. "In addition to the others who work there, we have a linguistics graduate student, Jon Bernard, a third-year, who interned with Microsoft last summer."
Goldsmith hopes that the Linguistics Department will be able to provide more of a background in computational linguistics for students who are interested. "We're thinking about setting up a class so that linguistics students can learn how to program," he said.
Goldsmith is thrilled by the potential applications for computational linguistics in general and his work in particular. "One thing that people in the computer business are excited about -- but which I find pretty boring -- is that people could call up a computer and have their e-mail read to them over the phone," he said. "And another possibility, of course, is that it can be used to help the elderly, or those with vision disabilities, to read books or Web pages."
"But when this is refined," he added, "it will totally revolutionize the computer industry. When we are able to carry on a natural conversation with a computer, we'll attribute personalities to them. We'll name our appliances. Computers now are just tools, but this can turn them into C-3PO from Star Wars."
-- Jennifer Vanasco