By Michael Kanellos, 9 July 2003 07:32
While fully conversant computers, such as Hal in 2001: A Space Odyssey, may still be a long way off, giant strides are being made in speech recognition, with Microsoft leading the way.

Microsoft today releases the first public beta of its Speech Server, which will let servers better handle voice commands, as well as a third beta of its Speech Application software developer kit. A partner program has begun to encourage third-party developers to promote Speech Server, which will debut in the first half of 2004.

Speech Server, formerly .Net Speech Platform, will attempt to reduce the cost of creating automated phone response systems, and it coincides with other phone-computer efforts at Microsoft. Automated response systems such as those used by many airlines can cost as much as $1m - too expensive for the bulk of the business market, said Kai-Fu Lee, vice president of Microsoft's speech technologies group.

"Only a very small percentage of the call centre opportunity has been realised to date," Lee said.

IBM, meanwhile, is using its research labs and services divisions to create showcase applications for large corporations. Financial services firm T Rowe Price has installed an account management system from Big Blue that lets its customers conduct transactions through common speech requests.

Eugene Cox, director of mobile solutions in IBM's pervasive computing unit, said: "You can say, 'I'd like to make a trade,' and it will say 'What kind?'"

Computers that can facilitate conversations between two people speaking different languages - a kiosk for dispensing advice to tourists who speak English and tour guides who speak only Chinese, for example - will also emerge from IBM's labs by year-end, according to the company.

One challenge is that humans typically don't follow rigid rules when speaking. "Yes," "yeah," "yep," "ya-ha," and "uh-huh" all mean the same thing to people, but present bewildering choices to machines programmed to accept rigidly defined input.
When speaking quickly, people tend to use different grammar, making machine transcription even more difficult. Background noise and filtering have been persistent challenges as well.

Now the directions of both research and marketing have changed. Rather than developing a machine that can converse, researchers are creating computers that can understand speech as a function of probability, the basis of much of Microsoft's artificial intelligence work.

Yoda, a speech-to-text engine under development at Microsoft, can turn the spoken word into coherent text email messages by studying a user's habits, said Alex Acero, manager of the speech research group at Microsoft. Yoda doesn't look for an object to follow a verb, but it knows that a particular sound pattern ("meet") will likely be followed by one of a limited number of familiar sound patterns ("in the conference room" or "tomorrow").

Acero said: "The way we are trying to teach machines to speak is very different than the way humans do it. It is still very primitive, but it is more intelligent than current applications."

Michael Kanellos writes for News.com