Anyone who has had the frustrating experience of telling the nice automated voice at the other end of a customer service help line "no, I meant change of address" over and over again, only to be prompted to repeat themselves, knows that speech recognition software still has a long way to go.
Even more difficult is getting software not only to recognize what you're saying but also to translate it into another language. The most advanced translation programs get it right only about 75 per cent of the time: one word out of every four or five still comes out wrong.
Until now, at least. Last month Microsoft's chief research officer, Rick Rashid, unveiled what appears to be a breakthrough in speech recognition and translation software. The new software, which provides simultaneous translation, not only cuts down substantially on errors but also mimics the voice of the original speaker when it produces the translation. (Video of a key part of Rashid's presentation is online; the voice recognition and translation demonstration begins at the seven-minute mark.)
The breakthrough is based on research conducted by Microsoft and the University of Toronto, which was published in 2010.
"By using a technique called Deep Neural Networks," writes Rashid in a recent blog post, "which is patterned after human brain behavior, researchers were able to train more discriminative and better speech recognizers than previous methods."
Essentially, this works by processing a great deal more data than previous speech recognition programs did, allowing the software to more closely mimic the human mind in its attempt to process language. The result is a 30 per cent decrease in errors, according to Rashid, and a much more natural translation experience. That is, if hearing your own voice speaking another language, one you don't even know, doesn't freak you out.
Writer: Hamutal Dotan
Source: Rick Rashid, Chief Research Officer, Microsoft