WIRED: It may not fulfil the vision of 'Star Trek', but speech technology can still find plenty of niche markets to succeed in, writes DANNY O'BRIEN
IT’S A cruel title, with a cruel premise: Rest in Peas by Seattle writer Robert Fortner was an article that performed an Emperor’s New Clothes attack on speech recognition.
Not only had usable, real-time comprehension of free-form human speech not yet arrived, Fortner asserted, but speech recognition accuracy rates had reached a plateau in 2000 and have not improved since.
Experts in the field argued with Fortner about his grim analysis of a general stagnation in research, but nobody seems to have argued with the basic point of Rest in Peas: that the instant computer understanding of human speech we see in Star Trek seems about as forthcoming as food replication or instant tele-transportation.
And yet, speech recognition continues to pop up in the corners of our lives. If you have bought one of the latest Android phones, you’ll be faced with an old-fashioned microphone icon beside every text entry field.
From searching on Google to sending a text message, these phones will strive to convert what you say into plain text. They seem to get it right often enough for many owners to use the feature regularly.
The Android feature is a spin-off of Google’s long-term interest in speech recognition. The company started with GOOG-411, an American-only service you might call “dial-a-search” anywhere else in the world.
Google was frank about its intention to use the voice calls to GOOG-411 as raw material to improve its voice recognition systems by comparing its successes and failures. Later, it grew confident enough in its technology to offer audio indexing of YouTube videos, and to add speech recognition to its Android OS and iPhone apps.
No one could claim Google’s voice recognition is 100 per cent accurate – especially if you’ve ever squinted at its textual transcription of your voicemail on its Google Voice phone service. Despite all the data scooped up by the company, its service still wanders into surrealist farce. For instance, the New York Times found that one of its experimental voicemails had the classic Monty Python line “This parrot is no more. It has ceased to be” transcribed as “this carriages no more in Tennessee seems to be”.
But, as many a Web 2.0 company has discovered, a service doesn’t have to be perfect to be usable – especially if it’s free.
The real benefit for companies using speech recognition these days is the ability to cram it into contexts where just a little bit of vocal help is useful. We may throw our phones out of the window over some computerised help systems, but just being able to yell “operator” and have a misbehaving system throw you to a real person can be a vital stress-reliever. And big companies can run speech recognition over their entire database of recorded helpdesk conversations to find key words that are rising in popularity.
These aren’t the expected uses of voice recognition, which most of us assume would be perfect transcription. They’re narrowly defined tasks which benefit from a sprinkle of magic recognition pixie dust.
In fact, that seems to me to be the perfect way to extrapolate a realistic business use from the technological futurism that usually heralds it. Take the mad projections of the early researchers in a promising field, then try to find an application that only needs a little of their optimism to succeed.
Extra points if your application would be too demeaning or expensive for the current experts in the field to sully their hands with.
Human transcription services charge medical or legal companies high rates for perfect transcription. Voice recognition will never compete with the high level of expertise and accuracy needed in these fields. But a computer that can give you a rough draft that lets you guess what 70 per cent of your voicemail is about, for a fraction of the cost of paying a secretary to write you a memo on all your phone messages: that’s useful enough.
You can see a similar market developing for rough-and-ready machine translation between human languages. Computer translation is incredibly shoddy compared to the premium work performed by trained expert human translators.
But throw a Japanese or Chinese news article into one of the free machine translators online, and you can get more of a flavour of its content than you ever would staring at the strange ideograms yourself.
I’m never going to be able to afford a translator for the dozens of times a week I’d like to know a foreign language. But if I can grab just a gist from a laughably bad machine translation – well, I’m going to keep using that machine translation, and benefit from its shoddiness, too.
Will we ever get voice recognition or language translation to work as well as we want? No, but then, no car can give as smooth a journey as a sedan chair carried by servants. That doesn’t mean that when someone comes up with a wobbly Model T, it won’t transform transportation.
Even if speech recognition gets no better, and never reaches Star Trek levels of accuracy, it still has plenty of market niches to fill. It’s not dead yet, Jim.