'Computer, on screen!': A look at Google's voice recognition engine

By Tim Conneally | Published June 16, 2009, 3:08 PM


[Image: Capt. Picard reads an LCARS printout from Star Trek: First Contact]

Google's voice recognition technology entered the mobile sector with voice-powered search applications for iPhone, Android, and BlackBerry. Naturally, Google's own mobile operating system, Android, has begun to reap special benefits from the technology, with some new voice-enabled features.

Yesterday, an unforced update to Android's native Google Maps application gave the software speech recognition capabilities. Addresses, business names, and attractions can all be searched by voice. The app is now one of several that tap into Google's speech recognition engine; another is a voice-to-text app that recently turned up in the Android Market, simply named Voice Text for Android, which lets users dictate text messages.

While Android is inching toward full voice command, it has certainly not reached the point where natural speech is easily transcribed. The engine requires a slower and more deliberate vocal cadence, which is easy to get accustomed to, but recognition suffers even when the speaker's diction is careful. Rare and odd words are easily detected, but the most common ones are consistently misrecognized, especially if they contain fricative phonemes.

Of the approximately 42 phonemes in American English, fricatives constitute a small but prevalent handful. These include /f/ as in "fish," /v/ as in "very," the voiceless /th/ as in "thing," and the voiced /th/ as in "this." In our tests of the software with three different microphones, these sounds almost always registered incorrectly.

In Maps, a user's potential input is much more constrained than it would be in text message dictation. For example, in Betanews tests yesterday, the tricky statement "think this through," which we chose for its potentially common appearance, its three interdental fricatives, and its easily confused homophone (through/threw), stumped the text transcriber a grand total of twenty times, returning results such as "Sync vs. Subaru" and "St. Francis School." A seemingly simple sentence becomes frustrating to dictate when words like "then" and "them" are almost never understood.

Fortunately, in Maps, such terms are less likely to turn up.

Another problem in Maps comes up when looking for streets and towns with American Indian or adopted foreign language names, which are common throughout the United States. The town of Hauppauge, New York, which many techies know for the computer company of the same name, is practically impossible to find through voice search. The engine turned up such things as "Hot Dog New York" and "Paul Blog, Long Island" in our attempts. Trying names such as "Quaqanantuck Lane" and "Napeague Meadow" sometimes resulted in hilarious misinterpretations.

And this problem is not strictly an American one, either. Google said yesterday that the software recognizes North American, British, and Australian English pronunciation, but when nearby locations can have names in any number of indigenous languages (Whip-ma-Whop-ma-Gate, anyone?), the engine faces recurring difficulties.

Comments


I used Apple's voice recognition with my Mac OS 8.x machine back in 1997 and 1998 and it almost always worked fine with my mixed accent. I'm surprised that Google isn't doing better, but then, is Apple doing as well on iPhone?

Google might try something much simpler, such as endowing their maps with correct locations. Seeing the marker 5 blocks from where it should be makes life interesting in finding a building.


...in finding a building?

Try using it to find the bathroom.

...boy was my neighbor surprised. :p
