
Audio in Mobiles


Mobile devices are getting smaller each day, while human fingers are not, opening the road to enabling voice on mobile phones and PDAs. WAP-enabled mobile devices that display information are driven by powerful but small microprocessors. Embedding voice-enabled chips completes the interface, letting you speak to your machine.


Early applications of voice recognition in mobiles restricted the user to 8-10 commands, each of which had to be voice-trained. Voice technology now promises to overcome these shortcomings.

How it works



At the core of voice applications is ASR (Automated Speech Recognition), which works as follows. Consider a cellphone. It receives a voice input, captures it and digitizes the sound. The input is then processed to reduce noise and form a clean digital stream; echo-cancellation algorithms or spectral subtraction (which discards unnecessary frequencies) are used for the purpose. The audio stream is then chopped into parts, which are analyzed to decipher words and sentences. Upon recognition, the application executes the command.
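The digitize-denoise-chop stages above can be sketched in a few lines. This is a toy illustration rather than production DSP: the frame sizes, the naive DFT (a stand-in for an optimized FFT) and the simple magnitude subtraction are all assumptions made for clarity.

```python
import math

def frames(samples, frame_len=160, hop=80):
    """Chop the digitized stream into overlapping frames for analysis."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum of one frame (illustrative only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def spectral_subtract(mags, noise_mags):
    """Spectral subtraction: remove the estimated noise floor, clamp at zero."""
    return [max(m - n, 0.0) for m, n in zip(mags, noise_mags)]
```

In practice the noise spectrum would be estimated from silent stretches of the input, and the cleaned frames would feed the recognition stage.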

Voice drive

Voice-driven actions your mobile can do 

  • Dialing

  • Searching for data or contacts

  • Browsing and e-mail

  • Messaging (voice to message, message to voice)

  • Commands to run applications on your device

  • Recording and dictation

  • Authentication in lieu of passwords and identification pins 


To those who have experimented with voice technologies and come away unimpressed: there are improvements on the voice-input side as well. Initially, users were given a set of single-word commands which, when voiced, triggered certain functions. With multi-word spotting technology, the system can now spot multiple words in a single request. This technology is, however, not comprehensive when it comes to intelligently responding to multiple tasks expressed in a single request. For that, your system needs to understand natural language based on grammar. Consider a request like “Fax me today’s weather for Mumbai”. What the system will understand is: “Action: send a fax; to whom: the caller; fax content: weather; city: Mumbai; period: today”. An example of natural-language technology at work is the continuous message-recording feature of NAK (a voice-enabled PDA from L&H).
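The slot-filling step described above can be illustrated with a toy rule-based parser. Everything here is an assumption for demonstration: the action keywords, the city list and the slot names mirror the magazine's example, not any real product's grammar.

```python
import re

# Hypothetical vocabularies; a real system would use a trained grammar.
ACTIONS = {"fax": "send a fax", "email": "send an e-mail"}
CITIES = {"mumbai", "delhi", "chennai"}

def parse_request(utterance):
    """Toy natural-language slot filler for requests like
    'Fax me today's weather for Mumbai'."""
    words = re.findall(r"[a-z']+", utterance.lower())
    slots = {}
    for w in words:
        if w in ACTIONS:
            slots["action"] = ACTIONS[w]      # what to do
        elif w in CITIES:
            slots["city"] = w.title()         # where
        elif w in ("today's", "today"):
            slots["period"] = "today"         # when
        elif w == "weather":
            slots["content"] = "weather"      # what to send
        elif w == "me":
            slots["recipient"] = "caller"     # to whom
    return slots
```

Running it on the article's example yields the same action/recipient/content/city/period breakdown the text describes.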

For systems with ample resources (like PCs), voice capture, processing and recognition can be carried out on one machine. For smaller devices, however, resources are scarce, so voice-enabling them needs DSR (Distributed Speech Recognition). Instead of hosting all the components of the ASR on one device, they are distributed over many devices. The handheld houses an embedded voice-enabled chip, some RAM and part of the application in flash memory. It captures the voice input locally, does some initial processing and sends the result to a back-end server, where the final processing and execution of the command take place.
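The DSR split can be sketched as two functions: a lightweight on-device front end that ships compact features instead of raw audio, and a heavyweight back end. The function names, the toy log-energy feature and the JSON payload are all illustrative assumptions, not any standard's wire format.

```python
import json
import math

def client_front_end(samples, frame_len=4):
    """On-device stage: capture locally and compute a compact per-frame
    feature (here a toy log-energy) instead of sending raw audio."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append(round(math.log(energy + 1e-9), 4))
    return json.dumps({"features": feats})  # compact payload sent over the air

def server_back_end(payload):
    """Back-end stage: the heavy recognition would run here; this stand-in
    just flags frames whose log-energy exceeds a speech threshold."""
    feats = json.loads(payload)["features"]
    return ["speech" if f > 0.0 else "silence" for f in feats]
```

The point of the split is bandwidth and battery: the handheld sends a few numbers per frame rather than the full audio stream, and the server does the expensive decoding.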

Also note that the success of a powerful voice application rests on a good, navigable visual interface. Interface design for WAP already does away with a lot of graphics. A useful voice architecture must also ensure a safe fallback if voice fails, or if the user interrupts the spoken system menu and prefers text browsing. The interface should allow quick switching to a human operator in critical situations.
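The fallback behaviour just described amounts to a small decision policy. This sketch is purely illustrative; the mode names and the confidence threshold are assumptions, not part of any cited system.

```python
def handle_input(recognized, confidence, user_interrupted):
    """Pick an interface mode: voice when recognition is confident,
    text browsing when the user interrupts, and a human operator
    as the final backstop when recognition fails outright."""
    if user_interrupted:
        return "text-browsing"           # user opted out of the spoken menu
    if recognized and confidence >= 0.6:  # assumed threshold
        return "voice"
    if recognized:
        return "confirm-by-text"         # low confidence: fall back safely
    return "human-operator"              # recognition failed entirely
```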


Today and tomorrow 



L&H, a Belgian company, has come out with a talking PDA called NAK, short for the Hawaiian term Nakulu, which means echo. This is a speaker-dependent system that requires about six minutes of training, and uses Voice Express (speech-to-text) and RealSpeak (text-to-speech). IBM’s PVA, on the other hand, is a speaker-independent system that uses IBM’s embedded ViaVoice technology. The NAK is the more versatile of the two, with continuous message dictation. L&H hopes to make the NAK adapt intelligently to the user’s requirements, while IBM is looking to add message dictation to its add-on.

As this technology matures, it will in parallel spur the implementation of more meaningful services specific to hands-free mobile environments. AT&T Wireless, Sprint PCS, Japan Telecom and Qwest Communications International allow users to browse the Web with voice commands. Tasks that are presently enabled, such as buying movie tickets, will lead to voice-commerce solutions. IBM, too, has thrown in its chips for the future by announcing its strategy to develop wearable PCs.

Priya Ramachandra
