The mouse and keyboard are the most common input devices used with computers today. But wouldn’t it be great if you could control everything with your voice, that is, through speech? Until now, the interfaces available for adding this functionality were proprietary and didn’t work beyond the voice-recognition system they supported. That’s where SALT, or Speech Application Language Tags, comes in. These are XML elements that can be added to existing markup languages such as HTML or XHTML. So, in addition to the keyboard and mouse, you can use speech to access information from Web pages or from applications built on Web pages, and the computer can respond with audio or video instead of just text and graphics.
Speech gives a better end-user experience because it reduces the complexity of interaction between the user and the computer. Being XML based, SALT can standardize the interface of speech-enabled products such as wireless devices, so developers can use standard software development kits to build front ends for speech applications. SALT does not define the actual speech recognition and synthesis; the developer is free to use any engine for these. The front end can be anything from a browser plug-in to a separate application with a built-in SALT parser.
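The split described above, with a front end that handles the markup and leaves recognition and synthesis to whichever engine the developer picks, can be sketched roughly as follows. This is only an illustration in Python: the names Recognizer, Synthesizer, FrontEnd, TextRecognizer and TextSynthesizer are hypothetical and are not part of SALT, and the two stand-in engines simply pass text through as bytes.

```python
from typing import Protocol, Tuple


class Recognizer(Protocol):
    """Any speech-recognition engine the developer chooses (hypothetical interface)."""

    def recognize(self, audio: bytes) -> str: ...


class Synthesizer(Protocol):
    """Any speech-synthesis engine the developer chooses (hypothetical interface)."""

    def speak(self, text: str) -> bytes: ...


class TextRecognizer:
    """Stand-in engine: pretends the audio bytes are already UTF-8 text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class TextSynthesizer:
    """Stand-in engine: returns the prompt text as bytes instead of real audio."""

    def speak(self, text: str) -> bytes:
        return text.encode("utf-8")


class FrontEnd:
    """Front end that knows nothing about how the engines work internally."""

    def __init__(self, recognizer: Recognizer, synthesizer: Synthesizer) -> None:
        self.recognizer = recognizer
        self.synthesizer = synthesizer

    def ask(self, prompt_text: str, user_audio: bytes) -> Tuple[bytes, str]:
        # Play the prompt through the chosen synthesizer, then hand the
        # user's reply to the chosen recognizer.
        played = self.synthesizer.speak(prompt_text)
        heard = self.recognizer.recognize(user_audio)
        return played, heard


if __name__ == "__main__":
    front_end = FrontEnd(TextRecognizer(), TextSynthesizer())
    print(front_end.ask("Which city?", b"Delhi"))
```

Swapping in a different engine means passing a different object to FrontEnd; the front end itself does not change.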
A similar markup language that comes to mind is VoiceXML. Although both VoiceXML and SALT are markup languages that describe a speech interface, SALT has much wider goals. VoiceXML is geared towards telephony applications and describes a simple, high-level interface for IVR (interactive voice response) systems that access Web content. SALT’s scope includes PDAs and wireless devices as well as telephony applications. There is a slight difference in the programming model, too: VoiceXML also covers data and control flow, while SALT limits itself to standardizing the speech interface.
One big plus point for SALT is that it’s royalty-free and an open standard. SALT’s design principles include clean integration with current Web pages and reuse of existing grammar standards for speech. This makes it more cost-effective than speech-enabling these pages from scratch. Due to its XML origins, SALT can be used on a range of devices.
How it works
A typical SALT implementation contains the following three high-level constructs:
- <listen>: configures the speech recognizer, executes recognitions and handles speech input events
- <prompt>: configures the speech synthesizer and plays out prompts
- <dtmf>: configures and controls DTMF (touch-tone input) in telephony applications
The <listen> and <dtmf> elements may contain <grammar> and <bind> elements.
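To see how these constructs sit inside an ordinary page, here is a minimal sketch in Python. It embeds a small SALT-style page and uses the standard xml.etree.ElementTree module to list the listen, prompt and dtmf elements it finds. The page itself is illustrative only: the ids, the grammar file names, the bind attributes and the namespace URI are assumptions made for this example, not something taken from a real application.

```python
import xml.etree.ElementTree as ET

# Assumed SALT namespace URI, used here only for illustration.
SALT_NS = "http://www.saltforum.org/2002/SALT"

# Hypothetical page: a text field filled in by speech or by telephone keypad.
PAGE = """\
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <input name="city" type="text"/>
    <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
    <salt:listen id="recoCity">
      <salt:grammar src="city.grxml"/>
      <salt:bind targetelement="city" value="//city"/>
    </salt:listen>
    <salt:dtmf id="keys">
      <salt:grammar src="digits.grxml"/>
    </salt:dtmf>
  </body>
</html>
"""


def summarize_salt(page):
    """Return {construct: [(id, [child element names]), ...]} for a page."""
    root = ET.fromstring(page)
    summary = {"listen": [], "prompt": [], "dtmf": []}
    for name in summary:
        # ElementTree spells namespaced tags as "{namespace-uri}localname".
        for element in root.iter("{%s}%s" % (SALT_NS, name)):
            children = [child.tag.split("}")[-1] for child in element]
            summary[name].append((element.get("id", ""), children))
    return summary


if __name__ == "__main__":
    for construct, found in summarize_salt(PAGE).items():
        print(construct, found)
```

A real SALT front end would go further and hand each element to a recognizer, synthesizer or DTMF collector; this sketch stops at locating them.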
Being an open standard, SALT is being adopted by many software vendors. Microsoft, for instance, has already included SALT in its .NET Speech SDK Version 1.0 (Beta). Similarly, another company, Intervoice, has various SALT-based applications.
Ankit Khare