
Phone a Website

PCQ Bureau

Enabling voice seems to be both the ultimate challenge and the ultimate reward, for technologists and users alike.


Why is voice so popular? Speaking comes more naturally to people than other modes of communication, such as pen and paper or computer-based applications. Using voice also requires no special training.

Voice of www



Given these advantages, the World Wide Web is moving toward being voice-enabled, and VoiceXML is the language it has adopted for this. VoiceXML 1.0 was released in 2000 by the VoiceXML Forum (www.voicexml.org), which AT&T, IBM, Lucent Technologies and Motorola founded together. The Forum defines VoiceXML as a “Web-based markup language for representing human-computer dialogs”.

Existing Web services, which until now could be reached only through a Web browser, can be accessed over a telephone using VoiceXML. Good candidates for this are information-intensive websites that organize information into categories and serve it up over the Internet.


VoiceXML in its present form can handle the following.

  • Synthesized speech output (text-to-speech)
  • Output of audio files
  • Recognition of spoken input
  • Recognition of DTMF input
  • Recording of spoken input
  • Telephony features like call transfer and disconnect
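To make the list above concrete, here is a minimal sketch of a help-line dialog that uses several of these features: audio output with a TTS fallback, recording, call transfer and disconnect. It is written against the later W3C VoiceXML 2.0 Recommendation rather than the 1.0 version used elsewhere in this article, so details may differ by platform. The audio file name and the phone number are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="helpline">
    <!-- Output of an audio file; if welcome.wav (hypothetical) cannot be
         fetched, the enclosed text is spoken by the TTS engine instead -->
    <block>
      <prompt>
        <audio src="welcome.wav">Welcome to the help line.</audio>
      </prompt>
    </block>

    <!-- Recording of spoken input: up to 30 seconds, a key press ends it -->
    <record name="message" beep="true" maxtime="30s" dtmfterm="true">
      <prompt>Please leave your message after the beep.</prompt>
      <filled>
        <prompt>Thank you. Connecting you to an operator.</prompt>
      </filled>
    </record>

    <!-- Call transfer to a hypothetical operator number -->
    <transfer name="operator" dest="tel:+15555550100" bridge="true"/>

    <!-- Reached once the transfer completes: hang up -->
    <block>
      <disconnect/>
    </block>
  </form>
</vxml>
```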

The technology 



VoiceXML is an XML-based markup language whose primary application is in ASR (Automatic Speech Recognition) and IVR (Interactive Voice Response) systems. The architecture of VoiceXML consists of the following components.

Document server: The document server (which can be any Web-server) processes requests from a client application and serves up VoiceXML documents. 

VoiceXML interpreter context: The interpreter context answers the call and fetches the initial VoiceXML document. It also monitors caller input and, together with the VoiceXML interpreter, handles events as the VoiceXML document specifies.


VoiceXML interpreter: This is the client of the document server. Working through the interpreter context, it requests VoiceXML documents from the document server and processes the commands in them: it plays the prompts, listens for responses, matches them against the ‘grammar’ of the VoiceXML document and executes the application’s logic.

Implementation platform: This includes the telephone hardware and related IVR and ASR resources that are controlled by the VoiceXML interpreter and the VoiceXML interpreter context. The implementation platform generates events in response to caller actions (for example, touch tones or spoken commands) and to system events (for example, timers expiring).

Source:

www.ctlabs.com/Dr%20C/q55.htm 

How it works



A caller dials the phone number of a VoiceXML-enabled Web service. The call is routed to a VoiceXML interpreter, which works with the interpreter context to retrieve a VoiceXML document from the Web server and then plays a pre-recorded or TTS (Text-To-Speech) generated audio prompt to the caller.

The caller can now select a service or option by speaking it or by pressing the appropriate keys to generate DTMF tones. Spoken responses are handled by an ASR (Automatic Speech Recognition) engine, which matches what was said against the grammar in the VoiceXML document. The interpreter then executes the commands in the document based on what the ASR returns. This continues until the caller hangs up or the application ends. We’ll look at a complete example later in this article. Before that, let us look at how a VoiceXML application is written and what its basic components are.
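One turn of that prompt, listen, match and act cycle can be sketched in VoiceXML 2.0 as a single field. This assumes the platform supports the built-in boolean type, which accepts spoken "yes"/"no" as well as the keys 1 and 2; the weather text is only a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather_offer">
    <!-- Built-in boolean type: speech ("yes"/"no") or DTMF (1 = yes, 2 = no) -->
    <field name="wants_weather" type="boolean">
      <!-- Played (TTS) before listening -->
      <prompt>Would you like today's weather report?</prompt>
      <!-- Runs once the input has matched the grammar -->
      <filled>
        <if cond="wants_weather">
          <!-- Placeholder text, not real data -->
          <prompt>Here is today's weather report.</prompt>
        <else/>
          <prompt>Goodbye.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```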

The building blocks



Session: This is the entire caller-computer conversation, which starts when a call is put through, and ends with the caller hanging up or the VoiceXML document (or the interpreter context) requesting it to end. 


Dialog states: A VoiceXML application is made up of a set of named dialog states. The user moves from one dialog state to another, with each dialog specifying the next. Dialogs are written in plain-text documents, usually with the extension .vxml.
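As a minimal VoiceXML 2.0 sketch, the document below holds two named dialog states, and the first one hands control to the second with `<goto>`. The dialog names and the cross-document URI in the comment are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- First dialog state -->
  <form id="welcome">
    <block>
      <prompt>Welcome to the news line.</prompt>
      <!-- Move to the next dialog in this document; a URI such as
           "other.vxml#start" (hypothetical) would jump to another document -->
      <goto next="#goodbye"/>
    </block>
  </form>

  <!-- Second dialog state -->
  <form id="goodbye">
    <block>
      <prompt>Goodbye.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```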

Forms: VoiceXML dialogs come in two kinds, forms and menus. Each form has a name and is responsible for executing some portion of the dialog. It defines an interaction that collects values for each of the fields in the form, and, as with HTML forms, the collected values can be submitted to a server.
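Here is a rough VoiceXML 2.0 sketch of a form that collects two values and submits them to a server, much like an HTML form post. The server URL is hypothetical, and the built-in number and boolean types are assumed to be supported by the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Each field fills a variable with the same name -->
    <field name="tickets" type="number">
      <prompt>How many tickets would you like?</prompt>
    </field>
    <field name="confirmed" type="boolean">
      <prompt>Shall I book <value expr="tickets"/> tickets?</prompt>
    </field>
    <!-- Send both values to a hypothetical server script; it is expected
         to reply with the next VoiceXML document -->
    <block>
      <submit next="http://www.example.com/book.jsp" method="post"
              namelist="tickets confirmed"/>
    </block>
  </form>
</vxml>
```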

Menus: A menu presents the user with a choice of options and defines the transition to another dialog state depending on the user’s selection.
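A minimal VoiceXML 2.0 menu might look like the sketch below. With `dtmf="true"`, the first nine choices are also mapped to the keys 1 to 9 in order, so the caller can either say "news" or press 1. The dialog names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Choices can be spoken or selected with keys 1 and 2 -->
  <menu id="main" dtmf="true">
    <!-- enumerate reads out the list of choices -->
    <prompt>Please choose one of: <enumerate/></prompt>
    <choice next="#news">news</choice>
    <choice next="#weather">weather</choice>
  </menu>

  <!-- Target dialog states for each choice -->
  <form id="news">
    <block><prompt>Here is the news.</prompt></block>
  </form>
  <form id="weather">
    <block><prompt>Here is the weather.</prompt></block>
  </form>
</vxml>
```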


Fields: A form has fields. Each field specifies the prompt to play, the input it expects and the rules for evaluating the caller’s input.
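The three parts of a field can be seen in this VoiceXML 2.0 sketch. The built-in digits type (assumed to be available on the platform) supplies the expected input, and the `<filled>` block holds a simple evaluation rule. The four-digit PIN check is an illustrative assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <!-- Expected input: a string of digits (built-in type) -->
    <field name="pin" type="digits">
      <!-- Prompt: what the caller hears -->
      <prompt>Please key in your four-digit PIN.</prompt>
      <!-- Evaluation rule: runs after a digit string is recognized -->
      <filled>
        <if cond="pin.length != 4">
          <prompt>That was not four digits.</prompt>
          <!-- Clearing the variable makes the interpreter ask again -->
          <clear namelist="pin"/>
        <else/>
          <prompt>Thank you.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```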

Application: An application is a set of VoiceXML documents that all share the same application root document.
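As a VoiceXML 2.0 sketch, an application root document can hold variables and event handlers shared by every document that names it in its `application` attribute. The file names root.vxml and news.vxml and the variable name are hypothetical. First, the root document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Application-scope variable shared by every leaf document -->
  <var name="serviceName" expr="'City News'"/>
  <!-- Handler inherited by all leaf documents -->
  <catch event="help">
    <prompt>You can say news or weather.</prompt>
  </catch>
</vxml>
```

Then a leaf document that belongs to the same application:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="root.vxml">
  <form id="news">
    <block>
      <!-- Read the variable declared in the root document -->
      <prompt>Welcome to <value expr="application.serviceName"/>.</prompt>
    </block>
  </form>
</vxml>
```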

Grammar: A grammar describes the expected user input, either spoken words or touch-tone (DTMF) key presses. Each dialog state has one or more grammars associated with it.
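This VoiceXML 2.0 sketch gives one field two inline grammars written in the W3C SRGS XML format: one for speech and one for touch tones. The city names are illustrative. Because the grammars carry no semantic tags, the field typically receives the matched words or keys themselves (for example "delhi" or "2"), though the exact result can depend on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="city_select">
    <field name="city">
      <prompt>Say Mumbai or Delhi, or press 1 or 2.</prompt>

      <!-- Spoken input: two accepted phrases -->
      <grammar mode="voice" version="1.0" root="cityName" xml:lang="en-US">
        <rule id="cityName" scope="public">
          <one-of>
            <item>mumbai</item>
            <item>delhi</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Touch-tone input: the same field also accepts keys 1 and 2 -->
      <grammar mode="dtmf" version="1.0" root="cityKey">
        <rule id="cityKey" scope="public">
          <one-of>
            <item>1</item>
            <item>2</item>
          </one-of>
        </rule>
      </grammar>

      <filled>
        <prompt>You chose <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```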

Sub-dialog: A sub-dialog works like a subroutine call: control passes to a new dialog and then returns to the original dialog, whose local state information is retained.
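A minimal VoiceXML 2.0 sketch of a sub-dialog call follows. The `balance` dialog calls `get_account` like a subroutine, and the values named in `<return>` come back as properties of the sub-dialog's variable. The dialog and variable names are hypothetical, and the built-in digits type is assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <!-- Call the get_account dialog like a subroutine -->
    <subdialog name="acct" src="#get_account">
      <filled>
        <!-- Values passed back by return are properties of acct -->
        <prompt>
          Checking the account ending in <value expr="acct.last4"/>.
        </prompt>
      </filled>
    </subdialog>
  </form>

  <!-- The sub-dialog: runs in its own context, then returns -->
  <form id="get_account">
    <field name="number" type="digits">
      <prompt>Please key in your account number.</prompt>
    </field>
    <block>
      <!-- Keep only the last four digits of the entered number -->
      <var name="last4" expr="number.substring(number.length - 4)"/>
      <return namelist="last4"/>
    </block>
  </form>
</vxml>
```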

Variables: Named variables can be used to hold data. They can be defined at any level (from the session down to a dialog), and their scope follows an inheritance model. Variable expressions can also be used to make prompts, grammars or both conditional.
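Here is a small VoiceXML 2.0 sketch of variable scoping and a conditional prompt. One variable lives at document scope and one at dialog scope; the `cond` attribute decides which prompt plays. The hard-coded values are placeholders for data a real application would fetch from its server.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: visible to every dialog in this document -->
  <var name="isMember" expr="true"/>

  <form id="greet">
    <!-- Dialog scope: visible only inside this form -->
    <var name="visits" expr="3"/>
    <block>
      <!-- cond is an ECMAScript expression; each prompt plays only if it is true -->
      <prompt cond="isMember">
        Welcome back. This is visit number <value expr="visits"/>.
      </prompt>
      <prompt cond="!isMember">Welcome, guest.</prompt>
    </block>
  </form>
</vxml>
```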

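Events: Events are like exceptions during a conversation. They are raised, for example, when the application cannot make sense of what the user said or when the user does not respond at all. Events are caught by writing event handlers, which follow an inheritance model: a handler defined at a higher level, such as the document, applies to every dialog beneath it unless that dialog defines its own.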
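The sketch below (VoiceXML 2.0, with illustrative prompts) shows both kinds of handler. The field handles `noinput` (silence) and `nomatch` (input outside the grammar) itself, while the document-level `help` handler is inherited by the field. For `nomatch`, the interpreter picks the handler with the highest `count` that does not exceed the number of times the event has occurred, so the second handler takes over from the third miss onward.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level handler: inherited by every dialog below
       unless a dialog defines its own help handler -->
  <catch event="help">
    <prompt>Just say yes or no.</prompt>
    <reprompt/>
  </catch>

  <form id="headlines">
    <field name="wants_headlines" type="boolean">
      <prompt>Would you like to hear today's headlines?</prompt>
      <!-- The caller said nothing before the timeout -->
      <noinput>
        <prompt>Sorry, I didn't hear you.</prompt>
        <reprompt/>
      </noinput>
      <!-- Input did not match: first and second time -->
      <nomatch count="1">
        <prompt>Please say yes or no.</prompt>
        <reprompt/>
      </nomatch>
      <!-- Third and later times: give up -->
      <nomatch count="3">
        <prompt>Let's try again later.</prompt>
        <exit/>
      </nomatch>
    </field>
  </form>
</vxml>
```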

Dynamic VoiceXML (Scripting): ECMAScript (the standardized form of JavaScript) can be used to add more control to a VoiceXML application.
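As a VoiceXML 2.0 sketch, a `<script>` element can define an ECMAScript function that later expressions call. The temperature conversion is only an illustration, and the built-in number type is assumed to be supported. `Number()` is used in case the platform returns the recognized value as a string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level ECMAScript, available to every dialog -->
  <script>
    <![CDATA[
      // Convert a temperature from Celsius to Fahrenheit
      function toFahrenheit(c) {
        return Number(c) * 9 / 5 + 32;
      }
    ]]>
  </script>

  <form id="temperature">
    <field name="celsius" type="number">
      <prompt>Say a temperature in degrees Celsius.</prompt>
      <filled>
        <!-- Call the script function from a value expression -->
        <prompt>
          That is <value expr="toFahrenheit(celsius)"/> degrees Fahrenheit.
        </prompt>
      </filled>
    </field>
  </form>
</vxml>
```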

Writing VoiceXML 



It is recommended to start every VoiceXML document with the XML declaration, just as in other XML documents:

<?xml version="1.0"?>

Next comes the vxml tag, with the version attribute set to the VoiceXML version being used:

<vxml version="1.0">

Forms



The form has to be named, ideally after the dialog element it is responsible for executing. A form is denoted by the <form> tag, and its name is given by the “id” attribute, like this:

<form id="hello">

This form will contain several elements, the “form items”, which can be either field items or control items. Field items gather information from the caller to fill variables, and may contain prompts telling the caller what to say, the grammar that defines how the caller’s speech is interpreted, and any event handlers. The field items are:

  • <field>: Gathers input from the user via speech or DTMF recognition as defined by a grammar
  • <record>: Records an audio clip from the user
  • <transfer>: Transfers the user to another phone number
  • <object>: Invokes a platform-specific object that may gather user input, returning the result as an ECMAScript object
  • <subdialog>: Performs a call to another dialog or document (similar to a function call), returning the result as an ECMAScript object

Control items, on the other hand, define tasks that are not based on recognition:

  • <block>: Encloses a sequence of statements for prompting and computation
  • <initial>: Controls mixed-initiative interactions within a form

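Guard conditions: Every form item has a variable associated with it (created by default), and the interpreter uses it as a guard. If the variable has no value yet, the item is executed normally; if a value has already been set, the item is skipped. An item can also carry an optional cond attribute, in which case it is executed only if that condition is true as well.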
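The guard rules can be traced in this VoiceXML 2.0 sketch (names and prompts are illustrative, and the built-in boolean type is assumed). The first block is never played because `expr` gives its variable a value up front. The field is visited because its variable starts out empty, and the last block runs only if its `cond` is true once the field has been filled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="guard_demo">
    <!-- intro's variable is preset by expr, so this block is skipped -->
    <block name="intro" expr="true">
      <prompt>You will never hear this.</prompt>
    </block>

    <!-- Variable is undefined at first, so this field is visited -->
    <field name="wants_news" type="boolean">
      <prompt>Would you like the news?</prompt>
    </field>

    <!-- Extra guard: eligible only if the caller said yes -->
    <block cond="wants_news">
      <prompt>Here are today's headlines.</prompt>
    </block>
  </form>
</vxml>
```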

So now we are all set to write a small piece of VoiceXML code. However, to run it and actually hear a website speak in your ear, you will need a VoiceXML platform with ASR and IVR resources in place.

To help developers, TellMe Studios (www.tellme.com) has set up the infrastructure required to create and deploy VoiceXML-based applications, accessible through its website (www.studio.tellme.com). Developers can create applications and then call a toll-free number provided by TellMe Studios to test and run them. Sadly, we couldn’t find an Indian counterpart for this.

An example



Our example is very simple. In keeping with the tradition of first programs, it just says “Hello World”.

<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Hello">
    <block>
      Hello World!
    </block>
  </form>
</vxml>

Shruti Pareek