
Speech as the Next Interface

PCQ Bureau

As the calendar turns from one century to the next, the IT world is experiencing a paradigm shift in human-to-technology interaction. It’s called speech recognition, and just as it’s transformed how we communicate with our PCs, IBM’s efforts to bring speech to the IT environment will change the way we do business and access information. 


The growth of speech-recognition technology in the past five years has been amazing.

We’ve come from a market of high-priced products that relied on discrete dictation, which meant speaking in a robotic way, to a market where speech-recognition technology is common both in the office and at home. People can speak in a natural voice to interact with their computers. This, combined with affordable pricing and increased consumer demand, is leading to the evolution of transparent computing, where human-machine interaction is so natural that it’s almost invisible.

Transparent computing addresses the growing needs of business and end-user communities. The network revolution has left enterprises needing to reach customers faster, provide better service, and handle more transactions while reducing costs. At the same time, end users need consistent, convenient access to the Internet wherever they are. With predictions of 600 million PCs and two billion networked handheld devices by 2003, the IT environment is primed for a shift to a new conversational user interface.

A changing world 



On average, today’s professional interacts with several machines a day: talking on a cellular phone, sending a fax, or programming a household appliance. Most of these machines have different interfaces that are not compatible with each other. As a result, when performing common tasks, people have to adapt to the machine in ways that can feel unnatural. With the advent of speech-recognition technology, however, the complexity of dealing with multiple interfaces is reduced, and machines can now adapt to people and their individual preferences.


Speech is the most accessible, mobile, and easy-to-use interface that exists today. By applying speech technologies to handheld devices or mobile phones, information transactions can take place verbally. In addition, voice-based access to the Web from a regular phone or a Web client could improve customer service.

One reason speech recognition can now be supported in mobile and enterprise environments is the convergence of the Web and Interactive Voice Response (IVR) servers. As a result, applications can now combine voice with a traditional graphical display, giving you more freedom in how you communicate with devices. A request for information made by phone to a Website can be answered verbally, or displayed on a handheld or desktop computer screen. This gives you access to information anytime, anywhere.

By allowing voice access to information anytime, anywhere, from any device, speech recognition also gives companies a more effective means to communicate with customers, save money, and facilitate e-business. For example, incorporating speech into self-service transactions such as personal banking or basic travel planning allows companies to provide faster customer service while saving money by freeing up operators to handle more detailed requests. Replacing keypad-driven menus and directed-dialogue commands with more natural conversational dialogue makes it easier for the customer to complete the transaction.


Another area that speech recognition can impact is the open road. Think about the increasing number of “road warriors” and the growing need these professionals have for mobile technology and devices. On each business trip, mobile users struggle to carry their notebooks, date-books, files, and luggage. The need to downsize is a strong one. Lightweight, speech-enabled mobile devices let on-the-go business professionals use one application to perform various tasks. For instance, a speech-enabled mobile recorder lets you dictate notes and then converts your words to text once the device is plugged into a PC when you return to the office.

Today, speech-recognition technology is empowering the mobile workforce, and the future is the Web: using voice to navigate, browse, and conduct transactions with Web services.



In addition to the telephone and mobile devices, speech recognition is revolutionizing the Internet, thanks to recent innovations. Making Web-based information accessible via speech-enabled devices has been made possible by a new standard technology, Voice Extensible Markup Language (VoiceXML). VoiceXML will open enterprise applications for voice access, in the same way that HTML has enabled the development of graphical user interfaces. For example, VoiceXML would enable the end user to use a smart phone to access applications on the network by voice, touch, or key input as appropriate, see the results on the smart phone’s display, and hear the results read back. As convenient as a desktop browser, VoiceXML allows multiple devices to access a company’s information because data access is controlled by voice and reached from a single point.

So, what does the future look like? The “less is more” mentality is also moving into the personal world. Think about the time you spend in your car. Wouldn’t it be great if the time spent driving to the office could be spent checking your stock portfolio, ordering gifts through a catalogue, purchasing books online, conducting bank transactions, or sending e-mail to friends? Speech-enabling multiple devices (the telephone, the Internet, and the computer network) and making them compatible with other machines such as cars gives rise to endless opportunities, limited only by our imagination.

As speech-recognition technology grows into a common interface and existing interfaces fade away, transparent computing will become a reality. Our world will evolve into one where computers become more discreet, slowly merging into the background while nonetheless playing an increasing role in our lives. Exciting, invisible interfaces will change how people work and live, in an unobtrusive way. With speech-recognition technology, people will no longer have to adapt to the computer or interface they’re faced with, and we’ll have more control in our personal and professional worlds.
