How Text-to-Speech Works

PCQ Bureau

12 Apr 2002 11:34 IST

Text-to-speech (TTS) is an area of major development work. The technology reads text aloud, and systems generally take one of two approaches. A rule-based technique models the conversion from text to speech with a set of rules derived from phonetic theories and acoustic analysis of the text data. This technique is, however, highly system dependent: it relies on the system architecture designed for it, so it’s not something others can easily replicate. The other approach, known as corpus-based, can be replicated easily because of its structure. It uses fixed data sets containing acoustic-phonetic labels and syntactic bracketing, which form the foundation for the system.
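The difference between the two approaches can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions: the letter-to-sound rules, the phoneme symbols and the two-word labelled data set are all made up for this example, and real systems are far larger and more subtle.

```python
# Toy letter-to-sound rules as (grapheme, phoneme) pairs.
# Two-letter graphemes come first so they win over single letters.
# Rules and phoneme symbols are illustrative assumptions only.
RULES = [
    ("sh", "SH"), ("ch", "CH"), ("ee", "IY"),
    ("s", "S"), ("h", "HH"), ("e", "EH"),
    ("p", "P"), ("c", "K"), ("t", "T"), ("a", "AE"),
]

# Toy labelled data set: words with hand-assigned phoneme labels.
CORPUS = {
    "speech": ["S", "P", "IY", "CH"],
    "cat": ["K", "AE", "T"],
}


def rule_based(word):
    """Apply the first matching rule at each position, left to right."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                phonemes.append(phoneme)
                i += len(grapheme)
                break
        else:
            i += 1  # no rule covers this letter, so skip it
    return phonemes


def corpus_based(word):
    """Use the labelled data when the word is present, else fall back to rules."""
    entry = CORPUS.get(word.lower())
    return entry if entry is not None else rule_based(word)


print(rule_based("sheep"))     # derived purely from the rules
print(corpus_based("speech"))  # read from the labelled data
```

The rule set is tied to the code that interprets it, while the labelled data is a plain table that anyone could reuse with a different program.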

The basic challenge of speech synthesis from text is to produce natural and pleasant sound with correct pronunciation. The input to the speech-synthesis engine is therefore a string of phonemes, with the necessary accents and pauses inserted. Some transformations are applied to this string to obtain the acoustical transcript. These include models that generate the fundamental frequency and duration of each speech segment. The last step is synthesis of the speech waveform using the parameters generated in the earlier stage. Three types of speech synthesizers are used: articulatory, formant and concatenative synthesizers.
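The stages described above can be sketched in a few lines of Python. This is a minimal illustration, not a real synthesizer: the phoneme durations, the falling pitch contour, the sample rate and all function names are assumptions made for this example, and a plain sine tone stands in for the articulatory, formant or concatenative synthesis a real engine would perform.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second (assumed value)

# Hypothetical base durations in seconds; "_" marks a pause.
BASE_DURATION = {"HH": 0.06, "AH": 0.10, "L": 0.07, "OW": 0.14, "_": 0.20}


def plan_prosody(phonemes, start_f0=140.0, end_f0=100.0):
    """Give each segment a duration and a fundamental frequency in Hz.

    F0 falls linearly from start_f0 toward end_f0 by position in the string.
    Pauses get an F0 of 0.0, which the synthesizer renders as silence.
    """
    steps = max(len(phonemes) - 1, 1)
    plan = []
    for index, phoneme in enumerate(phonemes):
        duration = BASE_DURATION.get(phoneme, 0.08)
        if phoneme == "_":
            f0 = 0.0
        else:
            f0 = start_f0 + (end_f0 - start_f0) * index / steps
        plan.append((phoneme, duration, f0))
    return plan


def synthesize(plan):
    """Render each segment as a sine tone at its F0 (a crude stand-in)."""
    samples = []
    phase = 0.0
    for _phoneme, duration, f0 in plan:
        for _ in range(round(duration * SAMPLE_RATE)):
            phase += 2 * math.pi * f0 / SAMPLE_RATE
            samples.append(0.3 * math.sin(phase) if f0 > 0 else 0.0)
    return samples


def write_wav(target, samples):
    """Write mono 16-bit PCM to a file path or a binary file object."""
    frames = struct.pack("<%dh" % len(samples),
                         *(int(s * 32767) for s in samples))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(frames)


# Phoneme string with a trailing pause, then parameters, then waveform.
plan = plan_prosody(["HH", "AH", "L", "OW", "_"])
samples = synthesize(plan)
```

Calling write_wav("hello.wav", samples) would save the result to a file (the file name is arbitrary). The output is a falling tone followed by silence rather than intelligible speech, but the three stages are the ones the text describes.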

Tremendous research is being conducted on speech synthesis, and the challenge is to convert the text to speech and make it sound as natural as possible. Plus, of course, it also has to conform to the geographic location it’s catering to. For instance, speech synthesis with a French accent wouldn’t be suitable in India.

Hear it on your Mac

Sound on the Mac has three different layers. The first is the hardware API. On top of it sits the Sound Manager, and above that come the Speech Manager and the QuickTime API. The Sound Manager allows applications to interact with the audio hardware on your machine. The Speech Manager is the programming interface that deals with synthesized speech and allows applications to communicate with the actual speech synthesizer. This is how it works.

An application passes a string of text to the Speech Manager, which in turn sends it to the speech synthesizer (a piece of code that sits in your system resources). The speech synthesizer contains the dictionaries and punctuation rules needed to read the text. It also determines which voice is used to read the text string aloud.

The speech synthesizer communicates with the Sound Manager, and through it with the audio hardware further down the line. As for the reading pitch and speed, the Speech Manager has the subroutines to control these.
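The flow described above can be modelled with a few small classes. This is a toy model only: the class and method names are hypothetical and are not Apple’s actual Speech Manager API, and the dictionary entry, pause markers and default rate and pitch values are invented for illustration.

```python
class SoundManager:
    """Stands in for the layer that talks to the audio hardware."""

    def __init__(self):
        self.played = []  # everything "sent to the hardware"

    def play(self, audio):
        self.played.append(audio)


class SpeechSynthesizer:
    """Holds the dictionary, the punctuation rules and the chosen voice."""

    def __init__(self, sound_manager, voice="default"):
        self.sound_manager = sound_manager
        self.voice = voice
        self.dictionary = {"dr.": "doctor"}  # assumed entry
        self.pauses = {",": "<short pause>", ".": "<long pause>"}

    def speak(self, text, rate, pitch):
        units = []
        for token in text.split():
            expansion = self.dictionary.get(token.lower())
            if expansion is not None:
                units.append(expansion)  # dictionary lookup wins
                continue
            word = token.rstrip(",.")
            if word:
                units.append(word)
            if token[-1] in self.pauses:
                units.append(self.pauses[token[-1]])  # punctuation rule
        self.sound_manager.play(
            {"voice": self.voice, "rate": rate, "pitch": pitch, "units": units}
        )


class SpeechManager:
    """The interface applications call; it also controls rate and pitch."""

    def __init__(self, synthesizer):
        self.synthesizer = synthesizer
        self.rate = 150   # assumed default, words per minute
        self.pitch = 1.0  # assumed default, relative pitch

    def set_rate(self, rate):
        self.rate = rate

    def set_pitch(self, pitch):
        self.pitch = pitch

    def speak_text(self, text):
        self.synthesizer.speak(text, self.rate, self.pitch)


# An application hands text to the Speech Manager and nothing else.
sound = SoundManager()
manager = SpeechManager(SpeechSynthesizer(sound))
manager.set_rate(180)
manager.speak_text("Hello, Dr. Smith.")
print(sound.played[0]["units"])
```

The application only ever touches the SpeechManager object, which mirrors the layering in the text: the synthesizer and the sound layer stay hidden behind it.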

Software: Cool App

CoolSpeaking is a text-to-speech application that can read out text from e-mail, documents and Web pages. An added feature is that it can read out text while you are typing in any application. It can also convert text to WAV, so anyone can listen to it. To have text read out, you can either copy and paste it from any document into the main window or open a text file. Under the Tools menu there is an option called Monitor Clipboard, which automatically reads out whatever is copied to the clipboard.

Another option is Real-time speaking, which reads out text as soon as you type it in any application, say a chat window or a Word document. You can also adjust the speed at which it reads by dragging the speed bar at the bottom. To change the voice, go to Tools and select Speaking Properties. Here you can choose from the pre-defined list, or go to the application’s website and download additional voices.

To convert text to WAV, go to the File menu and choose WAV wizard. You can now choose from various voices, select the voice speed and specify the path where the file will be saved. Click on Next and you’ll be asked to paste the text you want to convert into the window that pops up. Just click Convert after that and you’re done.

You can change the look and feel of the interface by applying different skins. For this, go to File, select Skins and choose the one you want. Clicking on Get more skins takes you to their website, from where you can download more skins. Additional options for a welcome message and hourly time readout are available under the Tools menu in Preferences.

Sachin Makhija
