How Voice Goes over IP

When we talk of sending voice over Internet connections, the first question that pops up is "How can you do that, when the connections are not even broad enough to handle regular data?" Uncompressed voice is a lot of data, and if sent raw, it would simply clog the network and fail to get through. Instead, the voice is sampled, compressed, and packetized before it is sent over an IP network. This way you need much less bandwidth. (Sampling is explained in the article ‘Putting Voice into Packets’, page 41.) Here we'll look at the compression part.

Almost everyone is familiar with the concept of MP3s. MP3 is compressed audio. Regular music or audio is compressed by discarding noise, silence, and certain inaudible frequencies, so that it takes up much less space. The idea behind VoIP is much the same, except that here all this has to be done in real time, as you speak. A good deal of hardware and technology goes into it. The whole idea is to compress the user's voice, packetize it, send it over the network, decompress it at the other end, and play it back. The critical aspect is the encoding used, because it ultimately determines the bandwidth required, the quality of sound, and the realism in playback. At the most basic level, voice can be sent using PCM (Pulse Code Modulation) coding. PCM is primarily the encoding of analog voice data into digital form without much compression. With PCM, one voice channel requires 64 kbps of bandwidth. Now how do you make it smaller without losing out on quality?
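The 64 kbps figure follows from simple arithmetic. The sketch below assumes the usual telephone-grade PCM parameters (8,000 samples per second, 8 bits per sample), which the article itself does not spell out. The function name is only illustrative.

```python
# Telephone-grade PCM, assuming 8,000 samples per second and 8 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8


def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Return the raw bit rate of one PCM voice channel in kbps."""
    return sample_rate_hz * bits_per_sample / 1000


print(pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64.0
```

This counts the voice payload only. Packet headers add to the bandwidth actually used on an IP network.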

This is where speech codecs, also called voice coders or 'vocoders' for short, come in. These are speech compression algorithms that let you drastically reduce the amount of data that goes into the network while still preserving voice quality. Devices that send and receive voice and video commonly work on the H.323 standard. This ensures that any H.323-compliant system can talk to any other such system. The two systems can, however, negotiate which particular vocoder to use, depending on the network conditions. This is a built-in feature and requires no user intervention.

With processing hardware requirements remaining more or less the same, vocoders differ in the kind of algorithm used, bit rate (in kbps), buffer requirements, timing (latency), and the software required for processing. Vocoders are real-time algorithms, and when we talk of sending voice over IP, the overall realism depends on sound clarity and latency. Latency is the delay between the time the voice is spoken at the originating end and the time it's heard at the receiving end. There is a trade-off between the two: if you go in for higher speech quality (less compression), you send more data, and on a busy link that can mean more latency. What we strive for is toll-quality (regular phone quality) speech. VoIP solutions come with standard vocoders, but vendors often offer their own proprietary coding algorithms as well, for example NetCoder from AudioCodes.

The amount of network traffic is not the same at all times. So VoIP systems are generally dynamic, that is, they support multiple vocoders, with the codec being switched automatically to match network conditions. As a result, speech quality varies in real time with network congestion. If network traffic increases, the system switches to a vocoder with a higher compression rate (which is more processor intensive), and vice versa. This, however, increases system complexity and resource requirements.
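A minimal sketch of such a switching policy is shown below, assuming the system simply picks the highest-bit-rate codec that fits the bandwidth currently available. The table uses nominal bit rates of a few standard ITU-T codecs. The function `pick_codec` and its policy are hypothetical and do not represent how any particular product or the H.323 negotiation actually decides.

```python
# Nominal codec bit rates in kbps (voice payload only, no packet headers).
CODEC_KBPS = {
    "G.711": 64.0,
    "G.726": 32.0,    # G.726 also defines 16, 24 and 40 kbps modes
    "G.729": 8.0,
    "G.723.1": 6.3,   # G.723.1 also has a 5.3 kbps mode
}


def pick_codec(available_kbps, supported=CODEC_KBPS):
    """Pick the highest-bit-rate codec that fits the available bandwidth.

    Hypothetical policy: less compression is assumed to mean better quality.
    Returns None if no supported codec fits.
    """
    fitting = {name: rate for name, rate in supported.items()
               if rate <= available_kbps}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# As congestion grows, the available bandwidth shrinks and the choice changes.
for kbps in (100, 40, 10, 7):
    print(kbps, pick_codec(kbps))
```

A real system would also weigh packet loss, jitter, and processor load, and would avoid switching back and forth too often.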

For example, if you have a DSL link of, say, 128 kbps, you would use G.726 or G.711. If you have less bandwidth, you may use G.729 or G.723. Also keep in mind that a particular vocoder might be used over one channel only. So if you have multiple channels, you may be using multiple vocoders simultaneously. In this case, the memory (the footprint of the vocoder) and processor requirements are of prime concern. If the DSP (Digital Signal Processor) is overloaded, calls may need to be dropped, or rerouted if you have multiple DSPs or systems. (For details on what a DSP is, refer to the article ‘Putting Voice into Packets’, page 41.)
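The drop-or-reroute behavior can be sketched as a simple admission check. Everything here is hypothetical: the `Dsp` class, the `place_call` function, and the abstract "load units" are placeholders for illustration, not measurements of any real DSP or vocoder.

```python
class Dsp:
    """A DSP with a fixed processing budget in abstract load units."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # placeholder units, not real MIPS figures
        self.load = 0.0

    def try_admit(self, cost):
        """Accept the call only if it fits within the remaining budget."""
        if self.load + cost > self.capacity:
            return False
        self.load += cost
        return True


def place_call(dsps, cost):
    """Try each DSP in turn (rerouting); return its name, or None if dropped."""
    for dsp in dsps:
        if dsp.try_admit(cost):
            return dsp.name
    return None


dsps = [Dsp("dsp0", 10), Dsp("dsp1", 10)]
print(place_call(dsps, 6))  # fits on dsp0
print(place_call(dsps, 6))  # dsp0 is too full, rerouted to dsp1
print(place_call(dsps, 6))  # no DSP has room, call is dropped (None)
```

With a single DSP in the list, the same function can only accept or drop a call, which matches the case where no rerouting is possible.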

Ashish Sharma
