How does Voice over IP work?

In any telephony system, two things are carried by the network: voice data and signaling information. Voice is the sound information detected by the microphone in the telephone and transmitted to the receiver over a communication channel. Signaling is the information exchanged between stations participating in the call when a call is started or ended, or when an action (for example, call transfer) is requested.

Traditionally, both voice and signaling information have been sent together through dedicated circuit-switched telephony channels (used, for example, with channel-associated signaling and ISDN). With VoIP, however, voice and signaling are sent using standard TCP/IP protocols over a physical link such as an Ethernet network. This exchange of signaling and voice information takes place in both directions at the same time, with each endpoint sending and receiving information over the IP network.

How is voice data sent over an IP network?

With VoIP, voice data is digitally encoded using µ-law or A-law Pulse Code Modulation (PCM). The voice data can then be compressed if necessary and sent over the network in User Datagram Protocol (UDP) packets. Standard TDM telephony sends voice data as a continuous stream at a low constant data rate. With VoIP, relatively small packets are sent at regular intervals. For uncompressed voice, the overall rate of sending voice data is the same for both kinds of telephony.
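The rate comparison can be checked with a little arithmetic. The sketch below assumes values that are typical for uncompressed PCM telephony but are not stated in this topic: 8000 samples per second, 8 bits per sample, and 20 ms of audio per packet. It counts voice payload only and ignores packet header overhead.

```python
# Assumed, typical values for uncompressed PCM telephony (not taken from this topic).
SAMPLE_RATE_HZ = 8000      # samples per second
BITS_PER_SAMPLE = 8        # one byte per mu-law or A-law sample
PACKET_INTERVAL_MS = 20    # amount of audio carried in each packet


def packet_stats(sample_rate_hz, bits_per_sample, packet_interval_ms):
    """Return (payload bytes per packet, packets per second, payload bits per second)."""
    bytes_per_second = sample_rate_hz * bits_per_sample // 8
    payload_bytes = bytes_per_second * packet_interval_ms // 1000
    packets_per_second = 1000 // packet_interval_ms
    payload_bits_per_second = payload_bytes * packets_per_second * 8
    return payload_bytes, packets_per_second, payload_bits_per_second


payload_bytes, packets_per_second, bits_per_second = packet_stats(
    SAMPLE_RATE_HZ, BITS_PER_SAMPLE, PACKET_INTERVAL_MS
)
print(payload_bytes, packets_per_second, bits_per_second)
```

Under these assumptions each packet carries 160 bytes of voice and 50 packets are sent per second, which gives the same 64 kbit/s voice rate as a single TDM channel.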

The advantage of VoIP is that one high-speed network can carry the packets for many voice channels and can share its capacity with other types of data at the same time (for example, FTP, HTTP, and data sockets). A single high-speed network is much easier to set up and maintain than a large number of circuit-switched connections (for example, T1 circuits).

The User Datagram Protocol is used to transmit voice data over a VoIP network. UDP is a ‘send and forget’ protocol with no requirement for the transmitter to retain sent packets should there be a transmission or reception error. If the transmitter did retain sent packets, the flow of real-time voice would be adversely affected by a request for retransmission or by the retransmission itself, especially if there is a long path between transmitter and receiver.
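The ‘send and forget’ behavior can be seen in a minimal sketch using Python's standard socket module. The loopback address, the three dummy 160-byte frames, and the one-second timeout are assumptions chosen for illustration; this is not how any particular VoIP product sends voice.

```python
import socket

# Receiver: bind to the loopback interface and let the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)
address = receiver.getsockname()

# Transmitter: each sendto() hands one datagram to the network and returns.
# Nothing is kept for retransmission and no acknowledgement is awaited.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frames = [bytes([n]) * 160 for n in range(3)]  # three dummy voice frames
for frame in frames:
    sender.sendto(frame, address)

received = []
try:
    for _ in frames:
        data, _source = receiver.recvfrom(2048)
        received.append(data)
except socket.timeout:
    pass  # a lost datagram is simply never seen by the receiver
finally:
    sender.close()
    receiver.close()

print(len(received), "of", len(frames), "datagrams received")
```

On a loopback interface all datagrams normally arrive, but the code does not rely on it: the receiver just works with whatever turns up.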

The main problems with using UDP are that:
  • There is no guarantee that a packet will actually be delivered.
  • Packets can take different paths through the network and arrive out of order.

To overcome these problems, the Real-time Transport Protocol (RTP) is used with VoIP. RTP provides a method of handling disordered and missing packets and makes the best possible attempt to recreate the original voice data stream (comfort noise is intelligently substituted for missing packets).
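The reordering idea can be sketched as follows. Each RTP packet starts with a 12-byte fixed header that includes a 16-bit sequence number, and the receiver uses that number to put packets back in order and to detect gaps. The helper names, the SSRC value, and the use of a constant filler byte in place of real comfort noise are assumptions made for this illustration only.

```python
import struct

# RTP fixed header: flags byte, marker/payload type byte,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (12 bytes in total).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(seq, timestamp, payload, payload_type=0, ssrc=0x1234):
    """Build an RTP packet (version 2, no padding, no extension, no CSRCs)."""
    first_byte = 2 << 6
    header = RTP_HEADER.pack(
        first_byte, payload_type & 0x7F, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc
    )
    return header + payload


def unpack_rtp(packet):
    """Return (sequence number, timestamp, payload) from an RTP packet."""
    _flags, _marker_pt, seq, timestamp, _ssrc = RTP_HEADER.unpack_from(packet)
    return seq, timestamp, packet[RTP_HEADER.size:]


def rebuild_stream(packets, first_seq, count, frame_size, fill=b"\xff"):
    """Reorder packets by sequence number and fill gaps with a filler frame."""
    by_seq = {}
    for packet in packets:
        seq, _timestamp, payload = unpack_rtp(packet)
        by_seq[seq] = payload
    frames = []
    for offset in range(count):
        seq = (first_seq + offset) & 0xFFFF
        # Hypothetical stand-in for comfort noise: a constant filler frame.
        frames.append(by_seq.get(seq, fill * frame_size))
    return b"".join(frames)


# Four tiny frames are sent; one is lost and the rest arrive out of order.
sent = [pack_rtp(100 + i, 160 * i, bytes([i]) * 4) for i in range(4)]
arrived = [sent[2], sent[0], sent[3]]
stream = rebuild_stream(arrived, first_seq=100, count=4, frame_size=4)
print(stream)
```

A real receiver would also use the timestamp field and a jitter buffer to decide how long to wait for a late packet before substituting comfort noise.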

Signaling

A signaling Invite message is sent by the VoIP phone that initiates a call (the calling party) to inform the called party that a connection is required. The called party can then accept or reject the call (for example, it can reject the call if it is already busy). Other signaling exchanges are initiated by actions such as near-end or far-end hangup and call transfer.

For VoIP, several signaling protocols are in general use:
  • Session Initiation Protocol (SIP) is a modern protocol that is becoming increasingly popular.
  • Media Gateway Control Protocol (MGCP) is used internally within telephone networks.
  • H.323 is an older VoIP protocol whose elements are very similar to ISDN telephony protocols. SIP, by contrast, uses Internet-based URIs for addressing.

SIP is the only Voice over IP signaling protocol that Blueworx Voice Response supports. The Blueworx Voice Response implementation of SIP fully conforms to RFC 3261, which is the industry-standard definition of SIP.

SIP is based on text messages, addressed using URIs, which are exchanged between endpoints whenever signaling is required. Blueworx Voice Response SIP support maps these message exchanges to standard telephony actions within the product. Standard telephony actions include:
  • Incoming calls
  • Outgoing calls
  • Near end hangup
  • Far end hangup
  • Transfers (several types are supported including ‘blind’ and ‘attended’)
SIP signaling messages can use either TCP (a reliable, guaranteed message exchange) or UDP (a non-guaranteed datagram protocol).
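
To give a feel for what a SIP message looks like on the wire, here is a minimal sketch that builds an INVITE request with the header fields that RFC 3261 requires and then reads back its start line. The function names, addresses, tag, branch, and Call-ID values are hypothetical, and this is not the Blueworx Voice Response implementation. A real INVITE would normally also carry a session description in its body.

```python
def build_invite(caller, callee, host, call_id, branch, tag):
    """Build a bare SIP INVITE request with an empty body (illustrative only)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_start_line(message):
    """Return (method, request URI, SIP version) from a SIP request."""
    start_line = message.split("\r\n", 1)[0]
    method, uri, version = start_line.split(" ")
    return method, uri, version


# All of the values below are made up for the example.
invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.com",
    host="client.example.com",
    call_id="a84b4c76e66710@client.example.com",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
)
print(invite)
print(parse_start_line(invite))
```

The Via header names the transport, so the same message could be sent over TCP by changing "SIP/2.0/UDP" to "SIP/2.0/TCP".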

SIP is becoming established as the industry standard for multimedia session control over IP networks and is defined in the IETF standard RFC 3261, Session Initiation Protocol. The following diagram shows the exchanges that take place between two SIP endpoints in a simple two-way call with far-end hang-up.

Figure 1. A simple two-way call using SIP
This diagram shows a simple two-way call with far-end hang-up
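
As a rough textual stand-in for such an exchange, the sketch below lists a typical RFC 3261 message sequence for a basic call and prints it as arrows. It assumes that the called party is the far end and that a 180 Ringing response is sent; the actual figure may differ in detail, and the names used here are hypothetical.

```python
# Typical basic call flow (assumed, not copied from the figure).
# Each entry is (who sends it, what is sent).
CALL_FLOW = [
    ("caller", "INVITE"),
    ("callee", "180 Ringing"),
    ("callee", "200 OK"),
    ("caller", "ACK"),
    ("both", "RTP voice packets"),
    ("callee", "BYE"),        # far-end hang-up: the called party ends the call
    ("caller", "200 OK"),
]

ARROWS = {"caller": "-->", "callee": "<--", "both": "<->"}


def render(flow):
    """Return one text line per message, with the caller shown on the left."""
    return [f"caller {ARROWS[sender]} callee  {message}" for sender, message in flow]


for line in render(CALL_FLOW):
    print(line)
```

The INVITE, 200 OK, and ACK messages set up the call, voice then flows as RTP in both directions, and the BYE and its 200 OK response tear the call down.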