In any telephony system, two things are carried by the network: voice data and signaling information. Voice is the sound information detected by the microphone in the telephone and transmitted to the receiver over a communication channel. Signaling is the information exchanged between stations participating in the call when a call is started or ended, or when an action (for example, call transfer) is requested.
Traditionally, both voice and signaling information have been sent together through dedicated circuit-switched telephony channels (used, for example, with channel-associated signaling and ISDN). With VoIP, however, voice and signaling are sent using standard TCP/IP protocols over a physical link such as an Ethernet network. The exchange of signaling and voice information takes place in both directions at the same time, with each endpoint sending and receiving information over the IP network.
With VoIP, voice data is digitally encoded using µ-law or A-law Pulse Code Modulation (PCM). The voice data can then be compressed if necessary and sent over the network in User Datagram Protocol (UDP) packets. Standard TDM telephony sends voice data as a continuous stream at a low constant data rate. With VoIP, relatively small packets are sent at a constant rate. For uncompressed voice, the overall rate at which voice data is sent is the same for each kind of telephony.
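The rate comparison above can be checked with a little arithmetic. The following is a minimal sketch, assuming the usual 8 kHz sampling and 8-bit samples for µ-law or A-law PCM, and a 20 ms packet interval. The packet interval is an assumption chosen for illustration and is not taken from the text above.

```python
# Illustrative figures only: 8 kHz sampling and 8-bit samples are the usual
# values for µ-law/A-law PCM; the 20 ms packet interval is an assumption.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
PACKET_INTERVAL_MS = 20

# TDM: one 8-bit sample every 1/8000 s on a dedicated channel.
tdm_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# VoIP: the same samples, grouped into one packet per interval.
samples_per_packet = SAMPLE_RATE_HZ * PACKET_INTERVAL_MS // 1000
payload_bytes_per_packet = samples_per_packet * BITS_PER_SAMPLE // 8
packets_per_second = 1000 // PACKET_INTERVAL_MS
voip_payload_bits_per_second = packets_per_second * payload_bytes_per_packet * 8

print(tdm_bits_per_second)           # 64000
print(payload_bytes_per_packet)      # 160
print(packets_per_second)            # 50
print(voip_payload_bits_per_second)  # 64000
```

This counts voice payload only. Each packet also carries protocol headers, so the rate actually seen on the wire is somewhat higher than the payload rate.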
The advantage of VoIP is that one high-speed network can carry the packets for many voice channels and can possibly be shared with other types of data at the same time (for example, FTP, HTTP, and data sockets). A single high-speed network is much easier to set up and maintain than a large number of circuit-switched connections (for example, T1 circuits).
The User Datagram Protocol is used to transmit voice data over a VoIP network. UDP is a ‘send and forget’ protocol: the transmitter is not required to retain sent packets in case there is a transmission or reception error. If the transmitter did retain sent packets, the flow of real-time voice would be adversely affected by a request for retransmission or by the retransmission itself, especially if there is a long path between transmitter and receiver.
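The ‘send and forget’ behavior can be seen with the standard Python socket module. This is a minimal sketch over the loopback interface, not a VoIP implementation: the 160-byte payload is a placeholder, and the addresses and variable names are chosen only for illustration.

```python
import socket

# Receiver: bind to the loopback address; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)
address = receiver.getsockname()

# Transmitter: no connection set-up, no acknowledgement, nothing retained.
transmitter = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
voice_frame = bytes(160)  # placeholder payload standing in for encoded voice
bytes_sent = transmitter.sendto(voice_frame, address)

# The receiver simply takes whatever datagram arrives next.
received, _ = receiver.recvfrom(2048)

transmitter.close()
receiver.close()
```

Note that `sendto` returns as soon as the datagram has been handed to the network stack. Nothing in the transmitter waits for, or learns about, successful delivery.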
To overcome the problems of lost and out-of-order packets, the Real-time Transport Protocol (RTP) is used with VoIP. RTP provides a method of handling disordered and missing packets and makes the best possible attempt to recreate the original voice data stream (comfort noise is substituted for missing packets).
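The way a receiver can use RTP sequence numbers to reorder packets and fill gaps might be sketched as follows. The 12-byte header layout follows RFC 3550, but everything else is an assumption made for illustration: the helper names are hypothetical, sequence number wrap-around is ignored, and a frame of µ-law silence stands in for real comfort noise generation.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header (RFC 3550)
SILENCE = b"\xff" * 160  # hypothetical filler: 20 ms of µ-law silence


def build_packet(seq, timestamp, payload, ssrc=0x1234, payload_type=0):
    """Pack a minimal RTP packet: version 2, no padding, extension, or CSRCs."""
    return RTP_HEADER.pack(2 << 6, payload_type, seq, timestamp, ssrc) + payload


def parse_packet(packet):
    """Return (sequence number, timestamp, payload) from a minimal RTP packet."""
    _, _, seq, timestamp, _ = RTP_HEADER.unpack_from(packet)
    return seq, timestamp, packet[RTP_HEADER.size:]


def rebuild_stream(packets):
    """Order packets by sequence number and fill gaps with SILENCE.

    Simplification: ignores 16-bit sequence number wrap-around.
    """
    by_seq = {}
    for packet in packets:
        seq, _, payload = parse_packet(packet)
        by_seq.setdefault(seq, payload)  # keep the first copy, drop duplicates
    if not by_seq:
        return []
    first, last = min(by_seq), max(by_seq)
    return [by_seq.get(seq, SILENCE) for seq in range(first, last + 1)]


# Packets 1 to 4 are sent; packet 3 is lost and packet 2 arrives after packet 4.
frames = {seq: bytes([seq]) * 160 for seq in (1, 2, 4)}
arrived = [build_packet(seq, seq * 160, frames[seq]) for seq in (1, 4, 2)]
stream = rebuild_stream(arrived)
```

Here `stream` holds the frames for packets 1 and 2 in order, then the filler in place of the lost packet 3, then the frame for packet 4. A real receiver would do this incrementally in a jitter buffer rather than over a complete list.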
The signaling Invite message is used by the VoIP phone that initiates a call (the calling party) to inform the called party that a connection is required. The called party can then accept the call or reject it (for example, if the called party is already busy). Other signaling exchanges are initiated by actions such as near-end or far-end hang-up and call transfer.
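The accept-or-reject decision can be sketched as below, assuming SIP (RFC 3261) as the signaling protocol. The function name is hypothetical. The status lines use the RFC 3261 response codes 200 (OK) and 486 (Busy Here); a real endpoint would normally send provisional responses such as 180 Ringing first and would return a complete response with headers.

```python
def respond_to_invite(called_party_busy):
    """Return the SIP status line a called party might send for an INVITE."""
    if called_party_busy:
        return "SIP/2.0 486 Busy Here"  # reject: already on a call
    return "SIP/2.0 200 OK"             # accept the call


idle_reply = respond_to_invite(called_party_busy=False)
busy_reply = respond_to_invite(called_party_busy=True)
```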
The Session Initiation Protocol (SIP) is becoming established as the industry standard for multimedia session control over IP networks and is defined in the IETF standard RFC 3261. The following diagram shows the exchanges that take place between two SIP endpoints in a simple two-way call with far-end hang-up.