Aricent Leaders in Communications Software - Aricent
 


Voice over IP - How Stuff Works - Part III

Author: Arjun Roychowdhury

Yes, it has been a long time since I last picked up the internet pen to continue this series. I would like to thank everyone for the comments on these articles that I received via email. Unfortunately, work has kept me too busy to write another article for over a year now.

Since I started writing these articles, a lot has changed in the VoIP field. The technology is still similar, but the economy is vastly different. As I write this, the telecom sector is in quite a slump and the economy in general (particularly here in the US) is still trying to recover. However, the promise of VoIP is still strong; the difference is that people now think before spending (which is a welcome change - hype is always a dangerous thing).
Enough about that already. In these articles, I would like to stick to describing the technology first and not the economics that rule it. I may touch on economics and market analysis in later articles.

Where we left off last

Alright then, let's get back to where we left off last time. In Part II, I described the basic differences between circuit-switched and packet-switched communication and touched upon the advantages that packet switching may offer. Let's now get into more detail on how such a communication is set up.

The Elements for communication

Let's forget VoIP for a second and consider the 'protocol' that is used when two people communicate.

(Credit Note: This example is adapted from a nice tutorial I read on the internet a long time ago, the URI of which I have forgotten - if someone can point me to the relevant site, I would like to ensure that I credit the example appropriately)


The above illustrates an example of how two people may communicate. Taking the example above, we can broadly break up a communication into the following phases:

Phase 1 - Initiation (steps 1-2): In this stage, the parties contact each other and the person wanting to start a dialog asks the other person for permission to speak with him/her.

Phase 2 - Negotiation (steps 3-4): Once both parties agree that they want to speak to each other, they "negotiate" a language that both understand so that they can actually communicate. [Of course, you may ask, "If they did not know how to talk to each other, how did they proceed past the initiation stage?" Well, assume that the "initiation" stage was passed with one party using hand signals to indicate the intent to communicate with the other person :-) ]

Note that the Negotiation phase is one of the more complex phases in VoIP. Amongst other things, in this phase, both phones need to exchange attributes like

  • What media protocol to use (typically this is RTP - explained later)
  • What codecs to use amongst those supported by both parties
  • Which UDP port to use for media communication
  • And others…

The attributes listed above and some more constitute what is known as the 'media characteristics' for the call.
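
The negotiation of media characteristics can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular phone or protocol stack implements it: the `MediaOffer` class, its fields, the port numbers and the "caller's preference wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MediaOffer:
    """Media characteristics one phone advertises (illustrative fields only)."""
    protocol: str   # media protocol, e.g. "RTP"
    codecs: list    # supported codecs, most preferred first
    udp_port: int   # UDP port on which this phone wants to receive media


def negotiate(caller, callee):
    """Return the agreed media characteristics, or None if nothing is shared."""
    if caller.protocol != callee.protocol:
        return None
    # Keep the caller's preference order, restricted to what the callee supports.
    common = [codec for codec in caller.codecs if codec in callee.codecs]
    if not common:
        return None
    return {
        "protocol": caller.protocol,
        "codec": common[0],
        "caller_port": caller.udp_port,
        "callee_port": callee.udp_port,
    }


# Hypothetical phones: they share only G.711, so that codec is selected.
dilbert = MediaOffer("RTP", ["G.729", "G.711"], 40000)
wally = MediaOffer("RTP", ["G.711", "G.723"], 50000)
agreed = negotiate(dilbert, wally)
print(agreed)
```

If the two phones had no codec in common, `negotiate` would return `None` and the call could not proceed to the conversation phase.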

Phase 3 - Conversation (step 5): In this stage, the parties actually communicate with each other using the language agreed upon in the negotiation phase.

Phase 4 - Termination (steps 6-7): The parties indicate to each other that they have completed the conversation.

Mapping the Elements to VoIP

If you have followed the description so far, this is exactly what happens in VoIP as well.
First, you have a 'universal protocol' that entities can use to 'indicate' to each other the intent to communicate. Just as a traveler visiting a foreign country may wave his hand at a bystander to indicate that he wants to have a conversation, in VoIP we have 'protocols' that are exchanged between entities to indicate that someone wants to have a conversation.

Again, note that in the 4 phases I listed above, the 1st, 2nd and 4th phases actually 'set-up' and 'tear-down' the communication channel. It is Phase 3 that actually carries out the real conversation.

Therefore, even in VoIP we have two distinct requirements for protocols:
a) Signaling Protocol - deals with set-up, negotiation and tear-down of calls
b) Media Protocol - deals with carrying the actual voice/video stream across the internet

The illustration above shows some examples of the various protocols that are used in VoIP. Protocols such as SIP & H.323 (discussed in the following sections) are used for 'managing' the call (setup, teardown, etc.) while protocols such as RTP (Real-time Transport Protocol) are used for media communication (in other words, RTP is used for carrying video/voice across the internet).

A logical question here would be: why do we need two different protocols for media and signaling? The answer lies in the fact that the factors to be considered while actually transporting voice/video across the internet are vastly different from those that matter for call-setup signaling.
For example, when transmitting voice/video, issues such as delay and packet loss matter much more, since they affect the quality of the video/speech that is seen/heard. Suffice it to say for now that RTP is optimized for media transport while protocols such as SIP & H.323 are optimized for helping set up a communication channel.

Note that both SIP and H.323 use RTP for media transport across the internet.

Also, things get a little more detailed in the media protocol. RTP is a protocol that ensures that media packets are properly carried and sequenced across the internet from source to destination. It does not, however, specify how media is encoded. Media encoding is typically done by codecs. Codecs are algorithms (implemented in hardware or software) that take speech and video as input and compress them. On the other side, the decoder uncompresses the coded information and plays the audio and video stream to the receiver.
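
To make the "carried and sequenced" part concrete, here is a minimal Python sketch of the 12-byte fixed RTP header and of how a receiver could use the sequence number to put packets back in order. The helper names (`pack_rtp`, `unpack_rtp`, `reorder`), the SSRC value and the one-byte payloads are made up for illustration; a real receiver would also handle sequence-number wraparound, jitter buffering and header extensions, which this sketch ignores.

```python
import struct

# Fixed RTP header: flags byte, marker/payload-type byte, 16-bit sequence
# number, 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload):
    first_byte = 2 << 6                  # version 2, no padding/extension/CSRC
    second_byte = payload_type & 0x7F    # marker bit left clear
    header = RTP_HEADER.pack(first_byte, second_byte,
                             seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def unpack_rtp(packet):
    b0, b1, seq, timestamp, ssrc = RTP_HEADER.unpack(packet[:RTP_HEADER.size])
    return {
        "version": b0 >> 6,
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reorder(packets):
    """Sort received packets by sequence number (wraparound not handled)."""
    return sorted((unpack_rtp(p) for p in packets), key=lambda p: p["seq"])


# Three packets of payload type 0; the timestamp step of 160 assumes
# 20 ms of audio per packet at 8000 samples per second.
sent = [pack_rtp(0, seq, seq * 160, 0x1234, bytes([seq])) for seq in (1, 2, 3)]
arrived = [sent[1], sent[0], sent[2]]    # the network swapped the first two
ordered = reorder(arrived)
print([p["seq"] for p in ordered])
```

Note that the header carries a payload type number but says nothing about how the audio itself is compressed; that is the codec's job.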

The importance of compressing and decompressing lies in the fact that the internet has limited bandwidth. To transport typically sampled human speech across the internet without compression (8,000 samples per second at 8 bits per sample), one would need a stream of 64 kbps. Clearly this is a lot of bandwidth once we start increasing the number of users who use the internet for communication. Therefore codecs play an important role in compressing the media stream to much lower rates so that bandwidth requirements are minimal.

G.729 and G.723 are examples of codecs that have a good compression ratio. G.711 is the essentially uncompressed 64 kbps format.
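
The arithmetic behind these numbers is easy to check. The sketch below uses the nominal bit rates of the codecs (8 kbps for G.729 and the 6.3 kbps mode of G.723.1, which is the current form of G.723); the 1000 kbps link and the `calls_per_link` helper are hypothetical, and the calculation deliberately ignores the IP/UDP/RTP header overhead that every real packet carries.

```python
SAMPLE_RATE_HZ = 8000    # samples per second for telephone-quality speech
BITS_PER_SAMPLE = 8

# 8000 samples/s * 8 bits/sample = 64000 bits/s = 64 kbps
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

# Nominal codec bit rates in kbps (payload only, no packet headers).
CODEC_KBPS = {"G.711": 64.0, "G.729": 8.0, "G.723.1": 6.3}


def calls_per_link(link_kbps, codec):
    """How many one-way voice streams fit on a link, ignoring all overhead."""
    return int(link_kbps // CODEC_KBPS[codec])


for codec in CODEC_KBPS:
    print(codec, calls_per_link(1000, codec))
```

On this idealized link, moving from G.711 to G.729 raises the number of simultaneous streams by a factor of eight, which is the whole point of compression.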

A quick sidenote: Why so much effort in standardization?

Before we start describing various protocols, let's address the basic question: why do we need standards-based protocols?

This is probably a good time to answer some questions I hear often:

Q: "I have been seeing Internet Phones for years now - what is so new about VoIP?"
A: Yes, Internet phones have existed for a while. However, each one exchanged information in a way that only it understood. Today, VoIP aims at establishing standard protocols for communication across the internet.

Q: "So what is the use of a standard protocol?"
A: Did you ever wonder how your Sony phone can easily talk to your friend's AT&T phone, which in turn can also talk to your colleague's Panasonic phone? Well, they can talk because they all speak the same language (or protocol).

Q: "Alright, I understand the use of a standard protocol now - but if it is already in place in all the phones around the world, why is VoIP trying to do the same thing?"
A: The point is, VoIP does not only target phones. It targets the internet and all devices that are connected to it. Imagine this: you are having a conversation with your wife, who is 1000 miles away, about what vacation trip to plan next month. During the conversation you pull up a website about Hawaii, which you and your wife navigate together. Then you realize that you forgot to close the garage door before you left your house, so you press a button and the garage door closes. And all of this is happening over your cell phone. VoIP is about expanding your horizon.

Introduction to VoIP protocols

Well, now that we know that it is essential for a 'standard way' or a 'standard protocol' to exist for entities or devices to communicate, let's now (finally) get into what these protocols look like.

As of today, there are two main (competing) protocols worth mentioning that address the VoIP space. In chronological order, they are:

H.323 - a protocol developed by the ITU (International Telecommunication Union), an international organization that has developed several of the standards that are followed today in the PSTN world.

SIP - an acronym for "Session Initiation Protocol", which was developed more recently by the IETF (Internet Engineering Task Force), an organization that has pioneered many of the protocols in use on the Internet as we know it today.

The lineage of the two different groups (one originating from PSTN roots and the other from the Internet model) would give you an indication of the nature of the protocols that they published.

H.323 was initially published as a protocol that addresses VoIP from the following direction: "Today we have an extensive network in the PSTN for communication. Let's figure out how to extend this network to converge with the Internet".

The SIP proponents addressed the same problem from the reverse direction: "Since VoIP is about using the internet, why not make a protocol that models itself around the internet and its successful protocols such as HTTP, and then worry about how to make it work with the existing PSTN world?"

How each protocol works

This section briefly describes the messages that are exchanged before a 'call' can be set up.
Before we describe how a call works, let's describe some other entities that help in setting up the call.

Let's take a simple case:

Dilbert who works for HSS wants to speak to Wally who works at AT&T. Dilbert does not know the exact location of Wally - all he knows is that Wally works at AT&T. One of the advantages of VoIP is that the calling party need not know the exact location of the called party - the exact address can be discovered as the call traces its way across the internet.

For such a call to be set up, besides Dilbert's phone and Wally's phone, some other entities would be needed in the network.

Location Server: This would be an entity that can find out the current location of a user. In the case of H.323, the Location server may be the Gatekeeper or another external server that the Gatekeeper can consult using alternate protocols (such as LDAP). In the case of SIP, the Location server may be the Proxy or another external server that the Proxy can consult using alternate protocols (such as LDAP).

Administration Server: This would be an entity that decides whether Dilbert is indeed allowed to make a call to Wally and makes other such policy decisions. In the H.323 world, this entity is called a Gatekeeper. In the SIP world, there is no specific name for such an entity.

Router: This would be an entity that could forward the call to the next router till the call finally reaches the destination. In the SIP world, this is called a Proxy. In the H.323 world this operation is typically done by the Gatekeeper.

With this in mind, let's see how the call is actually set up.

Call setup in an H.323 Network:


The above illustration shows a typical VoIP call using the H.323 protocol.

Step 1-2: Dilbert asks permission from his domain gatekeeper if he can make a call to Wally. [ ARQ = Admission Request, ACF = Admission Confirm ]

Step 3-6: Dilbert's phone sends a 'Setup' message to its Gatekeeper. The Setup message contains the media characteristics that Dilbert's phone supports. The Gatekeeper routes this across the internet and the call arrives at the Gatekeeper in the AT&T network. This Setup message is forwarded to Wally's phone. At this stage, Wally's phone rings.

It may be worth touching upon how the HSS gatekeeper routes the message to the AT&T gatekeeper. If the HSS gatekeeper is able to connect directly to the AT&T gatekeeper, it can forward the message directly to it. If there is no direct connection between the two, the HSS GK can forward the message to its default route, and this procedure continues across several GKs in the internet until the message reaches a GK which has a direct connection to the AT&T GK. This is similar to how IP packets are routed across routers.

Step 7-8: Wally asks permission from his domain gatekeeper if he can receive a call from Dilbert. His gatekeeper grants him permission.

Step 9-12: Wally's phone sends an Alerting message to Dilbert. At this stage, Dilbert's phone indicates that Wally's phone is ringing by playing a remote ring tone.

Step 13-16: Wally's phone sends a Connect message which contains the media characteristics that Wally will use to speak with Dilbert's phone.

Step 17: Wally and Dilbert speak to each other using RTP. Note that the RTP flow typically happens directly between the callers. The reason for this is that Gatekeepers and other in-between servers will get overloaded if every media packet needs to be routed through them.
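
The gatekeeper-to-gatekeeper forwarding described under Step 3-6 (forward to the default route until some gatekeeper has a direct connection to the destination) can be sketched as follows. This is a toy model and not part of the H.323 specification: the gatekeeper names, the two lookup tables and the hop limit are all invented for the example.

```python
def route_setup(start, destination, direct_links, default_route, max_hops=16):
    """Return the list of gatekeepers a Setup message would pass through."""
    path = [start]
    current = start
    # Keep following default routes until the current gatekeeper
    # has a direct connection to the destination gatekeeper.
    while destination not in direct_links.get(current, set()):
        if len(path) > max_hops:
            raise RuntimeError("no route found within the hop limit")
        next_gk = default_route.get(current)
        if next_gk is None:
            raise RuntimeError(f"{current} has no default route")
        path.append(next_gk)
        current = next_gk
    path.append(destination)
    return path


# Hypothetical topology: only the second transit gatekeeper knows AT&T directly.
direct_links = {
    "hss-gk": set(),
    "transit-gk-1": set(),
    "transit-gk-2": {"att-gk"},
}
default_route = {"hss-gk": "transit-gk-1", "transit-gk-1": "transit-gk-2"}

path = route_setup("hss-gk", "att-gk", direct_links, default_route)
print(" -> ".join(path))
```

The hop limit plays the same role as a TTL in IP routing: it stops a message from circulating forever if the default routes happen to form a loop.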


Call setup in a SIP Network:


The above illustration shows a typical VoIP call using the SIP protocol.

Step 1-4: Dilbert's phone sends a SIP INVITE message inviting Wally to join the call. Similar to the H.323 scenario, the Proxy routes this across the internet and the call arrives at the Proxy in the AT&T network. This INVITE message is forwarded to Wally's phone. At this stage, Wally's phone rings. This INVITE message contains Dilbert's phone's capabilities.

Step 5-8: Wally's phone sends a SIP 180 Ringing message to Dilbert's phone indicating that it is ringing. At this stage, Dilbert's phone plays the remote ringback tone.

Step 9-12: When Wally picks up the phone, the phone sends a SIP 200 OK message all the way back to Dilbert's phone. This 200 OK contains the media characteristics that Wally's phone will use while speaking to Dilbert's phone.

Step 13-16: Dilbert's phone sends a SIP ACK (acknowledgement) message to Wally's phone indicating that the call is now set up.

Step 17: Wally and Dilbert speak to each other using RTP. Note that the RTP flow typically happens directly between the callers. The reason for this is that Proxies and other in-between servers will get overloaded if every media packet needs to be routed through them.
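
For readers who want to see what such a message looks like on the wire, here is a small Python sketch that builds a text INVITE whose body describes the caller's media characteristics in SDP. The message layout follows the general shape of SIP requests, but every concrete value (the `example.com` addresses, the branch, tag and Call-ID strings, the port) is made up for illustration, and the helper functions are not part of any SIP library.

```python
# Static RTP payload type numbers for a few common audio codecs.
RTPMAP = {0: "PCMU/8000", 4: "G723/8000", 18: "G729/8000"}


def build_sdp(user, ip, port, payload_types):
    """Describe where and in which codecs the caller wants to receive audio."""
    lines = [
        "v=0",
        f"o={user} 1 1 IN IP4 {ip}",
        "s=call",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP " + " ".join(str(pt) for pt in payload_types),
    ]
    lines += [f"a=rtpmap:{pt} {RTPMAP[pt]}" for pt in payload_types]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, caller_host, sdp):
    """Assemble an INVITE request with the SDP offer as its body."""
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_host};branch=z9hG4bK-demo1",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=demo-tag",
        f"To: <sip:{callee}>",
        f"Call-ID: demo-call-1@{caller_host}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{caller_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


# Hypothetical addresses; the offer lists G.729 first, then G.711 (PCMU).
sdp = build_sdp("dilbert", "192.0.2.10", 40000, [18, 0])
invite = build_invite("dilbert@hss.example.com", "wally@att.example.com",
                      "phone1.hss.example.com", sdp)
print(invite)
```

The resemblance to an HTTP request (a request line, headers, a blank line, then a body) is not a coincidence, given that SIP was modeled on the internet's existing protocols.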

Note: In both the SIP and H.323 examples, I have illustrated sample call flows. Depending on the network layout, more or fewer nodes may be involved in the call setup. Please refer to the H.323 specification documents and the SIP RFC 3261 for more details on other scenarios.

Also, in the case of H.323 I have described a procedure commonly known as Fast Connect. This was a feature added in a subsequent release of the H.323 protocol which made the number of messages required for call-setup in H.323 comparable to SIP. Prior to this, H.323 needed more messages for call-setup making it significantly slower than SIP for call-setup.

Which protocol reigns supreme ?

SIP was developed at a time when it was commonly believed that H.323 was getting too complex to implement. In addition, its 'internet like' model opened up new models which could bring in new web-based services and business models which went way beyond the restrictive PSTN based approaches that exist today.

Till 2000, H.323 was clearly the market leader. However, the promised simplicity of SIP and its flexibility had a significant influence on the industry, and within a year SIP was adopted by diverse industries as the preferred protocol for VoIP.

However, as of today, there are still many more H.323 based deployments than SIP in the network. The reason for this was primarily that people had already invested significantly in H.323 based networks before SIP came out. In addition, the economic slowdown has made people wary of adopting a new technology base.

Therefore, it would probably be appropriate for me to say "SIP has won mindshare. It has yet to win marketshare over H.323". Almost every network that is H.323 based has plans to migrate to SIP in the next 3-4 years. Therefore, in the long term, I would bet my money on SIP.

For a technical comparison between the protocols, see http://www.cs.columbia.edu/~hgs/sip/h323-comparison.html as well as http://www.packetizer.com/iptel/h323_vs_sip/ (to be fair to both camps)


So I am of the opinion that SIP is a superior protocol to H.323, right?

No. We must remember that a technology's success does not rest on technical merit alone. SIP came in as a 'breath of fresh air' to developers and deployers at a time when the economy was booming and people were getting tired of the complexities of H.323. The 'promise' of SIP excited the right mass of people to shift towards it, and industry hype had a lot to do with it.

As of today, SIP has gotten more complex and H.323 has become simpler and wider in scope. Personally, having worked with both protocols, I do not think it would be fair to still assert that SIP is a much simpler protocol than H.323. Both have evolved.

However, the industry is too far ahead to revisit the choice of protocols. Had this war started over again now, either of the two could have come to the fore. But right now, the protocol selection is clear. SIP is the way ahead, and the reason is a combination of technology and hype. So let's stop worrying about which protocol is better and start worrying about how, given a technology (SIP), we build applications for the market that add to revenue streams.

Conclusion

In this tutorial, we discussed how VoIP calls are set up and how two different VoIP protocols, SIP & H.323, behave.

 

Please send all feedback to whitepaper@flextronicssoftware.com



Last updated : August 8, 2006

 
