Yes, it has been a long time since I last picked up the internet
pen to continue this series. I would like to thank everyone
for the comments on these articles that I received via email.
Unfortunately, my work has kept me too busy to write another
article for over a year now.
Ever since I started writing these articles, a lot has changed
in the VoIP field. Well, the technology is still similar but
the economy is vastly different. Specifically, as I write
this, the telecom sector is in quite a slump and the economy
in general (specifically here in the US) is still trying to
recover. However, the promise of VoIP is still strong with
the difference being that people now are thinking before spending
(which is a welcome change - hype is always a dangerous thing).
Enough about that already. In these articles, I would like
to stick to describing the technology first and not the economics
that rule it. I may touch on economics and market analysis
in later articles.
Where we left off last
Alright then, let's get back to where we left off last time.
In part II, I described the basic differences between
circuit-switched and packet-switched communication and touched
upon the advantages that packet switching may offer. Let's now
get into more detail on how such a communication is set up.
The Elements for communication
Let's forget VoIP for a second and consider the 'protocol'
that is used when two people communicate.
(Credit Note: This example is adapted from a nice tutorial
I read on the internet a long time ago, the URI of which
I have forgotten. If someone can point me to the relevant
site, I will make sure to credit the example appropriately.)
The above illustrates an example of how two people may communicate.
Taking the example above, we can broadly break up a communication
into the following phases:
Phase 1 - Initiation (steps 1-2): In this stage, the
parties contact each other and the person wanting to start
a dialog asks the other person for permission to speak with him/her.
Phase 2 - Negotiation (steps 3-4): Once both parties
agree that they want to speak to each other, they
"negotiate" a language that both of them understand
so that they can actually communicate. [Of course,
you may ask, "If they did not know how to talk to each
other, how did they get past the initiation stage?" Well,
assume that the "initiation" stage was passed with
one party using hand signals to indicate the intent to communicate
with the other person :-) ]
Note that the Negotiation phase is one of the more complex
phases in VoIP. Amongst other things, in this phase, both
phones need to exchange attributes such as:
What media protocol to use (typically this is RTP - explained
later)
What codecs to use amongst those supported by both parties
Which UDP port to use for media communication
And others
The attributes listed above and some more constitute what
is known as the 'media characteristics' for the call.
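To make the negotiation a little more concrete, here is a minimal sketch of how two endpoints might settle on common media characteristics. The names (MediaOffer, negotiate) and the port numbers are made up for illustration; real signaling protocols carry this information in their own message formats.

```python
from dataclasses import dataclass

@dataclass
class MediaOffer:
    protocol: str   # media protocol, e.g. "RTP"
    codecs: list    # codec names, most preferred first
    udp_port: int   # UDP port on which this party wants to receive media

def negotiate(caller, callee):
    """Return the media characteristics both parties can agree on."""
    if caller.protocol != callee.protocol:
        raise ValueError("no common media protocol")
    # Keep the caller's preference order, drop codecs the callee lacks.
    common = [c for c in caller.codecs if c in callee.codecs]
    if not common:
        raise ValueError("no common codec")
    return {
        "protocol": caller.protocol,
        "codec": common[0],
        "caller_port": caller.udp_port,
        "callee_port": callee.udp_port,
    }

# Hypothetical capabilities of the two phones.
caller = MediaOffer("RTP", ["G.729", "G.711"], 40000)
callee = MediaOffer("RTP", ["G.711", "G.723"], 40002)
agreed = negotiate(caller, callee)
print(agreed["codec"])
```

Here the only codec that both sides support is G.711, so that is what the call ends up using, even though the caller would have preferred G.729.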
Phase 3 - Conversation (step 5): In this stage, the
parties actually communicate with each other using the accepted
language agreed upon in the negotiation phase.
Phase 4 - Termination (steps 6-7): The parties indicate
to each other that they have completed the conversation.
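The four phases above can be pictured as a very small state machine. The sketch below is only an illustration of the ordering of the phases; the class and variable names are my own and do not come from any VoIP specification.

```python
from enum import Enum

class Phase(Enum):
    INITIATION = 1
    NEGOTIATION = 2
    CONVERSATION = 3
    TERMINATION = 4

# Each phase may only be followed by the next one in the list.
NEXT_PHASE = {
    Phase.INITIATION: Phase.NEGOTIATION,
    Phase.NEGOTIATION: Phase.CONVERSATION,
    Phase.CONVERSATION: Phase.TERMINATION,
}

class Call:
    def __init__(self):
        self.phase = Phase.INITIATION
        self.history = [self.phase]

    def advance(self):
        """Move to the next phase; a terminated call cannot advance."""
        if self.phase not in NEXT_PHASE:
            raise RuntimeError("call already terminated")
        self.phase = NEXT_PHASE[self.phase]
        self.history.append(self.phase)
        return self.phase

call = Call()
while call.phase is not Phase.TERMINATION:
    call.advance()
print([p.name for p in call.history])
```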
Mapping the Elements to VoIP
If you have followed the description so far, this is exactly
what happens in VoIP as well.
First, you have a 'universal protocol' that entities can use
to 'indicate' to each other the intent for communication.
Just as in human conversation, where a traveler visiting a
foreign country waves his hand to a bystander to indicate
that he wants to have a conversation, in VoIP we have
'protocols' that are exchanged between entities to indicate
that someone wants to have a conversation.
Again, note that in the 4 phases I listed above, the 1st,
2nd and 4th phases actually 'set-up' and 'tear-down' the communication
channel. It is Phase 3 that actually carries out the real
conversation.
Therefore, even in VoIP we have two distinct requirements
for protocols:
a) Signaling Protocol - deals with set-up, negotiation
and tear-down of calls
b) Media Protocol - deals with carrying the actual
voice/video stream across the internet
The illustration above shows some examples of the various
protocols that are used in VoIP. Protocols such as SIP and
H.323 (discussed in the following sections) are used for 'managing'
the call (setup, teardown, etc.) while protocols such as RTP
(Real-time Transport Protocol) are used for media communication
(in other words, RTP is used for carrying video/voice across
the internet).
A logical question here would be: why do we need two different
protocols for media and signaling? The answer lies in the
fact that the factors that need to be taken into consideration
while actually transporting voice/video across the internet
are vastly different from those that matter for call-setup
signaling.
For example, when transmitting voice/video, issues such as
delay and packet loss matter much more since they affect
the quality of the video/speech that is seen/heard. Suffice it to
say for now that RTP is optimized for media transport while
protocols such as SIP and H.323 are optimized for helping
set up a communication channel.
Note that both SIP and H.323 use RTP for media transport
across the internet.
Also, things get a little more detailed in the media protocol.
RTP is a protocol that carries media packets across the internet
from source to destination and numbers them so that the receiver
can put them back in order. It does not, however, specify how
media is encoded. Media encoding is typically done by codecs.
Codecs are algorithms (in hardware or software) that take speech
and video as input and compress it. On the other side, the
decoder decompresses the coded information and plays the audio
and video stream to the receiver.
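For the curious, here is a rough sketch of the 12-byte fixed header that every RTP packet starts with, and of how its sequence number lets a receiver put packets back in order. The helper names are my own, the sketch ignores optional header fields (padding, extensions, CSRC entries), and the simple sort does not handle the sequence number wrapping around at 65535.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order, 12 bytes in total

def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    # First byte: version in the top two bits; padding, extension
    # and CSRC count are all left at zero in this sketch.
    byte0 = RTP_VERSION << 6
    # Second byte: marker bit followed by the 7-bit payload type.
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

def unpack_rtp_header(packet):
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def reorder(packets):
    """Sort received packets by RTP sequence number (no wrap-around handling)."""
    return sorted(packets, key=lambda p: unpack_rtp_header(p)["sequence"])

# Three packets with dummy payloads, arriving out of order.
# Payload type 0 is the static assignment for G.711 mu-law (PCMU).
arrived = [
    pack_rtp_header(0, seq, seq * 160, 0x1234) + b"voice"
    for seq in (3, 1, 2)
]
in_order = [unpack_rtp_header(p)["sequence"] for p in reorder(arrived)]
print(in_order)
```

Note that the payload itself (the encoded voice) is opaque to RTP; the header only says which codec was used through the payload type field.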
The importance of compressing and decompressing lies in the
fact that the internet has limited bandwidth. To transport
typically sampled human speech without compression
across the internet, one would need a stream of 64 kbps. Clearly
this is a lot of bandwidth once we start increasing the number
of users who use the internet for communication. Therefore
codecs play an important role in compressing the media stream
to much lower bit rates so that bandwidth requirements are minimal.
G.729 and G.723 are examples of codecs that have a good compression
ratio. G.711 is the baseline 64 kbps PCM format.
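The 64 kbps figure comes from simple arithmetic: telephone-quality speech is sampled 8000 times a second with 8 bits per sample. The sketch below works this out and compares how many simultaneous voice streams would fit on a link for a few codecs. The 1 Mbps link is a made-up example, "G.723.1" is the codec usually meant when people say G.723 in a VoIP context (shown here in its 6.3 kbps mode), and the calculation ignores the IP/UDP/RTP header overhead that every packet carries in practice.

```python
SAMPLE_RATE_HZ = 8000   # samples per second for telephone-quality speech
BITS_PER_SAMPLE = 8     # bits per sample in G.711
G711_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bits per second

# Nominal codec bit rates in bits per second (payload only).
CODEC_BITRATE_BPS = {
    "G.711": G711_BPS,
    "G.729": 8000,
    "G.723.1": 6300,
}

def calls_per_link(link_bps, codec):
    """Whole number of one-way voice streams that fit on the link."""
    return link_bps // CODEC_BITRATE_BPS[codec]

LINK_BPS = 1_000_000  # hypothetical 1 Mbps link
for codec in CODEC_BITRATE_BPS:
    print(codec, calls_per_link(LINK_BPS, codec))
```

On this hypothetical link, G.711 fits 15 streams while G.729 fits 125, which is why compression matters as the number of users grows.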
A quick sidenote:
Why so much effort in standardization?
Before we start describing the various protocols, let's
address the basic question: why do we need standards-based
protocols?
This is probably a good time to answer some questions
I hear often:
Q: "I have been seeing Internet Phones for years
now - what is so new about VoIP?"
A: Yes, Internet phones existed for a while. However,
each one exchanged information in a way that only it
understood. Today, VoIP aims at establishing standard
protocols for communication across the internet.
Q: "So what is the use of a standard protocol?"
A: Did you ever wonder how your Sony phone can easily
talk to your friend's AT&T phone which in turn can
also talk to your colleague's Panasonic phone? Well,
they can talk because they all speak the same language
(or protocol).
Q: "Alright, I understand the use of a standard
protocol now - but if it is already in place in all
the phones around the world, why is VoIP trying to do
the same thing?"
A: The point is, VoIP does not only target phones. It
targets the internet and all devices that are connected
to it. Imagine this: You are having a conversation with
your wife, who is 1000 miles away, about what vacation
trip to plan next month. During the conversation you
are able to pull up a website about Hawaii which both
you and your wife navigate together. Then
you realize that you forgot to shut the garage door
before you left your house, so you press a button
and the garage door closes in your house. And all of
this is happening over your cell phone. VoIP is about
expanding your horizon.
Introduction to VoIP protocols
Well, now that we know that it is essential for a 'standard
way' or a 'standard protocol' to exist for entities or devices
to communicate, let's now (finally) get into what these protocols
look like.
As of today, there are two main (competing) protocols worth
mentioning that address the VoIP space. In chronological
order, they are:
H.323 - This protocol was developed by
the ITU (International Telecommunication Union), which is
an international organization that has developed several of
the standards that are followed today in the PSTN world.
SIP - an acronym for "Session Initiation Protocol",
which was developed more recently by the IETF (Internet Engineering
Task Force), which is an organization that has pioneered many
of the protocols that are in use by the Internet as we know
it today.
The lineage of the two groups (one originating
from PSTN roots and the other from the Internet model)
gives you an indication of the nature of the protocols that
they published.
H.323 was initially published as a protocol that addresses
VoIP from the following direction: "Today we have an
extensive network in the PSTN for communication. Let's figure
out how to extend this network to converge with the Internet".
The SIP proponents addressed the same problem from the reverse
direction: "Since VoIP is about using the internet, why
not make a protocol that models itself around the internet
and its successful protocols such as HTTP and then worry about
how to make it work with the existing PSTN world?"
How each protocol works
This section briefly describes the messages that are exchanged
before a 'call' can be set up.
Before we describe how a call works, let's describe some other
entities that help in setting up the call.
Let's take a simple case:
Dilbert, who works for Aricent, wants to speak to Wally, who works
at AT&T. Dilbert does not know the exact location of Wally
- all he knows is that Wally works at AT&T. One of the
advantages of VoIP is that the calling party need not know
the exact location of the called party - the exact address
can be discovered as the call traces its way across the internet.
For such a call to be set up, besides Dilbert's phone and
Wally's phone, some other entities would be needed in the
network.
Location Server: This would be an entity that can
find out the current location of a user. In the case of H.323,
the Location server may be the Gatekeeper or another external
server that the Gatekeeper can consult using alternate protocols
(such as LDAP). In the case of SIP, the Location server may
be the Proxy or another external server that the Proxy can
consult using alternate protocols (such as LDAP).
Administration Server: This would be an entity that
controls if Dilbert is indeed allowed to make a call to Wally
and other such policy decisions. In the H.323 world, this
entity is called a Gatekeeper. In the SIP world, there is
no specific name for such an entity.
Router: This would be an entity that could forward
the call to the next router till the call finally reaches
the destination. In the SIP world, this is called a Proxy.
In the H.323 world this operation is typically done by the
Gatekeeper.
With this in mind, let's see how the call is actually set up.
Call setup in an H.323 Network:
The above illustration shows a typical VoIP call using the
H.323 protocol.
Step 1-2: Dilbert asks permission from his domain gatekeeper
if he can make a call to Wally. [ ARQ = Admission Request,
ACF = Admission Confirm ]
Step 3-6: Dilbert's phone sends a 'Setup' message to its
Gatekeeper. The Setup message contains the media characteristics
that Dilbert's phone supports. The Gatekeeper routes this
across the internet and the call arrives at the Gatekeeper
in the AT&T network. This Setup message is forwarded to
Wally's phone. At this stage, Wally's phone rings.
It may be worth touching upon how the Aricent gatekeeper routes the
message to the AT&T gatekeeper. If the Aricent gatekeeper is able
to connect directly to the AT&T gatekeeper, it can forward
the message directly to it. In the event that there isn't
any direct connection between the two, the Aricent GK can forward
the message to its default route and this procedure will continue
across several GKs in the internet until it reaches a GK which
has a direct connection to the AT&T GK. This is similar
to how IP packets are routed across routers.
Step 7-8: Wally asks permission from his domain gatekeeper
if he can receive a call from Dilbert. His gatekeeper grants
him permission.
Step 9-12: Wally's phone sends an Alerting message to Dilbert.
At this stage, Dilbert's phone indicates that Wally's phone
is ringing by playing a remote ring tone.
Step 13-16: Wally's phone sends a Connect message which contains
the media characteristics that Wally's phone will use to speak with
Dilbert's phone.
Step 17: Wally and Dilbert speak to each other using RTP.
Note that the RTP flow typically happens directly between
the callers. The reason for this is that Gatekeepers and other
in-between servers will get overloaded if every media packet
needs to be routed through them.
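The hop-by-hop forwarding of the Setup message described under Step 3-6 can be sketched in a few lines. The gatekeeper names and the topology below are entirely hypothetical, and a real gatekeeper would of course do far more than this (admission control, address translation and so on). The sketch only shows the "use a direct connection if there is one, otherwise follow the default route" idea.

```python
# Hypothetical topology: which gatekeepers each GK can reach directly.
DIRECT_PEERS = {
    "aricent-gk": set(),
    "transit-gk-1": set(),
    "transit-gk-2": {"att-gk"},
}

# Hypothetical default route for each GK that has one.
DEFAULT_ROUTE = {
    "aricent-gk": "transit-gk-1",
    "transit-gk-1": "transit-gk-2",
}

def route_setup(start_gk, destination_gk, max_hops=10):
    """Return the list of gatekeepers a Setup message passes through."""
    path = [start_gk]
    current = start_gk
    while current != destination_gk:
        if len(path) > max_hops:
            raise RuntimeError("too many hops, giving up")
        if destination_gk in DIRECT_PEERS.get(current, set()):
            # Direct connection available: forward straight to it.
            current = destination_gk
        elif current in DEFAULT_ROUTE:
            # No direct connection: hand the message to the default route.
            current = DEFAULT_ROUTE[current]
        else:
            raise LookupError("no route from " + current)
        path.append(current)
    return path

path = route_setup("aricent-gk", "att-gk")
print(" -> ".join(path))
```

With this made-up topology the message travels through two intermediate gatekeepers before it reaches the AT&T gatekeeper.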
Call setup in a SIP Network:
The above illustration shows a typical VoIP call using the
SIP protocol.
Step 1-4: Dilbert's phone sends a SIP INVITE message inviting
Wally to join the call. Similar to the H.323 scenario, the
Proxy routes this across the internet and the call arrives
at the Proxy in the AT&T network. This INVITE message
is forwarded to Wally's phone. At this stage, Wally's phone
rings. This INVITE message contains Dilbert's phone's capabilities.
Step 5-8: Wally's phone sends a SIP 180 Ringing message to Dilbert's
phone indicating that it is ringing. At this stage, Dilbert's phone
plays the remote ringback tone.
Step 9-12: When Wally picks up the phone, the phone sends
a SIP 200 OK message all the way back to Dilbert's phone.
This 200 OK contains the media characteristics that Wally's
phone will use while speaking to Dilbert's phone.
Step 13-16: Dilbert's phone sends a SIP ACK (acknowledgement) message
to Wally's phone indicating that the call is now set up.
Step 17: Wally and Dilbert speak to each other using RTP.
Note that the RTP flow typically happens directly between
the callers. The reason for this is that Proxies and other
in-between servers will get overloaded if every media packet
needs to be routed through them.
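To give a feel for what the INVITE in Step 1-4 looks like on the wire, here is a minimal sketch that builds one as text, with the caller's media characteristics carried in an SDP body. The function name, addresses, host, port, Call-ID, tag and branch values are all made up for illustration, and a real phone would add more headers than shown here. SIP is a text protocol modeled on HTTP, which is why plain string formatting is enough.

```python
def build_invite(caller, callee, caller_host, rtp_port, call_id, branch, tag):
    """Build a minimal SIP INVITE with an SDP body (illustrative only)."""
    # SDP body: the caller's media characteristics.
    sdp_lines = [
        "v=0",
        "o=- 0 0 IN IP4 " + caller_host,
        "s=-",
        "c=IN IP4 " + caller_host,
        "t=0 0",
        # Audio over RTP on the given UDP port, offering payload
        # types 0 (G.711 mu-law) and 18 (G.729).
        "m=audio %d RTP/AVP 0 18" % rtp_port,
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:18 G729/8000",
    ]
    sdp = "\r\n".join(sdp_lines) + "\r\n"

    user = caller.split("@")[0]
    header_lines = [
        "INVITE sip:%s SIP/2.0" % callee,
        "Via: SIP/2.0/UDP %s;branch=%s" % (caller_host, branch),
        "Max-Forwards: 70",
        "From: <sip:%s>;tag=%s" % (caller, tag),
        "To: <sip:%s>" % callee,
        "Call-ID: %s" % call_id,
        "CSeq: 1 INVITE",
        "Contact: <sip:%s@%s>" % (user, caller_host),
        "Content-Type: application/sdp",
        "Content-Length: %d" % len(sdp.encode("utf-8")),
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(header_lines) + "\r\n\r\n" + sdp

# All values below are hypothetical.
invite = build_invite(
    caller="dilbert@aricent.example.com",
    callee="wally@att.example.com",
    caller_host="phone1.aricent.example.com",
    rtp_port=40000,
    call_id="a84b4c76e66710@phone1.aricent.example.com",
    branch="z9hG4bKexample1",
    tag="1928301774",
)
print(invite)
```

The 200 OK that Wally's phone sends back in Step 9-12 carries a similar SDP body describing the codec and port that Wally's phone has chosen.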
Note: In both the SIP and H.323 examples,
I have illustrated sample call flows. Depending on the network
layout, more or fewer nodes may be involved in the call setup.
Please refer to the H.323 specification documents and
SIP RFC 3261 for more details on other scenarios.
Also, in the case of H.323 I have described a procedure
commonly known as Fast Connect. This feature was added
in a subsequent release of the H.323 protocol and
made the number of messages required for call setup
in H.323 comparable to SIP. Prior to this, H.323 needed
more messages for call setup, making it significantly
slower than SIP at setting up calls.
Which protocol reigns supreme ?
SIP was developed at a time when it was commonly believed
that H.323 was getting too complex to implement. In addition,
its 'internet like' model opened up new models which could
bring in new web-based services and business models which
went way beyond the restrictive PSTN based approaches that
exist today.
Until 2000, H.323 was clearly the market leader. However,
the promised simplicity of SIP and its flexibility had a significant
influence on the industry, and within a year SIP was adopted
by diverse industries as the protocol of choice
for VoIP.
However, as of today, there are still many more H.323 based
deployments than SIP based ones in the network. The reason for this is
primarily that people had already invested significantly in
H.323 based networks before SIP came out. In addition, the
economic slowdown has made people wary of adopting a new technology
base.
Therefore, it would probably be appropriate for me to say
"SIP has won mindshare. It has yet to win marketshare
over H.323". Almost every network that is H.323 based
has plans to migrate to SIP in the next 3-4 years. Therefore,
in the long term, I would bet my money on SIP.
So I am of the opinion that SIP is a superior protocol
to H.323, right?
No. We must remember that a technology's success does not rest only
on technical merit. SIP came in as a 'breath of fresh air'
to developers and deployers at a time when the economy was
booming and people were getting tired of the complexities
of H.323. The 'promise' of SIP excited the right mass of people
to shift towards it, and industry hype had a lot to do with
it.
As of today, SIP has gotten more complex and H.323 has become
simpler and wider in scope. Personally, having worked with
both protocols, I do not think it would be fair to still assert
that SIP is a much simpler protocol than H.323. Both have
evolved.
However, the industry has moved too far ahead to revisit the choice
of protocols. Had this war started over again now,
either of the two could have come to the fore.
But right now, the protocol selection is clear. SIP is the
way ahead and the reason is a combination of technology and
hype. So let's stop worrying about which protocol is better
and start worrying about how, given a technology (SIP),
we make applications for the market that add to revenue streams.
Conclusion
In this tutorial, we discussed how VoIP calls are set up and
how two different VoIP protocols, SIP and H.323, behave.