By Paul Colmer, Paul Colmer & Associates | October 29, 2014

I think of a dusty town, where tumbleweeds roll and the incumbent operators squint into the distance with the echo of Ennio Morricone in their ears. VoIP has come to town.

It has been said that voice over IP (VoIP) has “come of age” and is no longer an emerging technology. Alon Cohen, an entrepreneur and inventor, is widely recognised, along with Lior Haramaty, as a creator of the VoIP industry in 1994. The technology took the world by storm: in 1996 we saw the emergence of the first major protocol, H.323, and in 1999 session initiation protocol (SIP) became the standard that is still in use today. In the same year Mark Spencer developed Asterisk, the first open source telephony and VoIP platform, and the ball began to roll.


In a historic case, Altech argued in court that its Autopage subsidiary, which held a value added network services (VANS) licence, should be able to build its own infrastructure. The High Court ruled in Altech’s favour and, by doing so, ruled against the status quo in South Africa’s telecoms markets.

The High Court decision meant that all VANS licence holders in South Africa, which numbered over 300, could now “self-provide” or build their own networks. They were no longer obliged to lease backbone capacity from either the country’s two fixed-line operators (Telkom and Neotel), or the three mobile network operators (Vodacom, MTN and Cell C).

The next generation network was born.

The NGNs are licensed by the Independent Communications Authority of South Africa (ICASA), and are issued with individual electronic communications network service (I-ECNS) licences. These licences were first issued in 2008, and the old VANS licences originally held by all the early VoIP providers have been converted to I-ECNS where applicable.

Under the bonnet

VoIP is a general term for a family of methodologies, communication protocols, and transmission technologies for delivery of voice and multimedia sessions over IP networks, such as the internet. The steps involved in originating a VoIP call are signalling and media channel setup, digitisation of the analogue voice signal, compression, packetisation, and transmission as IP packets over a packet-switched network. On the receiving side similar steps reproduce the original voice stream.

VoIP systems employ session control protocols to control the set-up and tear-down of calls, as well as audio codecs, which encode speech so that it can be transmitted over an IP network as a digital audio stream. Codec use varies between VoIP implementations, and often a range of codecs is supported: some implementations rely on narrowband, compressed speech (G.729, an 8 kbps codec that needs roughly 32 kbps on the wire once packet overhead is included), while others support high definition codecs such as G.722 at 64 kbps. Virtually all IP phones are G.722 HD voice ready but sadly our networks are not.

Theoretically, once a voice signal has been digitised it should be easy to send all the packets across the internet, since the internet is a packet-based transport mechanism. Sadly this is not the case, as the internet was not designed for voice. Most traffic on the net uses TCP, which works well for data because it has error correction and retransmission, but that makes it pretty useless for streaming media such as VoIP, which is a “real time” application. To get over this we use user datagram protocol (UDP), which unlike TCP has no handshaking or error correction and so allows packets to be streamed continuously using the real-time transport protocol (RTP). The downside is that there is no guarantee of delivery, and this brings us to the Achilles heel of VoIP: packet loss.

The solution to this is to make the packets very small so that if we lose a few we will not experience any degradation of the voice, but this comes at a price: IP overhead. So let’s break down the overhead.

Example: A call using the G.729 codec

Each RTP packet contains 20 ms of audio (typical)
Each 20 ms of audio requires 20 bytes
Each second of audio will require 50 packets, each containing an audio payload of 20 bytes. Before being transmitted over the network the IP packet will contain:

20 bytes for the audio payload
12 bytes for the RTP header
8 bytes for the UDP header
20 bytes for the IP header
…for a total of 60 bytes per packet
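The header sizes in this list can be checked with a short Python sketch. It builds a minimal 12-byte RTP header in front of a 20-byte payload and then adds the UDP and IP header sizes from the list. The function name, sequence number, timestamp and SSRC value are illustrative assumptions; payload type 18 is the static RTP payload type for G.729.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_G729 = 18   # static RTP payload type for G.729
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20     # IPv4 header without options


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=PAYLOAD_TYPE_G729):
    """Prefix a codec payload with a minimal 12-byte RTP header (hypothetical helper)."""
    first_byte = RTP_VERSION << 6        # version 2, no padding, no extension, no CSRCs
    second_byte = payload_type & 0x7F    # marker bit clear
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


# 20 ms of G.729 audio is 20 bytes; at an 8 kHz RTP clock, 20 ms is 160 ticks.
rtp_packet = build_rtp_packet(seq=1, timestamp=160, ssrc=0x1234ABCD, payload=bytes(20))
ip_packet_bytes = len(rtp_packet) + UDP_HEADER_BYTES + IP_HEADER_BYTES

print(len(rtp_packet))    # 32
print(ip_packet_bytes)    # 60
```

In a real sender the bytes returned by build_rtp_packet would be handed to a UDP socket; the operating system adds the UDP and IP headers itself.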

If the transmission medium is Ethernet, the IP packet is encapsulated in an Ethernet frame, which adds 18 bytes of framing (header plus checksum), for a total of 78 bytes per frame × 50 packets per second × 8 bits per byte.

This equates to 31200 bits per second or 31,2 kbps.
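As a minimal sketch, the Ethernet calculation can be written as a small Python function. The constant and function names are assumptions made for illustration; the byte counts are the ones given in the example.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20
ETHERNET_FRAMING_BYTES = 18


def ethernet_bandwidth_bps(payload_bytes, packet_ms=20):
    """Bits per second on Ethernet for one call direction (hypothetical helper)."""
    packets_per_second = 1000 // packet_ms
    frame_bytes = (payload_bytes + RTP_HEADER_BYTES + UDP_HEADER_BYTES
                   + IP_HEADER_BYTES + ETHERNET_FRAMING_BYTES)
    return frame_bytes * packets_per_second * 8


print(ethernet_bandwidth_bps(20))  # 31200
```

Only 8 000 of those 31 200 bits per second are voice; the rest is header overhead.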

However, on an ATM circuit, as used in DSL, each IP packet of 60 bytes first has a header and trailer added to it (typically an 8-byte header and an 8-byte trailer). The resulting 76-byte payload is then split into cells of 48 bytes each, and a 5-byte routing header is added to each cell, so each cell is 53 bytes in size. The IP packet is therefore split across 2 cells, for a total of 53 bytes per cell × 2 cells per packet × 50 packets per second × 8 bits per byte.

This equates to 42400 bits per second, or 42,4 kbps.
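The ATM arithmetic is easy to get wrong by hand, so here is a minimal Python sketch of it. The function name and parameter names are assumptions; the 16 bytes of encapsulation, the 48-byte cell payload and the 53-byte cell size come from the paragraph above.

```python
import math

ATM_CELL_PAYLOAD_BYTES = 48
ATM_CELL_BYTES = 53          # 48-byte payload plus 5-byte cell header


def atm_bandwidth_bps(ip_packet_bytes, packets_per_second=50, encapsulation_bytes=16):
    """Bits per second on an ATM-based DSL circuit (hypothetical helper)."""
    cells = math.ceil((ip_packet_bytes + encapsulation_bytes) / ATM_CELL_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES * packets_per_second * 8


print(atm_bandwidth_bps(60))  # 42400
```

Because a partly filled cell is still sent as a full 53-byte cell, the cost rises in steps: the 76-byte payload wastes 20 bytes of the second cell.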

This situation changes dramatically if we create a virtual private network (VPN) tunnel to get around network address translation (NAT) problems. The tunnel resolves the static IP issue, because in reality there is no such thing as a true static IP on a DSL line.

The South African ISPs will create a tunnel to the client DSL router with L2TP or IPsec and then assign a fixed IP to the tunnel end-point.

The problem with this is that the overhead when using a lot of small packets is massive. If using IPsec, the overhead is 52 bytes per packet. By contrast, an RTP packet with a G.729 voice payload is only 60 bytes. Assuming this is running on DSL with ATM cell overheads, the packet now needs three cells instead of two, so you’re up from 42,4 kbps per call at Layer 2 to 63,6 kbps, a 50% increase!
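The effect of the tunnel can be sketched in Python using the figures above: a 60-byte IP packet, 52 bytes of IPsec overhead, 16 bytes of ATM encapsulation and 48-byte cell payloads. The function and parameter names are illustrative assumptions, and real IPsec overhead varies with the cipher and mode in use.

```python
import math


def dsl_bandwidth_bps(ip_packet_bytes, tunnel_overhead_bytes=0,
                      packets_per_second=50, encapsulation_bytes=16):
    """Bits per second on ATM-based DSL, with optional tunnel overhead (hypothetical helper)."""
    total_bytes = ip_packet_bytes + tunnel_overhead_bytes + encapsulation_bytes
    cells = math.ceil(total_bytes / 48)      # each cell carries 48 bytes of payload
    return cells * 53 * packets_per_second * 8


plain = dsl_bandwidth_bps(60)
tunnelled = dsl_bandwidth_bps(60, tunnel_overhead_bytes=52)

print(plain)              # 42400
print(tunnelled)          # 63600
print(tunnelled / plain)  # 1.5
```

The jump comes entirely from the third ATM cell: 60 + 52 + 16 = 128 bytes no longer fits into two 48-byte cell payloads.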

So as we can see, the overhead is huge for carrying an 8 kbps voice stream. If we do the same calculation for the G.711 codec the answer comes out at 106 kbps, and this is the codec that we are forced into using for IP fax. This is why last mile connectivity is so critical to VoIP: you have to have stable connectivity that can support the overhead of these media streams or the quality will degrade.
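The G.711 figure can be reproduced with the same per-packet arithmetic. This Python sketch assumes 20 ms packets, 40 bytes of RTP, UDP and IP headers, 16 bytes of ATM encapsulation and no VPN tunnel; the function name is hypothetical. G.711 runs at 64 kbps, so 20 ms of audio is 160 bytes.

```python
import math

RTP_UDP_IP_HEADER_BYTES = 12 + 8 + 20


def codec_dsl_bandwidth_bps(codec_bps, packet_ms=20, encapsulation_bytes=16):
    """Bits per second on ATM-based DSL for a codec bit rate (hypothetical helper)."""
    payload_bytes = codec_bps * packet_ms // 8000   # bits/s * ms / (8 bits * 1000 ms)
    packets_per_second = 1000 // packet_ms
    total_bytes = payload_bytes + RTP_UDP_IP_HEADER_BYTES + encapsulation_bytes
    cells = math.ceil(total_bytes / 48)
    return cells * 53 * packets_per_second * 8


print(codec_dsl_bandwidth_bps(8000))    # G.729: 42400
print(codec_dsl_bandwidth_bps(64000))   # G.711: 106000
```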

There are a few workarounds, such as ViBE (voice over broadband enhancement) and IAX2, an Asterisk point-to-point protocol. Both use tunnelling that preserves the first packet stream of 42 kbps on G.729 for the first VoIP call. Subsequent calls, as they are all going to the same place, can be stripped of their routing headers and latch onto the first call, so every other call routed through the tunnel uses only 8,1 kbps. This means that you need only roughly 256 kbps to drive 28 concurrent VoIP calls down the same tunnel. This is great technology, but don’t think for one minute that it can push a DSL line to the capacity of a PRI circuit: it can resolve a lot of issues on DSL but it can’t fix the underlying instability of DSL.
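A rough Python sketch of the trunking saving, using only the two figures quoted above (42 kbps for the first call and 8,1 kbps for each further call). This is a simplified model, not a description of how ViBE or IAX2 actually frame their packets, and the function names are assumptions. Under this model 28 calls come to about 261 kbps, in the same region as the 256 kbps figure; the small difference presumably comes from rounding in the quoted per-call rates.

```python
FIRST_CALL_BPS = 42_000       # first call carries the full header overhead
ADDITIONAL_CALL_BPS = 8_100   # each further call shares the tunnel headers


def trunked_bandwidth_bps(calls):
    """Approximate tunnel bandwidth for a number of concurrent calls (hypothetical helper)."""
    if calls <= 0:
        return 0
    return FIRST_CALL_BPS + (calls - 1) * ADDITIONAL_CALL_BPS


def untrunked_bandwidth_bps(calls):
    """Bandwidth if every call carried its own full overhead."""
    return calls * FIRST_CALL_BPS


print(trunked_bandwidth_bps(28))     # 260700
print(untrunked_bandwidth_bps(28))   # 1176000
```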

The good

The fact of the matter is that mainstream adoption of the technology is based more on improvements in the underlying network infrastructure than on advancements in VoIP itself. The rollout of FTTX networks is playing a major part in the acceptance of VoIP: at last consumers can get the stable last mile connectivity that this technology requires. SME businesses are now even using LTE, which can be a viable option in areas with good coverage. The cellular networks are busy preparing for 4G as we speak, and test sites are now up and running.

The bad

The downside is that the FTTX and LTE rollouts are slow and very much limited to the metro areas. The other challenge is the scarcity of spectrum available to the cellular carriers and wireless providers to expand the capacity of their networks.

The ugly

So back to the dusty town and rolling tumbleweeds…

As the cellular networks gain greater coverage and speed, mobile VoIP is certainly a viable option. We are already using Skype, Viber, Fring, etc., and many people have loaded third party SIP clients as smartphone uptake continues to grow exponentially. It is nothing new that the cellular carriers view this VoIP traffic as a potential threat to their voice revenue streams; they have repeatedly threatened to charge excessive data rates or to shape the data to degrade the service. They are more than capable of identifying VoIP traffic by its UDP port 5060 signalling and small-packet nature, and because it is easy to identify it is easy to control. Using the IAX2 protocol, which is available on many third party SIP clients, will probably fly under the radar, as it is a tunnelling protocol similar to a VPN.

The mobile VoIP question has resurfaced now because of the announcement that WhatsApp is about to launch a voice service. Knowing how many millions of subscribers are using it, the threat is massive, and I believe the carriers are cocking their guns!

Editor’s note: VoIP is great, as the author states, if the underlying network is capable of carrying the required packets. The problem is that many companies have IP networks to carry their data and, to save costs, add their VoIP traffic without shaping the network to give priority to packets carrying voice. The result is poor quality audio. The main culprits are the large companies that carry their inter-branch voice traffic over their data networks. The same happens with virtual PBXs.