Skype’s New SILK Wideband Codec Set Free

The release a couple of months back of the Skype v4.0 client for Windows was noteworthy for introducing Skype’s in-house developed SILK codec. Earlier today, during an eComm 2009 presentation, Jonathan Christensen, Skype GM Audio & Video, announced that SILK was being released under a royalty-free license.

SILK is notable for supporting narrowband (8 kHz), wideband (16 kHz) and super-wideband (24 kHz) sample rates. Skype claims the codec dynamically adapts both sample rate and bitrate in response to variable network quality. They have published a PDF with a very general overview of codec performance expressed in terms of bitrates, CPU requirements and MOS scores.

This release has fairly obvious implications for the Skype For Asterisk project at Digium. It clears the way for the creation of chan_silk allowing wideband calls to pass between Skype and Asterisk. The hope is that various hardware manufacturers and SIP trunking providers will adopt the codec as well, making wideband calling more broadly available.

With a considerable stable of hardware partners, Skype will likely be able to see the codec implemented in hardware from companies like Belkin, Ipevo, Netgear, Philips & RTX Telecom. Note that none of these are names that you normally associate with SMB or enterprise telecom.

To sustain a wideband call the call path cannot touch the PSTN at all. That means IP-to-IP transit end-to-end. This is most easily done in corporate installations where voice is being integrated into the network anyway, perhaps on a WAN between regional offices. It’s harder to do with respect to the broader outside world. Hopefully there will come a day when IP-based peering will allow wholly IP based exchange of calls between parties, but that’s not the case today.
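
The constraint above can be sketched in a few lines of Python. This is only an illustration, not how any real switch or gateway decides: the Hop class, its fields and the example paths are all hypothetical, and the 16 kHz threshold simply reflects the wideband sample rate discussed in this post. The idea is that a call is only as wide as its narrowest hop, and any PSTN leg drags it down to narrowband.

```python
from dataclasses import dataclass
from typing import List

WIDEBAND_MIN_RATE_HZ = 16000  # wideband sample rate as used in this post


@dataclass
class Hop:
    """One element in a call path (hypothetical model for illustration)."""
    name: str
    is_ip: bool               # False for a PSTN leg
    max_sample_rate_hz: int   # highest audio sample rate this hop can carry


def end_to_end_sample_rate(path: List[Hop]) -> int:
    """The call can be no wider than its most limited hop."""
    return min(hop.max_sample_rate_hz for hop in path)


def is_wideband(path: List[Hop]) -> bool:
    """Wideband needs IP on every hop and at least 16 kHz sampling throughout."""
    all_ip = all(hop.is_ip for hop in path)
    return all_ip and end_to_end_sample_rate(path) >= WIDEBAND_MIN_RATE_HZ


# Two made-up call paths: office-to-office over a WAN, and a call via the PSTN.
wan_call = [
    Hop("desk phone, office A", True, 16000),
    Hop("IP PBX", True, 16000),
    Hop("desk phone, office B", True, 16000),
]
pstn_call = [
    Hop("desk phone, office A", True, 16000),
    Hop("PSTN gateway", False, 8000),
    Hop("landline", False, 8000),
]

for label, path in (("WAN call", wan_call), ("PSTN call", pstn_call)):
    print(label, "wideband" if is_wideband(path) else "narrowband")
```

Modelling each hop with a maximum sample rate keeps the example short. A real deployment would negotiate a specific codec per leg rather than compare raw sample rates.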

Bear in mind that others are playing this game as well. Polycom released their Siren7 & Siren14 codecs under a royalty-free license last summer. A growing number of wideband-capable codecs are now available, but many are still saddled with burdensome licensing requirements. That alone could take them out of play in the face of newer royalty-free codecs.

It also pays to be very clear what “royalty free” actually means. You may not have to pay a per-user or per-end-point license fee, but in some cases there may well be a considerable ante to get access to the SDK. Some do this possibly as a source of revenue, but more likely as a means of not having to deal with every hobbyist programmer who feels like writing their own soft phone this weekend. It ensures that only serious players make more than casual inquiries.

Finally, there is some debate about sample rates. Sampling audio at 8 kHz results in a usable pass-band of typically 3.4 kHz, consistent with a normal G.711 call…that is, narrowband. Wideband is generally considered to be sampled at 16 kHz, resulting in a pass-band of just under 8 kHz. There are those who propose super-wideband 24 kHz sampling. And going even further, the latest open source CELT codec supports sample rates from 32 to 96 kHz.
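
The arithmetic behind those figures is the Nyquist limit: a sampled signal can carry audio only up to half its sample rate. Here is a minimal Python sketch of that relationship. The function name and the table of labels are my own for illustration, and the figures are theoretical ceilings. Real codecs land somewhat lower once anti-aliasing filtering is taken into account, which is why an 8 kHz narrowband call ends up around 3.4 kHz rather than 4 kHz.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Theoretical upper bound on audio frequency for a given sample rate."""
    return sample_rate_hz / 2


# Sample rates mentioned in this post, in Hz.
SAMPLE_RATES_HZ = {
    "narrowband": 8000,
    "wideband": 16000,
    "super-wideband": 24000,
}

for label, rate in SAMPLE_RATES_HZ.items():
    limit = nyquist_limit_hz(rate)
    print(f"{label}: {rate} Hz sampling, pass-band ceiling {limit:.0f} Hz")
```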

The human voice typically has limited spectral content above 10 kHz, so there’s a point of diminishing returns with regard to ever-higher sample rates if the application in question is telephony. There are some in-the-know who argue that the difference between wideband and super-wideband is not appreciable. Of course, if the application is conveying music then that’s a whole ‘nuther matter.

With companies like AudioCodes adopting a pro-wideband stance, we may finally see hardware gateways that can handle a variety of wideband codecs. It’ll certainly be interesting to see how SILK fares in the marketplace. For now the broad hardware support is behind the more established G.722 and AMR-WB, but that could change.

Resources:

Skype Blog: SILK, our super wideband audio codec, is now available for free

Skype Journal:  SILK:  Skype’s New Audio Codec Sets New Performance Standards for Voice Conversations

  • Henry Huang

    Hi, just found your blog yesterday and love the contents that you’ve put up. I have just recently answered one of my customer’s question regarding using Polycom’s new HD voice. And I told him the same thing that you described in the article. And I do believe as long as we are not using it to play music, we don’t really have a need for HD voice. Well, who knows what application or new function can come along with all this high quality voice trend.

    Well, I am currently working on some residential all in all media + home networking + blah blah blah server. And I’ve found from your “About” page that you have years of experience on media and telephony stuff. Just thought I would be exchanging idea and thoughts with you from time to time. Hopefully I will start putting my own blog soon.

  • Thanks for the commentary. I think you mistake my impression of “HD Voice.” I think it has real merit. I use Polycom IP650s in my office and really like the improved quality offered by G.722. If you can keep calls off the PSTN, say between offices over a WAN then the benefits are very real.

    I’m just not certain that sampling rates over 16 kHz offer much real improvement for voice. When Skype offers 24 kHz sampling, or CELT 32 kHz sampling, I wonder what application would see tangible benefits.

  • Henry Huang

    I didn’t make myself clear on this subject. I was referring to the new Polycom HD codec G.722.1, which uses 16 kHz & 32 kHz sample rates. And FreeSWITCH is the first open source project to support the codec. You can read about it here:
    http://www.freeswitch.org/node/153

    Although I have heard about the original G.722 codec, I am still not familiar with how it compares to G.711. What advantage does it have over G.711? Voice quality wise, they both use an 8 kHz sample rate, right?

  • Henry Huang

    I have found the answer in your other article reviewing 2 Polycom IP phones. So even though G.711 is sampled at 8 kHz, it only captures a range of about 3 kHz. That’s why G.722 has higher quality calls.

    • The Nyquist theorem defines the relationship between the sample rate of a digital signal and the usable pass-band. In the case of G.711 the sampling rate is 8 kHz, which yields a maximum possible pass-band of 4 kHz. When you take into account anti-aliasing filtering, that drops to about 3.4 kHz.

      In contrast, G.722 with a sample rate of 16 kHz provides a pass-band of just under 8 kHz, about twice the usable bandwidth. Hence the higher call quality.