Asterisk & HDVoice: Hearing The Siren’s Song Part 2

In part 1 I gave you an introduction to Polycom’s Siren7 and Siren14 codecs, as well as a brief overview of their implementation in Asterisk v1.6. Now it makes sense to try to understand their advantages in use. This is really a more generalized exploration of narrowband (G.711, à la the PSTN) vs. wideband (G.722/G.722.1) vs. super-wideband (G.722.1C).

I set about creating a series of audio recordings to illustrate the difference between the three codecs. If Asterisk had been capable of handling all three codecs, then recording samples encoded in each fashion would have been relatively simple. The trouble is that in the period leading up to Astricon I didn’t yet have a version of Asterisk capable of handling Siren streams beyond pass-through.

As detailed previously, even if I had had such software, it would not have handled the 32 kHz sample rate of Siren14. To create valid samples I needed a process that would let me start with an uncompressed audio file, pass it through a call path with an arbitrary encoding method, and reliably record the result.
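To put numbers on the three classes of codec, here is a minimal Python sketch. The sample rates and nominal audio bandwidths are the commonly published figures for each ITU-T codec; the dictionary layout and function names are purely illustrative and not part of any Asterisk or Polycom API.

```python
# Nominal figures for each codec. The structure and names are illustrative only.
CODECS = {
    "G.711":    {"sample_rate_hz": 8000,  "audio_bandwidth_hz": 3400},   # narrowband
    "G.722":    {"sample_rate_hz": 16000, "audio_bandwidth_hz": 7000},   # wideband
    "G.722.1":  {"sample_rate_hz": 16000, "audio_bandwidth_hz": 7000},   # wideband (Siren7)
    "G.722.1C": {"sample_rate_hz": 32000, "audio_bandwidth_hz": 14000},  # super-wideband (Siren14)
}


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def summarize():
    """Return (name, sample rate, Nyquist limit, nominal bandwidth) per codec."""
    return [
        (name, c["sample_rate_hz"], nyquist_hz(c["sample_rate_hz"]), c["audio_bandwidth_hz"])
        for name, c in CODECS.items()
    ]


if __name__ == "__main__":
    for name, rate, nyquist, bandwidth in summarize():
        print(f"{name:9s} sampled at {rate:5d} Hz, Nyquist {nyquist:7.0f} Hz, "
              f"nominal audio bandwidth {bandwidth:5d} Hz")
```

The point of the comparison is that each step up in sample rate roughly doubles the audio bandwidth available to the listener.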

[Figure: audio workflow]

The process I used relied upon a soft phone on my laptop that was capable of all three codecs. I also used a program called Voice Emotion. This software allows me to inject any audio file into the PC’s audio subsystem ahead of the soft phone. Thus I was able to “play” an uncompressed wave file into a call path.

The receiving end of the call was a Polycom VVX-1500, which is also capable of handling all three codecs.

In the case of G.711 and G.722.1 I used the built-in call recording feature to record the audio to a USB stick plugged into the VVX-1500. This feature records the stream without requiring any record level adjustment, making it absolutely consistent from file-to-file.

However, I found that the VVX-1500 was not able to sustain the full call quality of a Siren14 call stream when recording to the USB stick. So in those cases I used the 3.5mm headset out jack to feed the audio to a Zoom H2 digital audio recorder.

All the recorded examples were taken into an audio editor for basic trimming and level adjustment. The audio editing software that I have on-hand is an ancient copy of Cool Edit Pro v2.1 from Syntrillium Software. Some years ago Adobe bought Syntrillium and morphed Cool Edit Pro into Adobe Audition.

[Figure: waveform vs. spectral display]

Cool Edit has a nifty and, in this case, very useful display mode. It will show you the audio waveform as you would expect, but it can also display a “spectral view.” In this mode the distribution of energy by frequency is clearly visible, giving us a visual equivalent to what we’re hearing.
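For the curious, the idea behind a spectral view can be sketched in a few lines of Python. This is not how Cool Edit does it internally (I have no knowledge of that); it is a naive discrete Fourier transform over one short block of a synthetic test signal, with all names and parameters invented for illustration. A real spectral display computes many such blocks in sequence and plots them side by side over time.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive DFT: return (frequency_hz, magnitude) for bins from 0 Hz up to Nyquist."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * sample_rate_hz / n, abs(acc) / n))
    return spectrum


def make_test_signal(sample_rate_hz=16000, n=160):
    """Synthetic block: a 1 kHz tone plus a quieter 6 kHz tone (hypothetical test data)."""
    return [
        math.sin(2 * math.pi * 1000 * t / sample_rate_hz)
        + 0.5 * math.sin(2 * math.pi * 6000 * t / sample_rate_hz)
        for t in range(n)
    ]


def dominant_frequencies(spectrum, count=2):
    """Frequencies of the `count` strongest bins, sorted low to high."""
    strongest = sorted(spectrum, key=lambda fm: fm[1], reverse=True)[:count]
    return sorted(freq for freq, _ in strongest)


if __name__ == "__main__":
    spectrum = magnitude_spectrum(make_test_signal(), 16000)
    print(dominant_frequencies(spectrum))
```

The 6 kHz tone in this test signal sits above the 4 kHz Nyquist limit of an 8 kHz narrowband stream, which is exactly the sort of energy that shows up in a wideband spectral view and is absent from a narrowband one.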

In spectral display mode the Y axis of the graph reads from 0 to 22,000 Hz in all of my screen shots. This reflects the fact that I migrated all the audio samples into files sampled at 44.1 kHz, just like a commercial audio CD. Thus all the samples can be compared directly.
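As a rough illustration of what that migration implies, the Python sketch below works out the resampling ratio from each native codec rate to 44.1 kHz, and how much of the display's vertical range a recording could fill. It assumes the usual 8, 16 and 32 kHz native rates; the function names are made up for this example. Resampling upward does not add any high-frequency content, so each sample can only occupy the lower portion of the display.

```python
from fractions import Fraction

TARGET_RATE_HZ = 44100  # CD-style rate used for all the comparison files


def resample_ratio(source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Exact ratio of output samples to input samples when converting rates."""
    return Fraction(target_rate_hz, source_rate_hz)


def display_fraction(source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Share of the 0..target-Nyquist display the source's Nyquist range can fill."""
    return Fraction(source_rate_hz, target_rate_hz)  # (source / 2) / (target / 2)


if __name__ == "__main__":
    for rate in (8000, 16000, 32000):
        print(f"{rate:5d} Hz -> ratio {resample_ratio(rate)}, "
              f"fills at most {float(display_fraction(rate)):.0%} of the display")
```

So a narrowband recording can only light up the bottom fifth or so of the graph, while a super-wideband recording can reach roughly three quarters of the way up.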

  • Thanks for doing this, I’ve been thinking about something similar for months, but kept putting it on the back burner as it was obviously a low priority. The visuals and video are great for showing customers and potential customers the real advantages of VoIP’s codec flexibility.

  • Warmbowski

    Michael, you’ve just set up an instant sales demo for Asterisk/FreeSWITCH system resellers and for IT folks who are pitching Asterisk or FreeSWITCH to their CTOs. Thanks for doing that work and posting it online!

  • Thanks for highlighting HD voice. I have been using HD voice and FreeSWITCH for the better part of a year and it’s been wonderful! FreeSWITCH comes ready-to-run with both Polycom Siren codecs (7 and 14) and FreeSWITCH runs them at 16 kHz and 32 kHz beautifully. A Polycom phone connected to FS is like magic – nothing to do but enjoy REALLY high quality audio with other HD-supported Polycom phones.

    FYI, we do a weekly FS conf call each Friday and we’ve got people calling in from all over the world with various codecs and signaling protocols. SIP, Skype, PSTN, etc. are represented, with people using Siren, CELT, Speex, G.711a/u, etc. It’s way cool. BTW, we have an audio clip of crickets chirping that we play when there’s silence. Interestingly, the narrowband codecs don’t carry the audio. You have to be on 16k, 32k, or 48k to hear it. (Yes, we have people calling in on 48 kHz CELT. It sounds incredible, assuming that the audio equipment being used is up to snuff.)

    Keep up the good work!
    -MC

    • It would be interesting to know of any other software that supports CELT. Better yet, is there any hardware that might support it? I still fear that lack of hardware support dooms open source codecs like Speex and CELT to a minor role in industry.

  • Good point. The only software that I personally know of that says it supports CELT is Ekiga. However, our users report lots of headaches with Ekiga. YMMV.

    We use CELT for FS-to-FS setups mostly. Anthony Minessale and Brian K West connected their Macs together with FreeSWITCH and CELT. Tony fired up his guitar and did a solo and Brian listened to it over the Internet. (Tony was in WI, Brian in OK.) Brian said it sounded awesome. It probably helps that Tony has awesome audio equipment. 😉

    CELT probably won’t be widely used because Polycom is getting Siren out there at a price point that is difficult to ignore. Also, our experience with Speex is that it sucks up a lot of CPU relative to the bandwidth it saves. We have lots of Polycoms out in the wild with the “HDVoice” stamp on them, so I’m guessing that’s the way things are going to go.

    -MC

    • Evgeniy

      >>our experience with Speex is that it sucks up a lot of CPU relative to the bandwidth it saves.

      Now FS supports bv32 codec 8)
      X-lite 3.0 /win supports bv32 , speech, dv* too.