Asterisk & HDVoice: Hearing The Siren’s Song Part 2
mjgraves | October 22, 2009
In part 1 I gave you an introduction to Polycom’s Siren7 & 14 codecs, as well as a brief overview of their implementation in Asterisk v1.6. Now it makes some sense to try to understand their advantages in use. This is really a more generalized exploration of narrowband (G.711, à la the PSTN) vs. wideband (G.722/G.722.1) vs. super-wideband (G.722.1C).
I set about creating a series of audio recordings to illustrate the difference between the three codecs. If Asterisk had been capable of handling all three codecs then recording samples encoded in each fashion would have been relatively simple. The trouble is that in the period leading up to Astricon I didn’t yet have a version of Asterisk capable of handling Siren streams beyond pass-through.
As we detailed previously, even if I had such software it would not have handled the 32 kHz sample rate of Siren14. To create valid samples I needed a process that would let me start with an uncompressed audio file, pass it through a call path with an arbitrary encoding method, and reliably record the result.

The process I used relied upon a soft phone on my laptop that is capable of all three codecs. I also used a program called Voice Emotion. This software allows me to inject any audio file into the PC’s audio subsystem ahead of the soft phone. Thus I was able to “play” an uncompressed wave file into a call path.
The receiving end of the call was a Polycom VVX-1500, which is also capable of handling all three codecs.
In the case of G.711 and G.722.1 I used the built-in call recording feature to record the audio to a USB stick plugged into the VVX-1500. This feature records the stream without requiring any record level adjustment, making it absolutely consistent from file to file.
However, I found that the VVX-1500 was not able to sustain the full call quality of a Siren14 call stream when recording to the USB stick. So in those cases I used the 3.5 mm headset out jack to feed the audio to a Zoom H2 digital audio recorder.
All the recorded examples were taken into an audio editor for basic trimming and level adjustment. The audio editing software that I have on hand is an ancient copy of Cool Edit Pro v2.1 from Syntrillium Software. Some years ago Adobe bought Syntrillium and morphed Cool Edit Pro into Adobe Audition.

Cool Edit has a nifty and, in this case, very useful display mode. It will show you the audio waveform as you would expect, but it can also display a “spectral view.” In this mode the distribution of energy by frequency is clearly visible, giving us a visual equivalent to what we’re hearing.
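To make "distribution of energy by frequency" a little more concrete, here is a minimal sketch of the underlying idea in Python. It is not how Cool Edit does it; it simply runs a naive discrete Fourier transform over a short test tone and reports which frequency holds the most energy. The function names, the 1 kHz tone, and the 8 kHz sample rate are all assumptions chosen for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal against a complex sinusoid at bin k.
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def peak_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the bin holding the most energy."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / len(samples)


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz (narrowband rate).
SAMPLE_RATE = 8000
N = 80  # 80 samples gives 100 Hz per bin, so 1 kHz lands exactly on a bin
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]

print(peak_frequency(tone, SAMPLE_RATE))
```

A spectral display repeats this kind of analysis over many short slices of the recording and paints the magnitudes as brightness, with time on one axis and frequency on the other.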
When in spectral display mode the Y axis of the graph reads from 0 to roughly 22,000 Hz in all of my screen shots. This reflects the fact that I converted all the audio samples into files sampled at 44.1 kHz, just like a commercial audio CD, and a digital file can only represent frequencies up to half its sample rate. Thus all the samples can be compared directly.
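The arithmetic behind that Y axis is simple enough to sketch. The snippet below is only an illustration: it assumes the usual sample rates for the three codecs (8 kHz for G.711, 16 kHz for G.722.1, 32 kHz for G.722.1C) and computes the theoretical upper frequency limit for each, along with how much of a 44.1 kHz spectral display that limit would span. The names are my own, and the usable audio bandwidth of each codec is somewhat narrower than these ceilings.

```python
# Sample rate of the files used for the screen shots (CD rate).
DISPLAY_RATE_HZ = 44_100

# Assumed codec sample rates, for illustration only.
CODEC_SAMPLE_RATES_HZ = {
    "G.711": 8_000,
    "G.722.1 (Siren7)": 16_000,
    "G.722.1C (Siren14)": 32_000,
}


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def display_share(codec_rate_hz, display_rate_hz=DISPLAY_RATE_HZ):
    """Fraction of the spectral display's height a codec could fill at most."""
    return nyquist_hz(codec_rate_hz) / nyquist_hz(display_rate_hz)


for name, rate in CODEC_SAMPLE_RATES_HZ.items():
    print(
        f"{name}: ceiling {nyquist_hz(rate):.0f} Hz, "
        f"{display_share(rate):.0%} of the display"
    )
```

This is why a narrowband recording shows energy only in a thin band near the bottom of the spectral view, while a super-wideband recording reaches well up the graph.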





