Asterisk & HDVoice: Hearing The Siren’s Song Part 2

Michael Graves

17 years ago

In Part 1 I gave you an introduction to Polycom’s Siren7 & Siren14 codecs, as well as a brief overview of their implementation in Asterisk v1.6. Now it makes sense to try to understand their advantages in use. This is really a more general exploration of narrowband (G.711, à la the PSTN) vs. wideband (G.722/G.722.1) vs. super-wideband (G.722.1C) audio.

I set about creating a series of audio recordings to illustrate the difference between the three codecs. If Asterisk had been capable of handling all three codecs, recording samples encoded in each fashion would have been relatively simple. The trouble is that in the period leading up to Astricon I didn’t yet have a version of Asterisk capable of handling Siren streams beyond pass-through.

As we detailed previously, even if I had such software, it would not have handled the 32 kHz sample rate of Siren14. To create valid samples I needed a process that would let me start with an uncompressed audio file, pass it through a call path using whichever codec I chose, and reliably record the result.
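To put those sample rates side by side, here is a small Python reference table. The figures are the nominal values from the ITU-T recommendations (G.711, G.722, and G.722.1 with its Annex C); the `Codec` type, the helper name, and the 20 ms frame size (a typical packetization interval, though calls can negotiate others) are my own illustrative assumptions, not anything taken from Asterisk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    """Nominal parameters for a voice codec (hypothetical, illustrative type)."""
    name: str
    sample_rate_hz: int  # PCM sampling rate the codec operates on
    band_low_hz: int     # nominal lower edge of the audio passband
    band_high_hz: int    # nominal upper edge of the audio passband


CODECS = [
    Codec("G.711", 8000, 300, 3400),                # narrowband, PSTN "toll quality"
    Codec("G.722", 16000, 50, 7000),                # wideband
    Codec("G.722.1 (Siren7)", 16000, 50, 7000),     # wideband
    Codec("G.722.1C (Siren14)", 32000, 50, 14000),  # super-wideband
]


def samples_per_frame(codec: Codec, frame_ms: int = 20) -> int:
    """Number of PCM samples in one frame lasting frame_ms milliseconds."""
    return codec.sample_rate_hz * frame_ms // 1000


if __name__ == "__main__":
    for c in CODECS:
        print(f"{c.name:<20} {c.sample_rate_hz:>6} Hz  "
              f"{c.band_low_hz}-{c.band_high_hz} Hz  "
              f"{samples_per_frame(c)} samples / 20 ms")
    # Codecs sampling above 16 kHz: here, only Siren14 at 32 kHz.
    print([c.name for c in CODECS if c.sample_rate_hz > 16000])
```

The frame sizes make the jump concrete: at the same 20 ms interval, a Siren14 stream carries four times as many PCM samples as G.711.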

The process I used relied upon a soft phone on my laptop that is capable of all three codecs. I also used a program called Voice Emotion, which allows me to inject any audio file into the PC’s audio subsystem upstream of the soft phone. Thus I was able to “play” an uncompressed wave file into a call path.

The receiving end of the call was a Polycom VVX-1500, which is also capable of handling all three codecs.

In the case of G.711 and G.722.1 I used the built-in call recording feature to record the audio to a USB stick plugged into the VVX-1500. This feature records the stream without requiring any record level adjustment, making it absolutely consistent from file to file.

However, I found that the VVX-1500 was not able to sustain the full call quality of a Siren14 call stream when recording to the USB stick. So in those cases I used the 3.5mm headset out jack to feed the audio to a Zoom H2 digital audio recorder.

All the recorded examples were taken into an audio editor for basic trimming and level adjustment. The audio editing software I have on hand is an ancient copy of Cool Edit Pro v2.1 from Syntrillium Software. Some years ago Adobe bought Syntrillium and morphed Cool Edit Pro into Adobe Audition.
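Cool Edit handled the level adjustment interactively. If you wanted to script a comparable step, one common approach is peak normalization: scale each clip so its loudest sample lands at the same target level. The sketch below assumes that is what “level adjustment” means here, which is my assumption; `peak_normalize` and its -1 dBFS default are hypothetical, not Cool Edit’s algorithm.

```python
def peak_normalize(samples, target_dbfs=-1.0):
    """Scale 16-bit PCM samples so the largest magnitude lands at target_dbfs.

    A simple peak normalizer (hypothetical helper for illustration).
    """
    full_scale = 32767
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    target = full_scale * 10 ** (target_dbfs / 20)  # dBFS -> linear amplitude
    gain = target / peak
    # Round back to integers and clamp to the signed 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


clip = [0, 1000, -2000, 500]
print(peak_normalize(clip, target_dbfs=-6.0))
```

Applying the same target to every clip is what keeps side-by-side comparisons level-matched; loudness-based normalization is an alternative when clips differ a lot in dynamics.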

Cool Edit has a nifty and, in this case, very useful display mode. It shows the audio waveform as you would expect, but it can also display a “spectral view.” In this mode the distribution of energy by frequency is clearly visible, giving us a visual equivalent of what we’re hearing.

In spectral display mode the Y axis of the graph reads from 0 to 22,000 Hz in all of my screen shots. This reflects the fact that I converted all the audio samples into files sampled at 44.1 kHz, just like a commercial audio CD. Thus all the samples can be compared directly.
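The roughly 22,000 Hz ceiling is the Nyquist limit: a PCM stream can only represent frequencies up to half its sampling rate, which is 22,050 Hz at 44.1 kHz. This small Python sketch (the function name is my own) computes that ceiling for each codec’s sampling rate and for the 44.1 kHz editing files.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


RATES = [
    ("G.711", 8000),
    ("G.722 / G.722.1", 16000),
    ("G.722.1C (Siren14)", 32000),
    ("CD audio / editor files", 44100),
]

for label, rate in RATES:
    print(f"{label:<24} {rate:>6} Hz sampling -> ceiling {nyquist_hz(rate):>7.0f} Hz")
```

It also shows why converting to 44.1 kHz adds no new detail: each sample keeps whatever bandwidth its codec delivered (G.711 can never exceed 4 kHz), and the higher rate simply puts all of them on a common scale.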

Audio Sample #1: The Female Voice

The first of my audio samples features Mrs Evelyne Resnick, wife of VUC founder Randy Resnick. It is also noteworthy that Mrs Resnick is a native of France. The advantages of wideband audio are often most evident in cross-cultural conference calls, where heavy accents and foreign languages can obscure what’s being said.

I had Randy ask Evelyne the following question, “What’s it like to be married to a VoIP geek?” Here’s her response in English.

The video clip is a screencast of the Cool Edit Pro timeline in spectral display mode. It clearly highlights the fact that I’ve intercut three clips to create this comparative version. The Y axis of the display reads from 0 to 22,000 Hz, consistent with the fact that the clip is sampled at 44.1 kHz like a commercial audio CD.

The first portion of the clip is G.711 encoded, as reflected by the absence of energy above about 3 kHz. The middle section, being G.722 encoded, contains energy to just over 7 kHz. The final section has energy present all the way up to 14 kHz, as we’d expect from Siren14/G.722.1C encoding.
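The spectral view is essentially a frequency analysis, so the same cutoffs can be checked numerically. Below is a minimal, stdlib-only Python sketch: a naive discrete Fourier transform plus a helper that reports the highest frequency whose magnitude comes within a chosen threshold (here -60 dB) of the strongest component. The function names and the threshold are my own assumptions. A real tool such as Cool Edit uses windowed FFT frames, which this sketch omits for clarity, and the naive DFT is only practical on short excerpts.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive DFT: return (frequency_hz, magnitude) pairs for bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        spectrum.append((k * sample_rate_hz / n, abs(acc)))
    return spectrum


def highest_significant_frequency(samples, sample_rate_hz, threshold_db=-60.0):
    """Highest bin frequency whose magnitude is within threshold_db of the peak."""
    spectrum = magnitude_spectrum(samples, sample_rate_hz)
    peak = max(mag for _, mag in spectrum)
    if peak == 0:
        return 0.0  # silence
    floor = peak * 10 ** (threshold_db / 20)  # dB threshold -> linear magnitude
    return max(freq for freq, mag in spectrum if mag >= floor)


# Synthetic "narrowband" test: 1 kHz and 3 kHz tones sampled at 8 kHz.
RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 3000 * i / RATE)
        for i in range(400)]
print(highest_significant_frequency(tone, RATE))
```

The threshold is the main design knob: a stricter (more negative) value picks up faint residue near a codec’s band edge, while a looser one reports only where the bulk of the energy stops.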

I’m not completely certain that YouTube doesn’t process the audio in some way when I upload such clips. To sidestep that possibility, here is an MP3-encoded version of each sample, along with a screen shot to illustrate the energy distribution in each case. Click on the text label below each screen shot to play the example file.

Mrs Evelyne Resnick, English comparative version

Mrs Evelyne Resnick, English original uncompressed wave file

Mrs Evelyne Resnick, English G.722.1C encoded

Mrs Evelyne Resnick, English G.722 encoded

Mrs Evelyne Resnick, English G.711 encoded

Audio Sample #2: The Male Voice In English

The second of my audio samples is Asterisk developer Michael Iedema from the Askozia Project. Michael is an American working in a lab in Germany. This recording gave him a chance to address the crowd at Astricon, explain his project and convey his best wishes for the show.

Michael Iedema’s greeting, comparative version (G.711, G.722 & G.722.1C)

Here again are the complete voice samples, presented four different ways. They step down in quality from the uncompressed wave file to “super-wideband” (G.722.1C), then merely “wideband” (G.722.1/G.722), and finally narrowband G.711, what we once called “toll quality” on the PSTN of old.