Wideband Telephony: Seeing What You Hear

As expected, the VUC call on Nov 7 about wideband VoIP proved very interesting. It was well attended, with a dozen participants on the ZipDX wideband bridge and another twenty on the Talkshoe narrowband conference bridge. Our guest, David Frankel of ZipDX, did a good job of introducing wideband telephony, its advantages and some of the issues surrounding its implementation.

We recorded the call in several places so that we have both wideband and narrowband recordings available for comparison after the fact. History has shown that many people download the conference recordings, even many months after the original conference date. It’s evidence of “the long tail” phenomenon that we hear about so often.

However, some people are very visual so I thought I’d bolster the archive recordings by doing some simple visual analysis of the spectral energy distribution in each type of call. Happily, Cool Edit Pro (now Adobe Audition) makes it really simple to generate both waveform and spectral views of an audio clip.

I thought that three examples would be most illustrative: instrumental guitar, a female voice and a male voice.

I started by cutting down the main body of the conference to just the short (45 sec) clip of instrumental guitar that I inserted into the conference. I cut down the same section from both the G.711 and G.722 recordings. I also normalized both clips so there would be very little difference in level.
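For readers who want to see what the normalization step amounts to, here is a minimal sketch in Python. It assumes simple peak normalization of floating-point samples in the range -1.0 to 1.0; the function name `peak_normalize` and the 0.98 target are illustrative choices of mine, and the audio editor's own normalize function may work differently.

```python
def peak_normalize(samples, target_peak=0.98):
    """Scale samples so the largest absolute value equals target_peak.

    Assumes float samples in [-1.0, 1.0]. Silent input is returned unchanged
    to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two hypothetical clips recorded at different levels
quiet_clip = [0.10, -0.20, 0.05, 0.15]
loud_clip = [0.50, -0.90, 0.30, 0.70]

quiet_norm = peak_normalize(quiet_clip)
loud_norm = peak_normalize(loud_clip)

# After normalization both clips share the same peak level
print(max(abs(s) for s in quiet_norm), max(abs(s) for s in loud_norm))
```

Matching the peaks this way keeps a level difference between the two recordings from being mistaken for a difference between the codecs.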

Each of these two clips is shown below. Click on any image to see a full sized version of the display. Click on the text to invoke an MP3 player and hear the clip.

Music inserted into G.711 encoded call (click to play the file)
Music inserted into G.722 encoded call (click to play the file)
Spectral display of the guitar clip played into the call. The first half is the G.711 encoded stream, the second half is the G.722 encoded call. Note the dramatically increased energy above 4 kHz in the wideband (right) portion of the display.
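
The difference above 4 kHz can also be put into a number. The sketch below is only an illustration using synthetic tones, not the actual conference recordings: it computes the share of a signal's energy that lies above a cutoff frequency with a naive DFT. The helper names (`dft_power`, `fraction_above`, `tone`) and the test signals are my own assumptions.

```python
import cmath
import math


def dft_power(samples):
    """Power in each non-negative frequency bin, via a naive O(N^2) DFT."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        powers.append(abs(acc) ** 2)
    return powers


def fraction_above(samples, sample_rate, cutoff_hz):
    """Share of total spectral energy found in bins above cutoff_hz."""
    n = len(samples)
    powers = dft_power(samples)
    total = sum(powers)
    if total == 0:
        return 0.0
    high = sum(p for k, p in enumerate(powers) if k * sample_rate / n > cutoff_hz)
    return high / total


def tone(freq_hz, sample_rate, n):
    """A unit-amplitude sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


RATE = 16000  # 16 kHz sampling, so content up to 8 kHz can be represented
N = 256

low = tone(1000, RATE, N)   # energy well inside the narrowband range
high = tone(6000, RATE, N)  # energy only a wideband codec can carry

narrowband_like = low
wideband_like = [a + 0.5 * b for a, b in zip(low, high)]

print(fraction_above(narrowband_like, RATE, 4000))
print(fraction_above(wideband_like, RATE, 4000))
```

In this synthetic case the narrowband-like signal has essentially nothing above 4 kHz, while the wideband-like one carries about a fifth of its energy there. Running the same measurement on decoded samples from the two recordings would give a rough numeric counterpart to the spectral display.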

Next, I isolated that portion of the call where Randy’s wife, Evelyne, made a momentary guest appearance. She joined the call using their Siemens S675IP cordless SIP/DECT phone, which is G.722 capable.

Female voice on G.711 encoded call (click to play the file)
Female voice on G.722 encoded call (click to play the file)
Comparative spectral display of Evelyne’s voice. The first half is the G.711 encoded stream, the second half is the G.722 encoded call.

My final example is a male voice, that of VUC regular Karl Fife making mention of the ENUM tree. I picked Karl because he was on the wideband bridge, so he was in both recordings. Also because I dislike the sound of my own voice.

Male voice on G.711 encoded call (click to play the file)
Male voice on G.722 encoded call (click to play the file)
Comparative spectral display of Karl’s voice. The first half is the G.711 encoded stream, the second half is the G.722 encoded call.

In all three examples we can plainly see that the wideband recordings carry more energy above 3 kHz. This is consistent with the technical properties of the codecs: G.711 samples at 8 kHz and in practice passes roughly 300 Hz to 3.4 kHz, while G.722 samples at 16 kHz and extends to about 7 kHz. The plots of energy vs frequency just give image to what we can plainly hear. The wideband sound is dramatically better, easier to listen to, and something that we should definitely strive for as telecom continues to evolve beyond the PSTN.

The very astute amongst you will want to know that these sample files were cut down from the MP3 format recordings offered at the VUC community web site. Thus they have been edited and exported as MP3s at the same bit rate and sample rate as the original files. While the process is lossy, every file was processed in exactly the same manner.