Considering Opus Implementations
Opus promises to be a great tool for online audio. In technology, as in music, not every opus is created equal. Allow me to explain.
Earlier this week I happened into a Twitter exchange with Mike Phillips. Mike is a podcaster. VUC founder Randy Resnick introduced us once before. Mike is seeking a replacement for the role that Skype plays in his online toolbox.
It came to light that Mike has tried various soft phones, even giving some focus to finding one that implements the Opus codec. Opus is, after all, open source, the current state of the art in audio codecs, and a new IETF standard. However, in his attempts to tap its potential, Mike has so far come up short relative to Skype.
Faced with this statement, I had to ask myself why. No sooner had I asked the question than the answer became obvious. Opus is not simple. Opus implementations can vary wildly. There are many factors to consider.
The Xiph.org page on the codec lists its properties as follows:
- Bit-rates from 6 kb/s to 510 kb/s
- Sampling rates from 8 kHz (narrowband) to 48 kHz (fullband)
- Frame sizes from 2.5 ms to 60 ms
- Support for both constant bit-rate (CBR) and variable bit-rate (VBR)
- Audio bandwidth from narrowband to fullband
- Support for speech and music
- Support for mono and stereo
- Support for up to 255 channels (multistream frames)
- Dynamically adjustable bitrate, audio bandwidth, and frame size
- Good loss robustness and packet loss concealment (PLC)
- Floating point and fixed-point implementation
Compare that to a similar synopsis of the baseline HDVoice codec, the aged G.722:
- Bit rate of 64 kb/s
- Sample rate of 16 kHz
- Frame size of 20 ms
One set of attributes is essentially fixed. The other, in contrast, is an entire universe of possibilities.
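To put a rough number on that contrast, here is a minimal Python sketch. It is only an illustration, not anything taken from a real soft phone: the `CodecSettings` class and the helper functions are hypothetical names of my own. The discrete sample rates and frame sizes are the values the Opus specification permits within the ranges quoted above. Bitrate is left out of the count because it is effectively continuous.

```python
from dataclasses import dataclass
from itertools import product

# Discrete choices Opus permits within the ranges listed above.
OPUS_SAMPLE_RATES_HZ = (8000, 12000, 16000, 24000, 48000)
OPUS_FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)
OPUS_CHANNELS = (1, 2)                      # mono or stereo
OPUS_RATE_CONTROL = ("cbr", "vbr")
OPUS_BITRATE_RANGE_BPS = (6_000, 510_000)   # 6 kb/s to 510 kb/s


@dataclass(frozen=True)
class CodecSettings:
    """Hypothetical container for the knobs discussed in this post."""
    bitrate_bps: int
    sample_rate_hz: int
    frame_ms: float
    channels: int = 1
    rate_control: str = "cbr"


# G.722 as summarised above: a single fixed point.
G722 = CodecSettings(bitrate_bps=64_000, sample_rate_hz=16_000, frame_ms=20)


def is_valid_opus(s: CodecSettings) -> bool:
    """Check a settings object against the Opus ranges listed above."""
    low, high = OPUS_BITRATE_RANGE_BPS
    return (
        low <= s.bitrate_bps <= high
        and s.sample_rate_hz in OPUS_SAMPLE_RATES_HZ
        and s.frame_ms in OPUS_FRAME_SIZES_MS
        and s.channels in OPUS_CHANNELS
        and s.rate_control in OPUS_RATE_CONTROL
    )


def count_discrete_opus_choices() -> int:
    """Count combinations of the discrete knobs, ignoring bitrate."""
    return sum(1 for _ in product(
        OPUS_SAMPLE_RATES_HZ, OPUS_FRAME_SIZES_MS,
        OPUS_CHANNELS, OPUS_RATE_CONTROL,
    ))


if __name__ == "__main__":
    print("Discrete Opus combinations:", count_discrete_opus_choices())
    print("G.722-like settings fit inside Opus:", is_valid_opus(G722))
```

Even before bitrate enters the picture, the discrete knobs alone multiply out to well over a hundred combinations, and the one point that G.722 occupies is just one of them. A soft phone that exposes only "enabled" or "disabled" has picked one of those combinations for you.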
Opus is starting to show up in soft phones. When it’s offered, it sits alongside the more standard fare, often with no options or settings beyond enabled or disabled.
Clearly there’s more to it than just enabled or disabled. In such implementations, several properties of the codec may be hard-coded by the developer. Who knows what assumptions they made about where its use might be advantageous, or which parameters might be ideal for a specific use case?
If an existing soft phone is architected around a 16 kHz sample rate, the developer might initially implement Opus in a manner that lacks the sample rate flexibility to achieve audio bandwidth beyond that supported by G.722. That would be the fastest approach given the framework of the existing codebase.
Implemented in that manner, Opus would deliver the nominal HDVoice frequency response (50 Hz – 7 kHz) at a lower bitrate than G.722, and with much better resilience to packet loss. Such an improvement should not be trivialized, but it may not be the solution desired by someone like Mike Phillips. After all, Skype supports “super-wideband,” which is typically 50 Hz – 14 kHz.
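The arithmetic behind that limitation is simple enough to sketch. The Python below is a rough illustration under stated assumptions: the function names are hypothetical, and the table pairs each Opus sample rate with the audio bandwidth the Opus specification associates with it. A sampled signal cannot carry frequencies above half its sample rate, so a 16 kHz pipeline tops out at 8 kHz no matter which codec sits inside it.

```python
# Opus sample rate -> (bandwidth name, top of the audio band in Hz),
# as defined by the Opus specification.
OPUS_BANDWIDTHS = {
    8000: ("narrowband", 4000),
    12000: ("mediumband", 6000),
    16000: ("wideband", 8000),
    24000: ("super-wideband", 12000),
    48000: ("fullband", 20000),
}


def nyquist_limit_hz(sample_rate_hz: int) -> int:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz // 2


def lowest_opus_rate_for(top_hz: int):
    """Smallest Opus sample rate whose audio band reaches top_hz.

    Returns None when no Opus bandwidth reaches that high.
    """
    for rate in sorted(OPUS_BANDWIDTHS):
        _, band_top_hz = OPUS_BANDWIDTHS[rate]
        if band_top_hz >= top_hz:
            return rate
    return None


if __name__ == "__main__":
    # A soft phone built around 16 kHz audio:
    print("16 kHz pipeline ceiling:", nyquist_limit_hz(16000), "Hz")
    # HDVoice tops out at 7 kHz; the Skype figure quoted above is 14 kHz.
    for top in (7000, 14000):
        print(top, "Hz needs at least", lowest_opus_rate_for(top), "Hz sampling")
```

By this table, a 16 kHz pipeline covers the 7 kHz HDVoice band comfortably, but reaching the 14 kHz figure quoted for Skype would take Opus running fullband at 48 kHz. A soft phone locked to 16 kHz cannot get there, however good the codec.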
If he is seeking to replace Skype, he needs to find a more reliable way to deliver the same or even better audio streams. As a codec, Opus can deliver on this need, but the specific implementation needs to accommodate those goals.
The codec is also only part of the picture. Opus specifies the encoding and provides for packet loss concealment. The soft phone may also require acoustic echo cancellation and noise reduction that are known to be effective on full-band signals. These factors are beyond the scope of the codec alone.
I intend to spend some time examining the soft phones that presently implement Opus to see what’s available. I’d like to answer some of Mike’s questions and see exactly how we can leverage the available tools to pass truly production-quality audio over IP via convenient means.
I look forward to your review of Bria!
Would WebRTC (with Opus) be a contender in the context of a podcasting operation, assuming that ‘remote’ guests could be varying from one episode to the next? This would avoid softphone download, install and update issues where guests could join the podcast simply by pointing their Chrome or Firefox browser to a pre-shared URL. Perhaps the technology is still a little immature but the concept seems sound 😉
Maybe. Maybe not. It depends upon how the codec is implemented. Will the sample rate be adjustable or definable as something suitable? Will NAT traversal be robust & reliable? The codec alone is not a solution, although it shows great promise.